TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
Apache License 2.0

smoothquant on starcoder2 #1886

Open tonylek opened 2 weeks ago

tonylek commented 2 weeks ago


I'm having an issue converting starcoder2-3b to TensorRT-LLM with SmoothQuant. I'm running on an A100 40GB.

This is my command: python tensorrt_llm/examples/gpt/convert_checkpoint.py --model_dir /model/starcoder2-3b --output_dir salmon_output --tp_size 1 --smoothquant 0.5

This is the error I'm receiving:

Generating validation split: 100%|███████████████████████████████████| 4869/4869 [00:00<00:00, 572495.69 examples/s]
calibrating model: 100%|██████████████████████████████████████████████████████████| 512/512 [00:44<00:00, 11.49it/s]
Traceback (most recent call last):
  File "/workspace/tensorrt_llm/examples/gpt/convert_checkpoint.py", line 2022, in <module>
  File "/workspace/tensorrt_llm/examples/gpt/convert_checkpoint.py", line 1984, in convert_and_save
    weights = convert_hf_gpt_legacy(
  File "/workspace/tensorrt_llm/examples/gpt/convert_checkpoint.py", line 1049, in convert_hf_gpt_legacy
    qkv_out_dim = qkv_w.shape[0]
AttributeError: 'NoneType' object has no attribute 'shape'

QiJune commented 1 week ago

@Tracin Could you please take a look? Thanks

Tracin commented 1 week ago

@tonylek For the Starcoder2 model, please use ModelOpt for calibration.

python3 examples/quantization/quantize.py --model_dir starcoder2 \
        --dtype float16 \
        --qformat int8_sq \
        --output_dir starcoder2/trt_ckpt/sq/1-gpu

trtllm-build --checkpoint_dir starcoder2/trt_ckpt/sq/1-gpu \
        --output_dir starcoder2/trt_engines/sq/1-gpu --builder_opt=4

I will add this usage to the docs.
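
For anyone scripting these two steps, here is a minimal Python sketch that runs them in sequence. The flags are the ones from the commands above. The helper names (`quantize_cmd`, `build_cmd`, `run_pipeline`), the script path relative to the checkout, and the `config.json` guard are assumptions for illustration, not part of TensorRT-LLM.

```python
import subprocess
from pathlib import Path


def quantize_cmd(model_dir: str, ckpt_dir: str) -> list[str]:
    """Build the ModelOpt SmoothQuant calibration command (flags from this thread)."""
    return [
        "python3",
        "examples/quantization/quantize.py",  # assumed path; adjust to your checkout
        "--model_dir", model_dir,
        "--dtype", "float16",
        "--qformat", "int8_sq",
        "--output_dir", ckpt_dir,
    ]


def build_cmd(ckpt_dir: str, engine_dir: str) -> list[str]:
    """Build the trtllm-build command (flags from this thread)."""
    return [
        "trtllm-build",
        "--checkpoint_dir", ckpt_dir,
        "--output_dir", engine_dir,
        "--builder_opt=4",
    ]


def run_pipeline(model_dir: str, ckpt_dir: str, engine_dir: str) -> None:
    """Quantize, confirm a checkpoint config was written, then build the engine."""
    subprocess.run(quantize_cmd(model_dir, ckpt_dir), check=True)

    # Hypothetical guard: stop early if the export step did not write config.json,
    # rather than letting the engine build fail on an incomplete checkpoint.
    config_path = Path(ckpt_dir) / "config.json"
    if not config_path.is_file():
        raise FileNotFoundError(f"checkpoint export incomplete: {config_path} is missing")

    subprocess.run(build_cmd(ckpt_dir, engine_dir), check=True)


# Example call, using the directories from the commands above:
# run_pipeline("starcoder2", "starcoder2/trt_ckpt/sq/1-gpu", "starcoder2/trt_engines/sq/1-gpu")
```

Passing the arguments as lists avoids shell quoting problems, and `check=True` makes a failed calibration stop the script before the build step.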

tonylek commented 1 week ago

Hi, thanks. With ModelOpt I'm now getting this error:

[TensorRT-LLM][WARNING] The manually set model data type is torch.float16, but the data type of the HuggingFace model is torch.float32.
Initializing tokenizer from /model/starcoder2-3b
No quantization applied, export float16 model
Unknown model type Starcoder2ForCausalLM. Continue exporting...
torch.distributed not initialized, assuming single world_size.
torch.distributed not initialized, assuming single world_size.
torch.distributed not initialized, assuming single world_size.
torch.distributed not initialized, assuming single world_size.
torch.distributed not initialized, assuming single world_size.
torch.distributed not initialized, assuming single world_size.
torch.distributed not initialized, assuming single world_size.
current rank: 0, tp rank: 0, pp rank: 0
torch.distributed not initialized, assuming single world_size.
torch.distributed not initialized, assuming single world_size.
Cannot export model to the model_config. The modelopt-optimized model state_dict (including the quantization factors) is saved to salmon_output/modelopt_model.0.pth using torch.save for further inspection.
Detailed export error: 'unknown:Starcoder2ForCausalLM'
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/modelopt/torch/export/model_config_export.py", line 364, in export_tensorrt_llm_checkpoint
    for tensorrt_llm_config, weights in torch_to_tensorrt_llm_checkpoint(
  File "/usr/local/lib/python3.10/dist-packages/modelopt/torch/export/model_config_export.py", line 312, in torch_to_tensorrt_llm_checkpoint
    tensorrt_llm_config = convert_to_tensorrt_llm_config(model_config, tp_size_overwrite)
  File "/usr/local/lib/python3.10/dist-packages/modelopt/torch/export/tensorrt_llm_utils.py", line 84, in convert_to_tensorrt_llm_config
    "architecture": MODEL_NAME_TO_HF_ARCH_MAP[decoder_type],
KeyError: 'unknown:Starcoder2ForCausalLM'
Traceback (most recent call last):
  File "/workspace/tensorrt_llm/examples/quantization/quantize.py", line 90, in <module>
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/quantization/quantize_by_modelopt.py", line 340, in quantize_and_export
    with open(f"{export_path}/config.json", "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'starcoder2_output/config.json'

when I run:

python3 tensorrt_llm/examples/quantization/quantize.py --model_dir /model/starcoder2-3b --dtype float16 --qformat int8_sq --output_dir starcoder2_output

Tracin commented 1 week ago

@tonylek Can you try upgrading ModelOpt?
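
Before upgrading, it can help to record which versions are currently installed. The following is a minimal sketch using only the Python standard library. The distribution names `nvidia-modelopt` and `tensorrt_llm` are assumptions about how the packages are published on pip, and `installed_version` is a hypothetical helper. The thread does not say which ModelOpt version adds Starcoder2 support.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a pip distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumed distribution names; adjust if your environment uses different ones.
    for name in ("nvidia-modelopt", "tensorrt_llm"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```

If ModelOpt turns out to be old, `pip install --upgrade nvidia-modelopt` would be the usual way to upgrade, assuming that distribution name is correct for your setup.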