tonylek opened this issue 2 weeks ago
@Tracin Could you please take a look? Thanks
@tonylek For the Starcoder2 model, please use ModelOpt to do the calibration:
python3 examples/quantization/quantize.py --model_dir starcoder2 \
--dtype float16 \
--qformat int8_sq \
--output_dir starcoder2/trt_ckpt/sq/1-gpu
trtllm-build --checkpoint_dir starcoder2/trt_ckpt/sq/1-gpu \
--output_dir starcoder2/trt_engines/sq/1-gpu --builder_opt=4
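For a quick smoke test of the built engine, something along these lines should work (the paths and the prompt are illustrative; run.py is the generic runner under the examples directory):
python3 examples/run.py --engine_dir starcoder2/trt_engines/sq/1-gpu \
    --tokenizer_dir starcoder2 \
    --input_text "def fibonacci(" \
    --max_output_len 64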
I will update this usage in the doc.
Hi, thanks. I'm still getting this error:
[TensorRT-LLM][WARNING] The manually set model data type is torch.float16, but the data type of the HuggingFace model is torch.float32.
Initializing tokenizer from /model/starcoder2-3b
No quantization applied, export float16 model
Unknown model type Starcoder2ForCausalLM. Continue exporting...
torch.distributed not initialized, assuming single world_size.
torch.distributed not initialized, assuming single world_size.
torch.distributed not initialized, assuming single world_size.
torch.distributed not initialized, assuming single world_size.
torch.distributed not initialized, assuming single world_size.
torch.distributed not initialized, assuming single world_size.
torch.distributed not initialized, assuming single world_size.
current rank: 0, tp rank: 0, pp rank: 0
torch.distributed not initialized, assuming single world_size.
torch.distributed not initialized, assuming single world_size.
Cannot export model to the model_config. The modelopt-optimized model state_dict (including the quantization factors) is saved to salmon_output/modelopt_model.0.pth using torch.save for further inspection.
Detailed export error: 'unknown:Starcoder2ForCausalLM'
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/modelopt/torch/export/model_config_export.py", line 364, in export_tensorrt_llm_checkpoint
for tensorrt_llm_config, weights in torch_to_tensorrt_llm_checkpoint(
File "/usr/local/lib/python3.10/dist-packages/modelopt/torch/export/model_config_export.py", line 312, in torch_to_tensorrt_llm_checkpoint
tensorrt_llm_config = convert_to_tensorrt_llm_config(model_config, tp_size_overwrite)
File "/usr/local/lib/python3.10/dist-packages/modelopt/torch/export/tensorrt_llm_utils.py", line 84, in convert_to_tensorrt_llm_config
"architecture": MODEL_NAME_TO_HF_ARCH_MAP[decoder_type],
KeyError: 'unknown:Starcoder2ForCausalLM'
Traceback (most recent call last):
File "/workspace/tensorrt_llm/examples/quantization/quantize.py", line 90, in <module>
quantize_and_export(
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/quantization/quantize_by_modelopt.py", line 340, in quantize_and_export
with open(f"{export_path}/config.json", "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'starcoder2_output/config.json'
when I run:
python3 tensorrt_llm/examples/quantization/quantize.py --model_dir /model/starcoder2-3b --dtype float16 --qformat int8_sq --output_dir starcoder2_output
@tonylek Can you try to upgrade ModelOpt? The KeyError above comes from MODEL_NAME_TO_HF_ARCH_MAP having no entry for the Starcoder2 decoder type, and a newer ModelOpt release may include it.
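A minimal sketch of the check-and-upgrade, assuming the package is published on PyPI as nvidia-modelopt:
pip3 show nvidia-modelopt        # print the currently installed version
pip3 install -U nvidia-modelopt  # upgrade to the latest release
Then re-run the quantize.py command above.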
Hi,
I'm having an issue when trying to convert starcoder2-3b with SmoothQuant to TensorRT-LLM. I'm running on an A100 40GB.
This is my command:
python tensorrt_llm/examples/gpt/convert_checkpoint.py --model_dir /model/starcoder2-3b --output_dir salmon_output --tp_size 1 --smoothquant 0.5
This is the error I'm receiving: