TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization, pruning, distillation, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed on NVIDIA GPUs.
I am trying to quantize a fine-tuned Llama 3 model and export it as a TensorRT engine. The quantization step succeeds, but the export to the TensorRT format fails because the model config cannot be exported.

Command:
scripts/huggingface_example.sh --type llama --model $HF_PATH --quant int4_awq --tp 4
Output:
Cannot export model to the model_config. The modelopt-optimized model state_dict (including the quantization factors) is saved to /app/TensorRT-Model-Optimizer/llm_ptq/saved_models_Gaja-v1_dense_int4_awq_tp4_pp1/modelopt_model.0.pth using torch.save for further inspection.
Detailed export error: Weight shape is not divisible for block size for block quantization.
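For reference, int4_awq uses block-wise weight quantization, and the error suggests that at least one weight matrix has an input dimension that is not a multiple of the quantization block size (128 by default for int4_awq, as far as I can tell). Below is a rough sanity check to list any offending Linear layers; BLOCK_SIZE and the model path are assumptions, so adjust them to your setup:

import torch
from transformers import AutoModelForCausalLM

BLOCK_SIZE = 128  # assumed default block size for the int4_awq config; adjust if yours differs
MODEL_PATH = "/path/to/finetuned-llama3"  # placeholder for $HF_PATH

model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, torch_dtype=torch.float16)

# Block quantization groups weights along the input dimension, so any Linear layer
# whose in_features is not a multiple of the block size would trigger
# "Weight shape is not divisible for block size".
for name, module in model.named_modules():
    if isinstance(module, torch.nn.Linear):
        in_features = module.weight.shape[1]
        if in_features % BLOCK_SIZE != 0:
            print(f"{name}: in_features={in_features} is not divisible by {BLOCK_SIZE}")

Even if every layer passes this check on the full weights, the export might still fail with --tp 4, since layers sharded along the input dimension end up with in_features / 4 per rank, making the effective requirement divisibility by 4 * BLOCK_SIZE for those layers (this is my understanding, not something I have confirmed in the ModelOpt source).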