TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization, pruning, distillation, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed on NVIDIA GPUs.
I am trying to register a "custom layer" (not a native torch.nn layer, but a custom class that subclasses nn.Module) with modelopt and quantize it. I made minor patches to the modelopt torch quantization code so it can identify the places to insert quantizers (for example, the layer uses layer.kernel instead of layer.weight to represent its weights).
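To make the patch concrete, here is a simplified stand-in for what I changed (plain-Python mock-up only; the class and function names below are illustrative assumptions, not the real modelopt internals): the weight quantizer is attached after looking up the weight attribute name, so layers storing weights in kernel are handled too.

```python
# Simplified stand-in for the patch. TensorQuantizer here is a toy
# symmetric int8 fake-quantizer, NOT modelopt's actual class.

class TensorQuantizer:
    """Fake-quantize a list of floats to the int8 grid using a recorded amax."""

    def __init__(self):
        self.amax = None  # filled in during the calibration pass

    def __call__(self, values):
        if self.amax is None:
            # Calibration pass: record the max absolute value, pass data through.
            self.amax = max(abs(v) for v in values)
            return values
        scale = self.amax / 127.0
        # Quantize to int8, clamp, then dequantize back to float ("fake quant").
        return [max(-127, min(127, round(v / scale))) * scale for v in values]


def insert_weight_quantizer(layer):
    """Attach a weight quantizer, handling layers that use `kernel` not `weight`."""
    attr = "kernel" if hasattr(layer, "kernel") else "weight"
    layer.weight_quantizer = TensorQuantizer()
    original = getattr(layer, attr)
    layer.weight_quantizer(original)  # calibration pass records amax
    setattr(layer, attr, layer.weight_quantizer(original))  # fake-quantized weights
```

In the real patch the same idea applies, just against modelopt's own quantizer insertion logic.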
However, when I export the model to ONNX, building the TRT engine fails with:
Error Code 10: Internal Error (Could not find any implementation for node /MatMul_&_cpy.)
When I run the model with Polygraphy
python ~/trt_model_opt/bin/polygraphy run saved_model.onnx --onnxrt
I get:
MatmulInteger : b zero point is not valid.
I am guessing this is because the tensor quantizer is not properly calibrated. However, model_calib is unfortunately shipped as a .so file. Could you share the source code for it, or shed some light on how calibration is performed, so I can adjust the layer / how the quantizers are inserted and get the weights quantized correctly?
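My working assumption about what the calibration step does (this is a guess on my side, since I cannot read model_calib): collect a per-tensor amax, derive a symmetric int8 scale from it, and use a zero point of 0, which is what a MatMulInteger-style export of the weight (b) operand would need. A minimal sketch of that assumed behavior:

```python
# Assumed (not confirmed) max-calibration behavior for a weight tensor:
# symmetric int8 quantization, so the exported zero point should be 0.

def max_calibrate(weight):
    """Return (scale, zero_point, int8 values) for symmetric int8 quantization."""
    amax = max(abs(w) for w in weight)
    if amax == 0:
        # A quantizer that never saw data would have no usable amax,
        # which could plausibly produce the invalid zero-point export.
        raise ValueError("quantizer not calibrated / all-zero weight")
    scale = amax / 127.0
    zero_point = 0  # symmetric scheme: zero point is fixed at 0
    quantized = [max(-127, min(127, round(w / scale))) for w in weight]
    return scale, zero_point, quantized
```

If my patched insertion means the weight quantizer is skipped during calibration, its amax/scale would never be set, which would match the invalid b zero point that onnxruntime reports.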
Thanks!