NVIDIA / TensorRT-Model-Optimizer

TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization, pruning, distillation, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed on NVIDIA GPUs.
https://nvidia.github.io/TensorRT-Model-Optimizer

Understanding the Underlying Implementation of model_calib #40

Open YixuanSeanZhou opened 4 months ago

YixuanSeanZhou commented 4 months ago

Hi,

I am trying to register a "custom layer" (not a native torch.nn layer, but a custom layer that subclasses nn.Module) with modelopt and quantize it. I have been making minor patches to the modelopt torch quantization code so it can identify the places to insert quantizers (for example, the layer uses layer.kernel to represent layer.weight).
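For reference, the kind of patching I mean looks roughly like the sketch below: the layer keeps its weight in `kernel`, and I wrap the input and the weight with quantizers so calibration can observe them. The `TensorQuantizer` import path and default config are my assumption from the docs, and the names here are illustrative rather than my exact code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Assumed import path; TensorQuantizer is expected to behave like the
# pytorch-quantization one (fake-quant in forward, calibration support).
from modelopt.torch.quantization.nn import TensorQuantizer


class CustomDense(nn.Module):
    """Custom layer that stores its weight as `kernel` instead of `weight`."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.kernel = nn.Parameter(torch.empty(out_features, in_features))
        nn.init.xavier_uniform_(self.kernel)
        # Quantizers patched in so calibration sees the input and the weight.
        self.input_quantizer = TensorQuantizer()
        self.weight_quantizer = TensorQuantizer()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.input_quantizer(x)
        w = self.weight_quantizer(self.kernel)  # quantize kernel, not .weight
        return F.linear(x, w)
```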

However, when I export the model to ONNX, I fail to build a TRT engine due to:

Error Code 10: Internal Error (Could not find any implementation for node /MatMul_&_cpy.)

When I run the model with polygraphy (python ~/trt_model_opt/bin/polygraphy run saved_model.onnx --onnxrt), I get:

MatmulInteger : b zero point is not valid.
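To see what was actually exported, here is a quick way to dump the B zero point of every MatMulInteger node (a sketch using the standard onnx package; saved_model.onnx is the file I exported). ONNX expects that zero point to be a scalar or a 1-D tensor of B's dtype:

```python
import onnx
from onnx import numpy_helper

model = onnx.load("saved_model.onnx")
inits = {t.name: numpy_helper.to_array(t) for t in model.graph.initializer}

# MatMulInteger inputs are: A, B, a_zero_point (optional), b_zero_point (optional).
for node in model.graph.node:
    if node.op_type == "MatMulInteger" and len(node.input) >= 4:
        zp_name = node.input[3]
        zp = inits.get(zp_name)
        if zp is not None:
            print(node.name, zp_name, zp.dtype, zp.shape, zp)
        else:
            print(node.name, zp_name, "(not a graph initializer)")
```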

I am guessing this is because the tensor quantizer is not properly calibrated. However, model_calib is unfortunately shipped only as a .so file. Could you share the source code for that file, or shed some light on how calibration is performed, so I can adjust the layer and the way quantizers are inserted to get the weights quantized correctly?
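For what it's worth, my mental model of what such calibration typically does (symmetric per-tensor INT8 max calibration) is sketched below in plain PyTorch; this is only my understanding of the general technique, not the actual model_calib implementation:

```python
import torch


def max_calibrate_int8(w: torch.Tensor):
    """Symmetric per-tensor INT8 max calibration: amax -> scale -> quantized weight."""
    amax = w.abs().max()      # in activation calibration this is collected over forward passes
    scale = amax / 127.0      # symmetric scheme, so the zero point is 0
    q = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
    return scale, q


# Example: calibrating a weight tensor shaped like the custom layer's kernel
scale, q = max_calibrate_int8(torch.randn(64, 128))
print(scale.item(), q.dtype, int(q.min()), int(q.max()))
```

If the amax / zero-point state is never populated (or is exported with the wrong shape or dtype), I would expect exactly the kind of "b zero point is not valid" rejection above, which is why I suspect the calibration step.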

Thanks!

hchings commented 3 months ago

Hi @YixuanSeanZhou, could you share your ONNX file? Since the failure happens at TRT engine build time, this may be an issue that needs to be reported to TRT.