NVIDIA / TensorRT-Model-Optimizer

TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization, sparsity, distillation, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed on NVIDIA GPUs.
https://nvidia.github.io/TensorRT-Model-Optimizer

AssertionError in model_calib when using histogram as the calibrator #63

Open YixuanSeanZhou opened 2 weeks ago

YixuanSeanZhou commented 2 weeks ago

Hi,

When running mtq.quantize with "calibrator": "histogram" in my config, I got the following assertion error:

  File "modelopt/torch/quantization/model_calib.py", line 220, in modelopt.torch.quantization.model_calib.max_calibrate
  File "modelopt/torch/quantization/model_calib.py", line 909, in modelopt.torch.quantization.model_calib.finish_stats_collection
AssertionError

Tracing a bit higher up the stack, the assertion error comes from:

  File "/trt_modelopt/modelopt/torch/quantization/model_quant.py", line 132, in quantize
     return calibrate(model, config["algorithm"], forward_loop=forward_loop)

Could you please take a look at what I did wrong?

Also, could you please provide some examples where histogram is used as the calibrator? It also has three ways to calculate amax, and I am wondering how that choice is specified.

Thanks in advance,

YixuanSeanZhou commented 2 weeks ago

For a repro, I believe any quantization flow should error out if you configure a layer with the following config:

{'num_bits': 8, 'axis': None, 'learn_amax': False, "calibrator": "histogram"}
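
A fuller sketch of such a repro might look like the code below. The toy Linear model, batch shapes, and the "*input_quantizer" pattern are placeholders I chose for illustration, not taken from my actual model:

import copy

import torch
import modelopt.torch.quantization as mtq

# Toy model and calibration data; any module whose quantizer is switched to the
# histogram calibrator should hit the same assertion.
model = torch.nn.Linear(1024, 1024).cuda()

qconfig = copy.deepcopy(mtq.INT8_DEFAULT_CFG)
# Override one quantizer pattern to use the histogram calibrator.
qconfig["quant_cfg"]["*input_quantizer"] = {
    "num_bits": 8,
    "axis": None,
    "learn_amax": False,
    "calibrator": "histogram",
}

def forward_loop(m):
    # Forward a few random batches so calibration statistics get collected.
    for _ in range(8):
        m(torch.randn(4, 1024).cuda())

# On the affected versions this raises AssertionError in finish_stats_collection.
model = mtq.quantize(model, qconfig, forward_loop=forward_loop)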
riyadshairi979 commented 2 weeks ago
YixuanSeanZhou commented 2 weeks ago

@riyadshairi979 Thank you so much for your response:

histogram is not supported as a calibration algorithm IIUC. @realAsma, can you check whether the calibrator field options look correct?

This is unexpected. Based on the modelopt documentation, histogram is listed as a calibration option. The same calibration options seem to exist in the original pytorch-quantization tooling from TensorRT. My assumption is that modelopt is a more comprehensive tool than its predecessor. Did I miss something 🤔

How did you define the quantization config? Please share the code snippet.

This is my config; I want to use histogram to avoid "outliers" in the calibration examples skewing the amax.

qconfig = deepcopy(mtq.INT8_DEFAULT_CFG)
qconfig["quant_cfg"]["heads_3.*.inputs_quantizer*"] = {'num_bits': 8, 'axis': None, 'learn_amax': False, "calibrator": "histogram"}

What is your modelopt version?

I upgraded to the latest version, 0.15.1, but I am still hitting the same issue.

Thanks! Looking forward to your response!

realAsma commented 2 weeks ago

Hi @YixuanSeanZhou, your usage is correct. There is a bug in ModelOpt when the histogram calibrator is used.

Also, could you please provide some examples where histogram is used as the calibrator? It also has three ways to calculate amax, and I am wondering how that choice is specified.

I am sorry, I do not currently have any examples that use the histogram calibrator.

For now, could you please modify your script to perform calibration manually instead of via mtq.quantize? Here is an example:

import torch

from modelopt.torch.quantization.model_calib import enable_stats_collection, finish_stats_collection
from modelopt.torch.quantization.model_quant import apply_mode

model = torch.nn.Linear(1024, 2048).cuda()

# An example config
config = {"quant_cfg": {"*": {"calibrator": "histogram"}}, "algorithm": "max"}

def calibrate_loop(model):
    # A method which simply forwards data through the model
    return model(torch.randn(1, 1024).cuda())

# config is the same config that was passed previously to mtq.quantize
model = apply_mode(model, mode=[("quantize", config)])
enable_stats_collection(model)
calibrate_loop(model)
finish_stats_collection(model, method="mse")

# Manually move the model to cuda after calibration
model.cuda()

print(model)
# Get simulation quantized output
output = calibrate_loop(model)

# Do ONNX export now
...
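
As a quick sanity check after calibration, you could also print the calibrated amax values. This assumes the quantizer modules expose an amax attribute, which is how I would inspect them, but please verify on your version:

from modelopt.torch.quantization.nn import TensorQuantizer

# Print the amax computed by each quantizer after calibration (assumed attribute name).
for name, module in model.named_modules():
    if isinstance(module, TensorQuantizer):
        print(name, getattr(module, "amax", None))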
YixuanSeanZhou commented 4 days ago

Thanks for your response @realAsma, and sorry for my delayed reply (I was away last week). This script looks helpful and promising.

I have two follow-up questions:

  1. model_calib does not seem to be open sourced. Is there any chance it can be open sourced, or at least that documentation can be provided for the methods in that file?
  2. For finish_stats_collection, what are all the available methods? Will they be max, entropy, percentile, and mse? If I use percentile, how do I specify which percentile to use?

Thanks so much!