NVIDIA / TensorRT-Model-Optimizer

TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization, pruning, distillation, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed on NVIDIA GPUs.
https://nvidia.github.io/TensorRT-Model-Optimizer

Load model failed:Protobuf parsing failed. #55

Closed: AlexMercer-feng closed this issue 3 months ago

AlexMercer-feng commented 3 months ago

I have already exported InternViT (6B, pretrained, https://huggingface.co/OpenGVLab/InternViT-6B-224px) to ONNX with torch.onnx.export and obtained one .onnx file plus many external weight files, which occupy 22 GB altogether. The ONNX model passes onnx.checker and can be opened in Netron too.
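
For reference, a minimal export sketch of the kind described above, assuming the checkpoint loads via transformers with trust_remote_code; the input shape, output names, and opset are placeholders, not taken from the original post:

```python
import torch
from transformers import AutoModel

# Assumption: InternViT-6B-224px loads through AutoModel with trust_remote_code.
model = AutoModel.from_pretrained(
    "OpenGVLab/InternViT-6B-224px", trust_remote_code=True
).eval()

dummy = torch.randn(1, 3, 224, 224)  # placeholder input shape

# When the serialized graph exceeds the 2 GB protobuf limit, torch.onnx.export
# writes the weights to separate external data files next to the .onnx file,
# which is why the export yields one graph file plus many weight files.
torch.onnx.export(
    model,
    dummy,
    "intern_vit.onnx",
    input_names=["pixel_values"],
    output_names=["last_hidden_state"],
    opset_version=17,
)
```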

Then this error occurred when I tried to quantize the InternViT ONNX model, while the onnx_ptq/ example runs normally:

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.10/dist-packages/modelopt/onnx/quantization/__main__.py", line 133, in <module>
    main()
  File "/usr/local/lib/python3.10/dist-packages/modelopt/onnx/quantization/__main__.py", line 115, in main
    quantize(
  File "/usr/local/lib/python3.10/dist-packages/modelopt/onnx/quantization/quantize.py", line 207, in quantize
    onnx_model = quantize_func(
  File "/usr/local/lib/python3.10/dist-packages/modelopt/onnx/quantization/int8.py", line 186, in quantize
    quantize_static(
  File "/usr/local/lib/python3.10/dist-packages/onnxruntime/quantization/quantize.py", line 505, in quantize_static
    calibrator = create_calibrator(
  File "/usr/local/lib/python3.10/dist-packages/onnxruntime/quantization/calibrate.py", line 1155, in create_calibrator
    calibrator.create_inference_session()
  File "/usr/local/lib/python3.10/dist-packages/modelopt/onnx/quantization/ort_patching.py", line 194, in _create_inference_session
    calibrator.infer_session = ort.InferenceSession(
  File "/usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 419, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/usr/local/lib/python3.10/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 472, in _create_inference_session
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidProtobuf: [ONNXRuntimeError] : 7 : INVALID_PROTOBUF : Load model from /tmp/ort.quant.ofeo9ljp/augmented_model.onnx failed:Protobuf parsing failed.

I use the NGC container (PyTorch 24.07) to build the Docker image; my onnx version was shown in a screenshot (not reproduced here).

Also, the command I use is exactly the same as in the onnx_ptq/ example. Is this due to the large ONNX model size? How can I fix it?
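
For context, an ONNX file is a single protobuf message capped at 2 GB, so a 22 GB model can only exist with its weights stored as external data; any step that re-serializes the graph without that format (such as the augmented calibration model in /tmp above) hits the parsing limit. A quick sanity check, assuming the exported file is named intern_vit.onnx:

```python
import onnx

# Models above the 2 GB protobuf limit must be checked by path so the
# external weight files are resolved from disk rather than loaded inline.
onnx.checker.check_model("intern_vit.onnx")

# Load only the graph structure without pulling the 22 GB of weights into memory.
graph_only = onnx.load("intern_vit.onnx", load_external_data=False)
print(len(graph_only.graph.initializer), "initializers (external data references)")
```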

AlexMercer-feng commented 3 months ago

Update: already solved it by adding --use_external_data_format.
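
For anyone hitting the same error, a sketch of where that option goes when calling ModelOpt's Python API directly; the argument names other than use_external_data_format are assumptions modeled on the onnx_ptq example's CLI flags and may differ between ModelOpt versions, so check the signature in your installed release:

```python
import numpy as np
from modelopt.onnx.quantization import quantize

# Hypothetical calibration inputs; shape and file name are placeholders.
calib = np.load("calib_data.npy")

quantize(
    onnx_path="intern_vit.onnx",
    calibration_data=calib,
    quantize_mode="int8",
    output_path="intern_vit.quant.onnx",
    # Keeps weights as external data so the intermediate augmented
    # calibration model stays under the 2 GB protobuf limit.
    use_external_data_format=True,
)
```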