Hi, when quantizing VILA 1.5 (fp32 ONNX => int8 ONNX), it failed with the following error from onnxruntime:

RuntimeError: Only an existing tensor can be modified, '/vision_tower/vision_tower/vision_model/encoder/layers.0/self_attn/Softmax_output_0' is not.

How can I fix this?
Can you try --op_types_to_exclude=Softmax in the command line? I suspect this is due to some side effect of shared-input quantization. Also, can you provide the ONNX model and the full stack trace?
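For concreteness, combining the suggested flag with the reproduction command given later in this thread would look like the line below. This is a sketch: the thread does not confirm that this modelopt version accepts the flag in this form, so verify against your installed version.

python -m modelopt.onnx.quantization --quantize_mode int8 --verbose --onnx_path visual_encoder.onnx --op_types_to_exclude=Softmax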
Hi,

> Can you try --op_types_to_exclude=Softmax

It does not work yet.

> Also can you provide the ONNX model and full stacktrace?

Here is the trace:
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.10/dist-packages/modelopt/onnx/quantization/__main__.py", line 138, in <module>
    main()
  File "/usr/local/lib/python3.10/dist-packages/modelopt/onnx/quantization/__main__.py", line 121, in main
    quantize(
  File "/usr/local/lib/python3.10/dist-packages/modelopt/onnx/quantization/quantize.py", line 447, in quantize
    quantize_static(
  File "/usr/local/lib/python3.10/dist-packages/onnxruntime/quantization/quantize.py", line 539, in quantize_static
    quantizer = QDQQuantizer(
  File "/usr/local/lib/python3.10/dist-packages/onnxruntime/quantization/qdq_quantizer.py", line 207, in __init__
    self.quantization_params = self.calc_graph_quant_params()
  File "/usr/local/lib/python3.10/dist-packages/onnxruntime/quantization/qdq_quantizer.py", line 1156, in calc_graph_quant_params
    self.adjust_tensor_ranges()
  File "/usr/local/lib/python3.10/dist-packages/onnxruntime/quantization/base_quantizer.py", line 504, in adjust_tensor_ranges
    self.tensors_range[node.output[0]] = TensorData(lowest=np.float32(0.0), highest=np.float32(1.0))
  File "/usr/local/lib/python3.10/dist-packages/onnxruntime/quantization/calibrate.py", line 127, in __setitem__
    raise RuntimeError(f"Only an existing tensor can be modified, {key!r} is not.")
RuntimeError: Only an existing tensor can be modified, '/vision_tower/vision_tower/vision_model/encoder/layers.0/self_attn/Softmax_output_0' is not.
File "/usr/local/lib/python3.10/dist-packages/onnxruntime/quantization/calibrate.py", line 127, in __setitem__
raise RuntimeError(f"Only an existing tensor can be modified, {key!r} is not.")
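To make the failure mode concrete: the last two frames show onnxruntime's adjust_tensor_ranges() clamping every Softmax output to [0, 1], while the calibration range table refuses to modify a tensor that was never calibrated. A minimal Python sketch of that guard (simplified for illustration; not the actual onnxruntime implementation):

class TensorsData:
    """Simplified stand-in for onnxruntime's calibration range table."""

    def __init__(self, ranges):
        self.data = dict(ranges)  # tensor name -> (low, high)

    def __setitem__(self, key, value):
        # Only ranges collected during calibration may be overwritten;
        # this is the check that raises in calibrate.py line 127.
        if key not in self.data:
            raise RuntimeError(f"Only an existing tensor can be modified, {key!r} is not.")
        self.data[key] = value

tensors_range = TensorsData({"input": (-3.0, 3.0)})  # Softmax output never calibrated
try:
    # adjust_tensor_ranges() effectively does this for every Softmax output:
    tensors_range["/.../self_attn/Softmax_output_0"] = (0.0, 1.0)
except RuntimeError as e:
    print(e)  # Only an existing tensor can be modified, '...' is not.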
The fp16 ONNX (SigLIP for VILA1.5-3B) has been emailed to you; the fp32 ONNX is too big. You can also export it via https://github.com/Efficient-Large-Model/VILA/blob/44a4cca98ac0f81b0891eb2341e9826b5553b6e8/demo_trt_llm/build_visual_engine.py#L95
The following command may also reproduce the issue in #18:
python -m modelopt.onnx.quantization --quantize_mode int8 --verbose --onnx_path visual_encoder.onnx
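The same entry point can also be driven from Python, as the quantize() frame in the trace suggests. A sketch assuming the keyword names mirror the CLI flags above (onnx_path, quantize_mode, op_types_to_exclude are inferred, not confirmed in this thread, so check your installed modelopt version's signature):

from modelopt.onnx.quantization import quantize

# Keyword names inferred from the CLI flags; treat them as assumptions.
quantize(
    onnx_path="visual_encoder.onnx",
    quantize_mode="int8",
    op_types_to_exclude=["Softmax"],
)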
This may also be related to onnxruntime.
Please upgrade to modelopt 0.13; this issue has been fixed.
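For anyone landing here later: assuming the PyPI package name nvidia-modelopt (not stated in this thread), the upgrade would be:

pip install -U "nvidia-modelopt>=0.13"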
awesome!