DefTruth opened this issue 4 months ago:
After adding `--nodes_to_exclude "AveragePool" --op_types_to_exclude "AveragePool"`, the engine builds successfully.

Also, the CPU goes out of memory (OOM) when the batch size is large.
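For reference, a minimal sketch of that quantization call through modelopt's Python API. Only the two exclusion values come from the flags above; the `quantize` import path, the `calibration_data` format, and the input name/shape are assumptions. A small calibration batch is used to keep host-memory pressure down:

```python
# Hedged sketch: INT8 quantization with AveragePool excluded, assuming the
# Python API mirrors the --nodes_to_exclude / --op_types_to_exclude CLI flags.
import numpy as np
from modelopt.onnx.quantization import quantize  # assumed import path

# Small calibration batch to limit host (CPU) memory usage; the input name,
# shape, and dtype are placeholders for the real model inputs.
calib_data = {"input": np.random.rand(4, 3, 512, 512).astype(np.float32)}

quantize(
    onnx_path="encoder-vd-512-10-skip-mha.onnx",
    quantize_mode="int8",
    calibration_data=calib_data,
    nodes_to_exclude=["AveragePool"],      # same values as the CLI flags above
    op_types_to_exclude=["AveragePool"],
    output_path="encoder-w8a8-int8.onnx",
)
```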
Another model raises a new error after ONNX quantization:
```
[07/12/2024-18:29:08] [I] Finished parsing network model. Parse time: 0.633489
[07/12/2024-18:29:09] [W] [TRT] Calibrator won't be used in explicit quantization mode. Please insert Quantize/Dequantize layers to indicate which tensors to quantize/dequantize.
[07/12/2024-18:29:10] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[07/12/2024-18:33:45] [E] Error[10]: Error Code: 10: Could not find any implementation for node /Concat_17slice.
[07/12/2024-18:33:45] [E] Error[10]: IBuilder::buildSerializedNetwork: Error Code 10: Internal Error (Could not find any implementation for node /Concat_17slice.)
[07/12/2024-18:33:45] [E] Engine could not be created from network
[07/12/2024-18:33:45] [E] Building engine failed
[07/12/2024-18:33:45] [E] Failed to create engine from model or file.
[07/12/2024-18:33:45] [E] Engine set up failed
```
This is likely a TRT bug. If there is a way to reproduce it, we can help dig deeper into the issue.
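One possible repro path: parse the quantized model with the TensorRT 10 Python API and build a serialized engine. This is a hedged sketch, assuming the failing model is the quantized `encoder-w8a8-int8.onnx` from the log below:

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(0)  # TRT 10: networks are always explicit-batch
parser = trt.OnnxParser(network, logger)

# Parse the quantized ONNX model (its Q/DQ nodes drive explicit quantization).
with open("encoder-w8a8-int8.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise SystemExit("ONNX parse failed")

config = builder.create_builder_config()
# Expected to fail here with Error Code 10 on node /Concat_17slice.
engine = builder.build_serialized_network(network, config)
print("Build succeeded" if engine is not None else "Build failed")
```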
The full ONNX quantization log:

```
/root/anaconda3/envs/modelopt/lib/python3.10/site-packages/modelopt/onnx/quantization/int4.py:27: UserWarning: Using slower INT4 ONNX quantization using numpy. Install JAX (https://jax.readthedocs.io/en/latest/installation.html) for faster quantization: jax requires jaxlib to be installed. See https://github.com/google/jax#installation for installation instructions.
  warnings.warn(
Loading extension modelopt_round_and_pack_ext...
INFO:root:Model encoder-vd-512-10-skip-mha.onnx with opset_version 17 is loaded.
INFO:root:Quantization Mode: int8
INFO:root:Quantizable op types in the model: ['Add', 'AveragePool', 'Mul', 'Conv']
INFO:root:Building non-residual Add input map ...
INFO:root:Searching for hard-coded patterns like MHA, LayerNorm, etc. to avoid quantization.
INFO:root:Building KGEN/CASK targeted partitions ...
INFO:root:Classifying the partition nodes ...
INFO:root:Total number of nodes: 507
INFO:root:Skipped node count: 0
WARNING:root:Please consider to run pre-processing before quantization. Refer to example: https://github.com/microsoft/onnxruntime-inference-examples/blob/main/quantization/image_classification/cpu/ReadMe.md
Collecting tensor data and making histogram ...
100%|██████████| 286/286 [03:23<00:00, 1.41it/s]
Finding optimal threshold for each tensor using 'entropy' algorithm ...
Number of tensors : 286
Number of histogram bins : 128 (The number may increase depends on the data it collects)
Number of quantized bins : 128
WARNING:root:Please consider pre-processing before quantization. See https://github.com/microsoft/onnxruntime-inference-examples/blob/main/quantization/image_classification/cpu/ReadMe.md
INFO:root:Deleting QDQ nodes from marked inputs to make certain operations fusible ...
INFO:root:Quantized onnx model is saved as encoder-w8a8-int8.onnx
INFO:root:Total number of quantized nodes: 166
INFO:root:Quantized node types: {'Add', 'AveragePool', 'Sigmoid', 'Reshape', 'Mul', 'Shape', 'Conv'}
```
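The warning in that log points at onnxruntime's quantization pre-processing step (shape inference plus graph optimization before quantization). A sketch of that step, assuming `quant_pre_process` is exported by the installed onnxruntime; the output filename is a placeholder:

```python
from onnxruntime.quantization import quant_pre_process  # assumed export

# Shape-infer and optimize the FP32 model before quantization, as the
# ReadMe linked in the warning recommends.
quant_pre_process(
    "encoder-vd-512-10-skip-mha.onnx",       # input model
    "encoder-vd-512-10-skip-mha-prep.onnx",  # pre-processed output (placeholder name)
)
```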
Versions: modelopt 0.13.1, TRT 10.1.0.