NVIDIA TensorRT-Model-Optimizer issues

NVIDIA / TensorRT-Model-Optimizer

TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization, pruning, distillation, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed on NVIDIA GPUs.

https://nvidia.github.io/TensorRT-Model-Optimizer

Other

575 stars 43 forks

issues

Will onnx_ptq support smoothquant in future?

#58 tp-nan opened 3 months ago
1
How to reproduce the stable diffusion acceleration by tensorrt

#57 luchangli03 closed 3 months ago
1
f

#56 riyadshairi979 closed 3 months ago
0
Load model failed: Protobuf parsing failed.

#55 AlexMercer-feng closed 3 months ago
1
How to implement LSQ using pytorch_quant

#54 AnnaTrainingG opened 3 months ago
3
Question about quant summary: input_quantizer, output_quantizer, and weight_quantizer

#53 DefTruth closed 3 months ago
0
Why does ONNX quantization work better on ViT?

#52 AlexMercer-feng opened 3 months ago
11
Weird Bug when QAT training with HfArgumentParser

#51 ShadowTeamCN opened 3 months ago
10
Error in TensorRT Execution: Internal Error After Model Quantization - Could Not Find Implementation for Node

#50 levipereira closed 3 months ago
2
ValueError: Runtime TRT is not supported.

#49 hawl666 opened 3 months ago
1
TRT INT8 Compilation Have Different Outputs Comparing to Modelopt Fake Q/DQ nodes for Resnet18

#48 YixuanSeanZhou opened 3 months ago
8
amaxs_values KeyError

#47 liyandong001 opened 4 months ago
2
CNN model opt int8 best practice example

#46 korkland opened 4 months ago
4
NotImplementedError

#45 XA23i closed 4 months ago
1
How to reproduce the results of mlperf v4.0 llama2 70b sparse?

#44 DehuaTang opened 4 months ago
2
Adding Phi 3 Vision quantization support

#43 iibw closed 2 months ago
2
the supporting models in modelopt.torch.quantization

#42 XA23i closed 4 months ago
5
TRT failed to compile a quantize standard resnet18

#41 YixuanSeanZhou closed 3 months ago
5
Understanding the Underlying Implementation of model_calib

#40 YixuanSeanZhou opened 4 months ago
1
The existing demo does not adequately support stable ControlNet.

#39 ZYJESJ opened 4 months ago
2
Cannot build TRT Engine after onnx int8 quantize

#38 DefTruth opened 4 months ago
4
SDXL int8 Failure of TensorRT 10 when running txt2img_xl.py on GPU A100

#37 13301338176 closed 4 months ago
4
Sparsity fp8 Llama-3-8b on RTX4090 has no speed improvement against dense one.

#36 lishicheng1996 closed 4 months ago
0
[Question] Does the quantized model run in full precision or int8 precision?

#35 leeeizhang closed 4 months ago
2
undefined symbol in SD FP8 workflow

#34 chrjxj closed 3 months ago
6
How can I ignore some layers and prevent them from being quantized in AWQ quantization by configuring the config file?

#33 shaoyanguo opened 4 months ago
1
int8 diffuser smoothquant will not generate good images

#32 13301338176 closed 4 months ago
2
Onnx Quantization doesn't have quantize method

#31 yixzhou closed 3 months ago
1
Error when quantizing onnx model

#30 de1star opened 5 months ago
7
Update README.md

#29 Edwardf0t1 closed 5 months ago
0
How to choose different alpha for mtq.INT8_SMOOTHQUANT_CFG?

#28 siahuat0727 closed 5 months ago
4
Request for Documentation of custom quantization algorithm / external quantized weight for AWQ

#27 nuxlear opened 5 months ago
2
Python 3.12 - error on pip install

#26 klyack closed 5 months ago
1
Request for Documentation: awq_lite, awq_clip, and awq_full

#25 stoical07 closed 5 months ago
2
Error when quantize Phi-3 to fp8, AssertionError: <class 'transformers.pytorch_utils.Conv1D'> already registered!

#24 Ross-Fan opened 5 months ago
3
Cannot export model to the model_config

#23 ashwin-js opened 5 months ago
2
how to quantize onnx to fp8?

#22 yuvraj108c opened 5 months ago
7
[Feature Request] Support for encoder-decoder models

#21 ashwin-js opened 5 months ago
1
[BUG] The fp8 kv cache quantization result is wrong

#20 DearPlanet closed 5 months ago
4
quant onnx failed when meeting Softmax

#19 tp-nan closed 5 months ago
4
Fail to quant onnx (weight stored with fp16) to int8 because of overflow

#18 tp-nan closed 4 months ago
1
What's impact from large tp and pp?

#17 aiiAtelier opened 6 months ago
4
Error when Export TRT model from the Quantized ONNX

#16 chuong98 opened 6 months ago
11
Error when export ONNX for FP8

#15 chuong98 closed 6 months ago
1
Tried to apply PTQ to a basic CV CNN network and got slower model in the end?

#14 tmagcaya opened 6 months ago
13
Error Converting checkpoint after INT4AWQ quantization

#13 christian-ci closed 6 months ago
7
Error when building Docker:

#12 chuong98 closed 6 months ago
5
Are there significant differences in precision and performance when deploying smoothquant using the ONNX/TRT path and the TensorRT-LLM path?

#11 tp-nan closed 6 months ago
2
An error occurred when converting the ONNX model to a TensorRT engine.

#10 ymgwjk closed 5 months ago
2
Is Starcoder2 supported?

#9 wxsms closed 6 months ago
5
