NVIDIA / TensorRT-Model-Optimizer

TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization, sparsity, distillation, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed on NVIDIA GPUs.
https://nvidia.github.io/TensorRT-Model-Optimizer

Does Model Optimizer work on Windows? #68

Open adaber opened 1 week ago

adaber commented 1 week ago

Hi,

The main GitHub page lists only Linux, but has anyone tested Model Optimizer on Windows?

Thanks!

riyadshairi979 commented 1 week ago

Unofficially, we have used features from modelopt.onnx.quantization on Windows; the pip installation instructions are the same. Some of the modelopt.torch features might work as well.
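
A minimal sketch of that unofficial check (same `pip` package name as on Linux; this is just a sanity check, not a supported flow):

```python
# Unofficial sanity check on Windows; not an officially supported flow.
# Install with the same pip command as on Linux, e.g.:
#   pip install nvidia-modelopt
from importlib.metadata import version

# The import fails here if the wheel or its dependencies don't work on Windows.
import modelopt.onnx.quantization  # noqa: F401

print("nvidia-modelopt", version("nvidia-modelopt"))
```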

adaber commented 1 week ago

Thanks for the quick response, @riyadshairi979 !

My plan is to use modelopt.torch.quantization initially (int8 quantization). I guess I should try and see if it works.

Do you guys plan on Model Optimizer officially supporting Windows at some point, too?

Thanks!

riyadshairi979 commented 6 days ago

> My plan is to use modelopt.torch.quantization initially (int8 quantization).

If your deployment runtime is TensorRT, then we recommend using modelopt.onnx.quantization. ONNX exported after INT8 PTQ with modelopt.torch.quantization is not optimal on TensorRT.
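
Roughly, that flow looks like the sketch below; treat the exact `quantize` signature and the accepted calibration-data formats as assumptions to verify against the docs for your installed version:

```python
# Hedged sketch of INT8 PTQ via modelopt.onnx.quantization for TensorRT deployment.
# Parameter names follow the current docs and may differ between releases.
import numpy as np
from modelopt.onnx.quantization import quantize

# Calibration tensors keyed by the model's input name ("input" is a placeholder);
# shapes must match the ONNX model's declared input shape.
calib = {"input": np.random.rand(32, 3, 224, 224).astype(np.float32)}

quantize(
    onnx_path="model.onnx",          # FP32 ONNX model to quantize
    quantize_mode="int8",
    calibration_data=calib,
    output_path="model.int8.onnx",   # quantized model to build the TensorRT engine from
)
```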

> Do you guys plan on Model Optimizer officially supporting Windows at some point, too?

Yes, we are working on official support for Windows.

adaber commented 4 days ago

Hi @riyadshairi979,

Firstly, thank you for your help!

> If your deployment runtime is TensorRT, then we recommend using modelopt.onnx.quantization. ONNX exported after INT8 PTQ with modelopt.torch.quantization is not optimal on TensorRT.

Thanks for sharing this important information. It will definitely save me time because I know what sub-package I should focus on. I will give it a try and report the results. I may follow up with a question or two regarding this particular sub-package.

> Do you guys plan on Model Optimizer officially supporting Windows at some point, too?

> Yes, we are working on official support for Windows.

It's great to hear that.

Thanks!

adaber commented 4 days ago

I guess I already have 2 questions.

1) I assume that Model Optimizer does the calculations on a GPU, but I couldn't find an option that lets me pick which GPU to use when there is more than one GPU in the system. Is there one? (I might've missed it, though.)

2) Does modelopt.onnx.quantization work with dynamic input shapes? I did a quick test and "Tensor shape doesn't match for input" popped up. I can see that ModelOpt's code compares the input_shape and calibration_data sizes, and I assume that the input tensor's dynamic dimensions are what caused the assertion error. (A possible interim workaround is sketched below.)
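
For reference, a possible interim workaround is to pin the dynamic dimensions to fixed values before quantizing; an untested sketch using plain onnx (the dimension names below are placeholders):

```python
# Untested workaround: pin dynamic input dimensions to fixed values before running
# modelopt.onnx.quantization, so the calibration data shapes can match exactly.
# Uses plain onnx APIs only; the dimension names/values below are placeholders.
import onnx

model = onnx.load("model.onnx")
fixed_dims = {"batch_size": 1, "sequence_length": 128}

for inp in model.graph.input:
    for dim in inp.type.tensor_type.shape.dim:
        if dim.dim_param:  # a named (dynamic) dimension
            dim.dim_value = fixed_dims.get(dim.dim_param, 1)

onnx.checker.check_model(model)
onnx.save(model, "model_static.onnx")
```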

Thanks!

riyadshairi979 commented 4 days ago

1. By default, modelopt uses the CPU for calibration; we will provide a way for users to choose the ExecutionProvider (e.g. GPU). See the illustrative ONNX Runtime snippet below.
2. We will add dynamic shape support in the next release.
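
For context, provider selection in ONNX Runtime itself looks like the snippet below; how modelopt.onnx.quantization will expose this choice is still being worked out:

```python
# Illustration of ONNX Runtime execution-provider selection; this is plain
# onnxruntime usage, not a modelopt API. GPU calibration in modelopt is planned
# but not available yet.
import onnxruntime as ort

session = ort.InferenceSession(
    "model.onnx",
    # Tries CUDA first and falls back to CPU if no GPU/CUDA EP is available.
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
print(session.get_providers())  # shows which providers were actually enabled
```
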
adaber commented 3 days ago

@riyadshairi979 Thanks for the quick response. It's much appreciated!

Do you happen to know what the approximate time frames are for adding dynamic shape and GPU support?

Thanks!