NVIDIA / TensorRT-Model-Optimizer

TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization, pruning, distillation, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed on NVIDIA GPUs.
https://nvidia.github.io/TensorRT-Model-Optimizer

In cache_diffusion example, can we use dynamic image shape & batch size? #101

Open wxsms opened 2 weeks ago

wxsms commented 2 weeks ago

It doesn't seem to work at this time. What changes would be needed to support this? Thanks!

jingyu-ml commented 2 weeks ago

Our priorities shifted, so we haven't tested it recently. I can have the bugs fixed by the end of the week. Thanks! Just to confirm, is there a shape mismatch in the engines?

wxsms commented 2 weeks ago

Thank you. It works fine with static shapes for now.
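
For context on the shape-mismatch question above: TensorRT engines only accept runtime shapes that fall inside an optimization profile declared at build time, so dynamic batch size or image resolution requires building the engine with min/opt/max shape ranges. The sketch below shows the general TensorRT pattern; it is not the cache_diffusion code itself, and the input name `"latent"` and the shape ranges are illustrative assumptions. Running it requires TensorRT and an NVIDIA GPU.

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
config = builder.create_builder_config()

# An optimization profile declares min/opt/max shapes for each dynamic
# input; the built engine then accepts any shape within that range.
profile = builder.create_optimization_profile()
profile.set_shape(
    "latent",              # hypothetical input tensor name
    min=(1, 4, 32, 32),    # smallest allowed batch / resolution
    opt=(2, 4, 64, 64),    # shape TensorRT tunes kernels for
    max=(8, 4, 128, 128),  # largest allowed batch / resolution
)
config.add_optimization_profile(profile)

# ... populate `network` (e.g. by parsing an exported ONNX model whose
# dynamic axes are marked as -1), then build:
# engine_bytes = builder.build_serialized_network(network, config)

# At inference time, pick the concrete shape before executing:
# context.set_input_shape("latent", (batch, 4, h, w))
```

Note that the dynamic dimensions must also be exported as dynamic (e.g. ONNX `dynamic_axes`) before engine build; a model exported with fixed shapes will reject other sizes regardless of the profile.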