NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

Question: Disable Optimizations for TensorRT #4075

Open YixuanSeanZhou opened 1 month ago

YixuanSeanZhou commented 1 month ago

Question

Because TRT performs so many optimizations, it is often hard to isolate the issue when we see a regression in model accuracy. I know there is the builder_optimization_level flag, but it seems only to control which kernels are used when executing the model.
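For reference, this is roughly how I am using that flag today (a minimal sketch; network construction and ONNX parsing are omitted):

```python
import tensorrt as trt

# Minimal sketch: lowering the optimization level limits how hard the builder
# searches for fast tactics, but it does not appear to disable graph-level
# optimizations such as fusions or dead-code elimination.
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
config = builder.create_builder_config()
config.builder_optimization_level = 0  # valid range 0-5; the default is 3
```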

Is there more fine-grained control? For example, I would like to prevent layer fusions or dead-code elimination.

To give more context: in my specific use case, I want to isolate whether resolving Q/DQ nodes can cause a regression in the model. What I would like to achieve is to enable only Q/DQ resolution and disable all other optimizations. Is this achievable?

Thanks in advance

lix19937 commented 4 weeks ago

> For example, I would like to prevent layer fusions or dead-code elimination.

For your reference, try the following:

```
polygraphy run spec.onnx --trt --best --trt-outputs mark all
```

YixuanSeanZhou commented 4 weeks ago

Hi @lix19937, thanks for your response!

So is this CLI option supposed to skip all the optimizations TRT does?

Also, I think --best is not the correct option. I tried to use --int8 instead, but I got this error:

```
[E] 2: Assertion static_cast<size_t>(c) < mSet.size() failed.
[E] 2: [cgraph.h::assertIsValidSubscript::161] Error Code 2: Internal Error (Assertion static_cast<size_t>(c) < mSet.size() failed. )
```

The corresponding ONNX file builds fine with the TRT Python API (a rough sketch of that build is below). I also ran polygraphy surgeon sanitize --fold-constants on the ONNX file before building it.
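For reference, the Python-API build that succeeds looks roughly like this (a minimal sketch; spec.onnx stands in for the actual model file):

```python
import tensorrt as trt

# Minimal sketch of the Python-API build that succeeds for this model.
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(0)  # explicit batch is the default in recent TRT
parser = trt.OnnxParser(network, logger)

with open("spec.onnx", "rb") as f:  # placeholder for the sanitized ONNX file
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("Failed to parse the ONNX model")

config = builder.create_builder_config()
serialized_engine = builder.build_serialized_network(network, config)
```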

Thanks in advance!