Conditions / example of DepSepConvolution fusion

NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

https://developer.nvidia.com/tensorrt

Apache License 2.0


Conditions / example of DepSepConvolution fusion #3237

Open vadimkantorov opened 1 year ago

vadimkantorov commented 1 year ago

https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#fusion-types says:

Depthwise Separable Convolution: A depthwise convolution with activation followed by a convolution with activation may sometimes be fused into a single optimized DepSepConvolution layer. The precision of both convolutions must be INT8 and the device's compute capability must be 7.2 or later.

Are there any other conditions? What types of activations are admissible?

Is there an example of a fusable graph? (This is especially important given that the convs must already be INT8.)

There are almost no examples or mentions of DepSepConvolution/TRT in Google Search.

I also wonder about the constraints on Q/DQ placement and qparams.

Thank you :)

zerollzeng commented 1 year ago

@nvpohanh ^ ^

nvpohanh commented 1 year ago

You need: a pair of Q/DQ before depthwise Conv, a pair of Q/DQ before the 1x1 Conv, and a pair of Q/DQ before the next Conv (after 1x1 Conv's activation).

Here is an illustration: [image: screenshot dated 2023-08-23]
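The Q/DQ placement described in this comment can be sketched as a small graph check. This is only an illustration under stated assumptions: the graph is a simplified list of `(op_type, inputs, outputs)` tuples whose op names mirror the ONNX operators `QuantizeLinear`, `DequantizeLinear`, `Conv` and `Relu`. The tensor names, the helper functions, and the choice of Relu as the activation are hypothetical. Weight Q/DQ is omitted. The check is not TensorRT's actual fusion logic and does not guarantee that the fusion will happen.

```python
from typing import Dict, List, Tuple

# Hypothetical simplified node: (op_type, input_names, output_names).
Node = Tuple[str, List[str], List[str]]


def q_dq_pair(src: str, tag: str) -> List[Node]:
    """Return a QuantizeLinear -> DequantizeLinear pair applied to `src`."""
    return [
        ("QuantizeLinear", [src], [f"{tag}_q"]),
        ("DequantizeLinear", [f"{tag}_q"], [f"{tag}_dq"]),
    ]


def build_depsep_pattern(quantize_pointwise_input: bool = True) -> List[Node]:
    """Build depthwise Conv -> 1x1 Conv -> next Conv with Q/DQ before each Conv."""
    nodes: List[Node] = []
    # Q/DQ pair before the depthwise Conv, followed by its activation.
    nodes += q_dq_pair("input", "dw_in")
    nodes += [("Conv", ["dw_in_dq", "dw_weight"], ["dw_out"]),
              ("Relu", ["dw_out"], ["dw_act"])]
    # Q/DQ pair before the 1x1 Conv (can be dropped to show a broken pattern).
    pw_input = "dw_act"
    if quantize_pointwise_input:
        nodes += q_dq_pair("dw_act", "pw_in")
        pw_input = "pw_in_dq"
    nodes += [("Conv", [pw_input, "pw_weight"], ["pw_out"]),
              ("Relu", ["pw_out"], ["pw_act"])]
    # Q/DQ pair before the next Conv, i.e. after the 1x1 Conv's activation.
    nodes += q_dq_pair("pw_act", "next_in")
    nodes += [("Conv", ["next_in_dq", "next_weight"], ["next_out"])]
    return nodes


def convs_missing_q_dq(nodes: List[Node]) -> List[str]:
    """Return output names of Conv nodes whose data input is not fed by Q -> DQ."""
    producer: Dict[str, Node] = {out: n for n in nodes for out in n[2]}
    missing: List[str] = []
    for op, inputs, outputs in nodes:
        if op != "Conv":
            continue
        dq = producer.get(inputs[0])
        q = None
        if dq is not None and dq[0] == "DequantizeLinear":
            q = producer.get(dq[1][0])
        if q is None or q[0] != "QuantizeLinear":
            missing.append(outputs[0])
    return missing


if __name__ == "__main__":
    print(convs_missing_q_dq(build_depsep_pattern()))
    print(convs_missing_q_dq(build_depsep_pattern(quantize_pointwise_input=False)))
```

A check like this could be run on an exported graph's node list before building an engine, to catch a Conv whose input lost its Q/DQ pair during export.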

vadimkantorov commented 1 year ago

Thank you! We will try this pattern!

It would be awesome to have this fusion example as an .onnx file and maybe a .svg output from trex (to get a feel for how it looks after fusion).

ttyio commented 1 year ago

Closing since there has been no activity for more than 3 weeks. Please reopen if you still have questions. Thanks all!

vadimkantorov commented 1 year ago

The thing is, I cannot reopen if it was a third party (you) who closed the question :) But yeah, I will add a comment when we have some feedback.

nvpohanh commented 1 year ago

reopen for now. thanks

aboubezari commented 9 months ago

Hey @nvpohanh, I tried the above graph in a small example as attached below. I got the following error:

[01/15/2024-15:26:23] [E] Error[10]: Could not find any implementation for node StatefulPartitionedCall/model/conv_block3d/quant_conv3d_depthwise/LastValueQuant/FakeQuantWithMinMaxVars/ReadVariableOp:0 + StatefulPartitionedCall/model/conv_block3d/quant_conv3d_depthwise/LastValueQuant/FakeQuantWithMinMaxVars_QuantizeLinear__18 + StatefulPartitionedCall/model/conv_block3d/quant_conv3d_depthwise/Conv3D.
[01/15/2024-15:26:23] [E] Error[10]: [optimizer.cpp::computeCosts::3869] Error Code 10: Internal Error (Could not find any implementation for node StatefulPartitionedCall/model/conv_block3d/quant_conv3d_depthwise/LastValueQuant/FakeQuantWithMinMaxVars/ReadVariableOp:0 + StatefulPartitionedCall/model/conv_block3d/quant_conv3d_depthwise/LastValueQuant/FakeQuantWithMinMaxVars_QuantizeLinear__18 + StatefulPartitionedCall/model/conv_block3d/quant_conv3d_depthwise/Conv3D.)
[01/15/2024-15:26:23] [E] Engine could not be created from network
[01/15/2024-15:26:23] [E] Building engine failed

I'm using TensorRT version 8.6 and ONNX opset 17.

nvpohanh commented 9 months ago

@aboubezari Could you provide the ONNX file so that we can repro and debug this issue? Thanks

aboubezari commented 9 months ago

Yes, I've attached the ONNX file as a zip file with just the onnx model in it. Let me know if you would like me to export different shapes or activations on the Convs. I have already tried using Relu activations instead of BatchNorm with no luck. aboubezari_debug.zip

nvpohanh commented 9 months ago

Filed internal tracker 4454538. Will let you know if we have any findings.

aboubezari commented 9 months ago

Awesome, thanks.

nzmora-nvidia commented 9 months ago

@aboubezari unrelated to the problem you've reported, I recommend placing the first BatchNorm after the first convolution (as it appears in the diagram above). The ONNX file in aboubezari_debug.zip looks like so:

aboubezari commented 9 months ago

@nzmora-nvidia I realized that I exported the model after tweaking it a bit to figure out the issue, my bad. Let me know if you need me to export you a new model.

nzmora-nvidia commented 9 months ago

@aboubezari Thank you, we can recreate the error and do not need the new model.

vadimkantorov commented 9 months ago

The ONNX file in aboubezari_debug.zip looks like so:

I guess it would be awesome to have such example ONNX files (or even complete PyTorch + torch-tensorrt examples) in the TRT docs, especially where fusion is discussed (and given that fusion patterns are often fragile, especially together with quantization)!

nzmora-nvidia commented 9 months ago

@vadimkantorov That's a fair request. I'll provide some pytorch examples in the next TREx release.

nvpohanh commented 6 months ago

This issue has been fixed in TRT 10.0.0 EA. https://docs.nvidia.com/deeplearning/tensorrt/release-notes/index.html#rel-10-0-0-EA

Thanks for reporting this issue.

vadimkantorov commented 6 months ago

@nvpohanh Please add somewhere in the docs an example *.onnx file or PyTorch example of properly getting DepSepConv to be used in TRT :) This is a very important module for speed-ups, so it's important for users to know how to export recognizable patterns for it...

E.g. a complete export example for MobileNetV3 (which makes use of DepSep), https://pytorch.org/vision/stable/models/generated/torchvision.models.quantization.mobilenet_v3_large.html#mobilenet-v3-large, would be great.

aboubezari commented 6 months ago

Thank you @nvpohanh! Look forward to trying it out.

ttyio commented 6 months ago

I will close this since this is solved, thanks all!

vadimkantorov commented 6 months ago

@ttyio I think it's still important to provide in the docs ONNX files with examples of fusable graphs and ideally some complete examples of PyTorch code exporting these ONNX graphs