Open vadimkantorov opened 1 year ago
@nvpohanh ^ ^
You need: a pair of Q/DQ before depthwise Conv, a pair of Q/DQ before the 1x1 Conv, and a pair of Q/DQ before the next Conv (after 1x1 Conv's activation).
Here is an illustration:
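As a rough textual stand-in for the placement described above (this is not the original illustration), the Q/DQ pairs can be sketched in plain Python as an op sequence. The conv and activation labels such as `DepthwiseConv` and `NextConv` are hypothetical names chosen for illustration. Only `QuantizeLinear` and `DequantizeLinear` are real ONNX operator names. Weight quantization is omitted, so this shows the activation path only.

```python
# Hypothetical op labels for a depthwise-separable block followed by another conv.
FLOAT_GRAPH = [
    "DepthwiseConv",
    "Activation",
    "PointwiseConv1x1",
    "Activation",
    "NextConv",
]

# Ops whose activation input should be preceded by a Q/DQ pair.
CONV_OPS = {"DepthwiseConv", "PointwiseConv1x1", "NextConv"}


def insert_qdq(ops):
    """Return a new op list with a Q/DQ pair placed before every conv."""
    out = []
    for op in ops:
        if op in CONV_OPS:
            out += ["QuantizeLinear", "DequantizeLinear"]
        out.append(op)
    return out


if __name__ == "__main__":
    # Print the activation path, one arrow per edge.
    print(" -> ".join(insert_qdq(FLOAT_GRAPH)))
```

The point of the sketch is that each of the three convolutions sees a quantized input, including the convolution that follows the 1x1 conv's activation.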
Thank you! We will try this pattern!
It would be awesome to have this fusion example as an .onnx file, and maybe an .svg output from trex (to get a feel for how it looks after fusion).
Closing since there has been no activity for more than 3 weeks. Please reopen if you still have questions. Thanks all!
The thing is, I cannot reopen an issue if it was a third party (you) who closed it :) But yeah, I will add a comment when we have some feedback.
Reopened for now. Thanks.
Hey @nvpohanh, I tried the above graph in a small example, attached below, and got the following error:
[01/15/2024-15:26:23] [E] Error[10]: Could not find any implementation for node StatefulPartitionedCall/model/conv_block3d/quant_conv3d_depthwise/LastValueQuant/FakeQuantWithMinMaxVars/ReadVariableOp:0 + StatefulPartitionedCall/model/conv_block3d/quant_conv3d_depthwise/LastValueQuant/FakeQuantWithMinMaxVars_QuantizeLinear__18 + StatefulPartitionedCall/model/conv_block3d/quant_conv3d_depthwise/Conv3D.
[01/15/2024-15:26:23] [E] Error[10]: [optimizer.cpp::computeCosts::3869] Error Code 10: Internal Error (Could not find any implementation for node StatefulPartitionedCall/model/conv_block3d/quant_conv3d_depthwise/LastValueQuant/FakeQuantWithMinMaxVars/ReadVariableOp:0 + StatefulPartitionedCall/model/conv_block3d/quant_conv3d_depthwise/LastValueQuant/FakeQuantWithMinMaxVars_QuantizeLinear__18 + StatefulPartitionedCall/model/conv_block3d/quant_conv3d_depthwise/Conv3D.)
[01/15/2024-15:26:23] [E] Engine could not be created from network
[01/15/2024-15:26:23] [E] Building engine failed
I'm using TensorRT version 8.6 and onnx opset 17.
@aboubezari Could you provide the ONNX file so that we can repro and debug this issue? Thanks
Yes, I've attached the ONNX file as a zip file with just the onnx model in it. Let me know if you would like me to export different shapes or activations on the Convs. I have already tried using Relu activations instead of BatchNorm with no luck. aboubezari_debug.zip
Filed internal tracker 4454538. Will let you know if we have any findings.
Awesome, thanks.
@aboubezari unrelated to the problem you've reported, I recommend placing the first BatchNorm after the first convolution (as it appears in the diagram above). The ONNX file in aboubezari_debug.zip looks like so:
@nzmora-nvidia I realized that I exported the model after tweaking it a bit to figure out the issue, my bad. Let me know if you need me to export you a new model.
@aboubezari Thank you, we can recreate the error and do not need the new model.
I guess it would be awesome to have such example ONNX files (or even complete PyTorch + torch-tensorrt examples) in the TRT docs, especially where fusion is discussed (given that fusion patterns are often fragile, especially together with quantization)!
@vadimkantorov That's a fair request. I'll provide some pytorch examples in the next TREx release.
This issue has been fixed in TRT 10.0.0 EA. https://docs.nvidia.com/deeplearning/tensorrt/release-notes/index.html#rel-10-0-0-EA
Thanks for reporting this issue.
@nvpohanh Please add somewhere in the docs an example *.onnx file or a PyTorch example of properly getting DepSepConv to be used in TRT :) This is a very important module for speed-ups, so it's important for users to know how to export recognizable patterns for it...
E.g. a complete example of export of MobileNetV3 (making use of DepSep) https://pytorch.org/vision/stable/models/generated/torchvision.models.quantization.mobilenet_v3_large.html#mobilenet-v3-large would be great
Thank you @nvpohanh! Look forward to trying it out.
I will close this since this is solved, thanks all!
@ttyio I think it's still important to provide in the docs ONNX files with examples of fusable graphs and ideally some complete examples of PyTorch code exporting these ONNX graphs
https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#fusion-types says:
Depthwise Separable Convolution
A depthwise convolution with activation followed by a convolution with activation may sometimes be fused into a single optimized DepSepConvolution layer. The precision of both convolutions must be INT8 and the device's compute capability must be 7.2 or later.
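The two conditions stated in the quoted passage can be written down as a small check. This is a minimal sketch in plain Python: the function name and the precision strings are hypothetical and not part of any TensorRT API. Since the docs say the fusion "may sometimes" happen, passing this check is necessary according to the quoted text but does not guarantee fusion.

```python
def depsep_preconditions_met(dw_precision, pw_precision, compute_capability):
    """Check only the two conditions quoted from the developer guide.

    dw_precision / pw_precision: precision labels of the depthwise and the
    following convolution (hypothetical strings such as "int8" or "fp16").
    compute_capability: (major, minor) tuple, for example (8, 6).
    """
    both_int8 = dw_precision == "int8" and pw_precision == "int8"
    # Tuple comparison orders by major first, then minor.
    new_enough = tuple(compute_capability) >= (7, 2)
    return both_int8 and new_enough


if __name__ == "__main__":
    print(depsep_preconditions_met("int8", "int8", (8, 6)))
    print(depsep_preconditions_met("fp16", "int8", (8, 6)))
```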
Are there any other conditions? What types of activations are admissible?
Is there an example of a fusable graph? (This is especially important given that the convs must already be INT8.)
There are almost no examples or mentions of DepSepConvolution in TRT on Google Search.
I also wonder about the constraints on Q/DQ placement and qparams.
Thank you :)