@ttyio ^ ^
@lix19937 , could you elaborate more on your issue? There is no need to insert Q/DQ before resize; we should run resize in INT8 automatically for patterns like resize -> Q/DQ -> conv. Thanks!
@ttyio
[Screenshots: quant.onnx graph | trex plan]
It seems that the first resize op is not quantized to INT8; it runs between a DQ and a Q.
I have a similar problem here. It seems that the input tensor of the resize layer (nn.Upsample) will be automatically rescaled to FP16/FP32, which takes some time, so the network can end up even slower than FP16. In my experiment, deconv (ConvTranspose) hits the same problem: the deconv itself runs in INT8, while BN and ReLU run in FP32.
@Monoclinic @ttyio If you remove all the Q/DQ scales from quant.onnx (https://github.com/NVIDIA/TensorRT/issues/2976#issuecomment-1550587913, the ONNX described above), save it as unquant.onnx, and then use
trtexec --best \
--profilingVerbosity=detailed \
--separateProfileRun \
--exportProfile=profile.json \
--exportLayerInfo=layerinfo.json \
--onnx=unquant.onnx
the first resize op will run in INT8.
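For anyone checking the result, here is a rough helper that scans the layerinfo.json written by the trtexec command above and prints the entries that mention resize. The exact JSON layout differs between TensorRT versions, so this is only a sketch and does not assume particular field names:

```python
# Hedged check: dump the raw layer-info entries whose text mentions "resize",
# so you can see whether they ended up in INT8 in the built engine.
import json

with open("layerinfo.json") as f:
    info = json.load(f)

# Some TRT versions wrap the layers in a "Layers" key, others emit a plain list.
layers = info.get("Layers", []) if isinstance(info, dict) else info
for layer in layers:
    text = json.dumps(layer) if isinstance(layer, dict) else str(layer)
    if "resize" in text.lower():
        print(text)
```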
@lix19937 Hello, thanks for your reply. May I ask how to remove the scale layer?
I've tried to export the ONNX model without Q/DQ (just the original PyTorch model); this way the resize is run in INT8. However, if you apply Q/DQ in PyTorch and export to ONNX, TRT dequantizes the tensor to FP32 and then does the resize.
I tried 8.6 GA with a toy resize + Q/DQ + conv model, and the resize runs in INT8 precision. Not sure what corner case you hit here for your first resize. Are you using 8.6 GA? Could you share the ONNX file for debugging? @lix19937 thank you!
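For reference, a minimal sketch of such a toy resize + Q/DQ + conv model built with pytorch-quantization. The layer sizes, the dummy random-data calibration pass, and the file name toy_quant.onnx are only illustrative, not taken from this issue:

```python
import torch
import torch.nn as nn
from pytorch_quantization import nn as quant_nn

# Export real QuantizeLinear/DequantizeLinear nodes when converting to ONNX.
quant_nn.TensorQuantizer.use_fb_fake_quant = True

class ToyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="nearest")  # the resize op
        self.conv = quant_nn.QuantConv2d(8, 8, 3, padding=1)   # adds input/weight quantizers
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.conv(self.up(x)))

model = ToyNet().eval()

# Dummy calibration on random data, only so the quantizers have amax values
# and the export does not fail; a real model would calibrate on real data.
for m in model.modules():
    if isinstance(m, quant_nn.TensorQuantizer):
        m.disable_quant()
        m.enable_calib()
with torch.no_grad():
    model(torch.randn(4, 8, 16, 16))
for m in model.modules():
    if isinstance(m, quant_nn.TensorQuantizer):
        m.load_calib_amax()
        m.disable_calib()
        m.enable_quant()

torch.onnx.export(model, torch.randn(1, 8, 16, 16), "toy_quant.onnx", opset_version=13)
```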
@Monoclinic Hi, you can remove the Q/DQ nodes of quant.onnx with onnx_graphsurgeon or the onnx APIs, and save the scales separately.
It just shows that the unquantized ONNX gets better fusion with TRT PTQ, e.g. resize will run in INT8, while with QAT the resize op runs in FP32/FP16.
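For example, a minimal sketch of stripping the Q/DQ nodes with onnx_graphsurgeon while recording their scales. It assumes the Q/DQ outputs are not graph outputs and that node names are populated; adapt as needed for your exporter:

```python
# Remove QuantizeLinear/DequantizeLinear nodes and keep their scale constants.
import onnx
import onnx_graphsurgeon as gs

graph = gs.import_onnx(onnx.load("quant.onnx"))

scales = {}
for node in graph.nodes:
    if node.op not in ("QuantizeLinear", "DequantizeLinear"):
        continue
    # inputs: [x, scale] or [x, scale, zero_point]
    scale = node.inputs[1]
    if isinstance(scale, gs.Constant):
        scales[node.name] = scale.values
    inp_tensor = node.inputs[0]
    out_tensor = node.outputs[0]
    # Rewire every consumer of the node's output to read the node's input instead.
    for consumer in list(out_tensor.outputs):
        consumer.inputs = [inp_tensor if t is out_tensor else t for t in consumer.inputs]
    node.outputs.clear()  # disconnect so cleanup() drops the node

graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "unquant.onnx")
# 'scales' now holds the per-node scale values, e.g. to dump with json/numpy.
```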
@ttyio My TRT version is v8410; the quant.onnx is in quant.zip, you can unzip it, thanks.
quant.zip
@lix19937 , we added an INT8 resize kernel in 8.4, but the QAT fusion rule was not updated; could you upgrade to 8.6 GA?
Hi @ttyio, thanks, it works with v8510 on Orin-X with DRIVE OS 6.0.6.0.
@lix19937 Hello, sorry to disturb you. I also use a Jetson Orin, and I wonder how you installed / upgraded your TRT. I tried to install TRT 8.6/8.5 on my Orin (my current version is 8.4.0.1), but the released versions are for x86_64 or ARM SBSA, which are not suitable for Jetson devices. Is it necessary to reinstall the whole JetPack with a higher version of TRT?
// (The same question in Chinese, translated:) I am also on an Orin; my current TRT version is 8.4. This morning I tried installing 8.6 (x86) and 8.5 (ARM SBSA). Installation went fine and the demos ran, but converting ONNX to TRT segfaults directly, and I cannot figure out why. x86 is the wrong platform anyway, so it may simply not work (though oddly it compiles), and for ARM SBSA I saw people on the NVIDIA forum say it does not necessarily fit Jetson boards; after installing it, it indeed did not work. May I ask how you got 8.5: did you reflash the machine with JetPack, or is there some way to upgrade from 8.4?
@Monoclinic The segmentation fault may be due to version compatibility issues with CUDA-X.
In Orin, TRT v8510 means TRT v860. I just switched to another Orin devkit which has DRIVE OS 6.0.6.0 installed (which maps to TRT v8510).
You can install nv-driveos-repo-sdk-linux-6.0.6.0-32441545_6.0.6.0_amd64.deb as follows:
1. Clean the previous installation;
2. Install the host components on P3710;
3. Flash DRIVE OS Linux;
4. Install CUDA/cuDNN/TensorRT: nv-tensorrt-repo-ubuntu2004-cuda11.4-trt8.5.10.4-d6l-target-ga-20221229_1-1_arm64.deb
@lix19937 Thanks for your kind advice. I will have a try.
@lix19937
Hi, what does this sentence mean: "In Orin, TRT v8510 means TRT v860"? And where did you get this information? Thank you very much!
@ttyio Hi, is it right that "In Orin, TRT v8510 means TRT v860"? I am trying to use Ampere 4:2 sparsity, but I see a large drop in the model metrics with TensorRT 8.5.3.1. With TensorRT 8.6.1.6 the problem disappears.
> @lix19937 Hi, what does this sentence mean: "In Orin, TRT v8510 means TRT v860"? And where did you get this information?
You can refer to NVIDIA-TensorRT-8.5.10-API-Reference-for-DRIVE-OS.pdf; if you want to find the exact version, you can find it in NvInferVersion.h. Btw, DRIVE OS 6.0.6.0 is updated frequently.
> I am trying to use Ampere 4:2 sparsity, but I see a large drop in the model metrics with TensorRT 8.5.3.1. With TensorRT 8.6.1.6 the problem disappears.
You can compare the build tactics between v8531 and v8616 on your model, and check the fusion state and which sparse layers were chosen. @wenqibiao
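A rough sketch of such a comparison, assuming both engines were built with the --exportLayerInfo flag shown earlier (one run per TensorRT version); the file names below are placeholders and the JSON schema varies between releases:

```python
# Diff the layer names of two trtexec-exported layer-info files to spot where
# fusion or kernel choices changed between the two TensorRT versions.
import json

def layer_names(path):
    with open(path) as f:
        info = json.load(f)
    layers = info.get("Layers", []) if isinstance(info, dict) else info
    return {layer.get("Name", str(layer)) if isinstance(layer, dict) else str(layer)
            for layer in layers}

a = layer_names("layerinfo_8531.json")
b = layer_names("layerinfo_8616.json")
print("only in 8.5.3.1:", sorted(a - b))
print("only in 8.6.1.6:", sorted(b - a))
```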
@lix19937 many thanks!
Description
How to run the resize (upsample) op in INT8 with QAT (tools/pytorch-quantization), other than using ConvTranspose?
Environment
TensorRT Version:8.4