Closed: dmenig closed this issue 2 years ago.
Hello @hyperfraise, are you calibrating in TRT, or are you using the nvidia-pytorch-quantization tools? Could you elaborate on the failure you hit when trying to quantize 3D conv?
3D layers are in fact not supported by TensorRT in INT8 precision, by design, right now. I don't think there is much more to detail for this issue than asking when it will be available: https://docs.nvidia.com/deeplearning/tensorrt/support-matrix/index.html#layers-precision-matrix
Or are you saying you aren't actually developing TensorRT's quantization tools, but nvidia-pytorch's?
@hyperfraise oops, this doc seems out of date; we support INT8 3D conv kernels in TRT 7.x. Have you hit any issue? I will ask for the documentation to be updated, thanks.
We own and develop both the quantization in TRT and nvidia-pytorch.
Oh, OK. Well, then my results are pretty weird.
NVIDIA driver: 460.39
OS: Ubuntu 20.04
GPU: 2080 Ti
I'm optimizing 3D and 2D ResNets to show you this weird discrepancy:
import torch
import torchvision

# 2d model
dummy_input = torch.randn(8, 3, 224, 224).float().cuda()
model = torchvision.models.resnet101().cuda().eval()

# 3d model (uncomment to export it instead)
# model = torchvision.models.video.r2plus1d_18().cuda().eval()
# dummy_input = torch.randn(8, 3, 35, 224, 224).float().cuda()

with torch.no_grad():
    torch.onnx.export(
        model,
        dummy_input,
        "resnet.onnx",
        verbose=True,
    )
Then I optimize those models with different versions of TensorRT and compare the speedups. My commands are the following:
# FP32 optimization:
/usr/src/tensorrt/bin/trtexec --onnx=resnet.onnx --fp32 --workspace=5000 --saveEngine=resnet.trt --inputIOFormats=fp32:chw --outputIOFormats=fp32:chw
# FP16 optimization:
/usr/src/tensorrt/bin/trtexec --onnx=resnet.onnx --fp16 --workspace=5000 --saveEngine=resnet.trt --inputIOFormats=fp32:chw --outputIOFormats=fp32:chw
# INT8 (quantization) optimization:
/usr/src/tensorrt/bin/trtexec --onnx=resnet.onnx --best --workspace=5000 --saveEngine=resnet.trt --inputIOFormats=fp32:chw --allowGPUFallback --outputIOFormats=fp32:chw
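The three commands above differ only in their precision flags, and all of them write to the same resnet.trt, so each build overwrites the previous engine. Below is a minimal sketch that generates the three builds with a separate engine file per precision. The helper `trtexec_cmd` and the `_<precision>.trt` naming are hypothetical. The sketch passes no flag for FP32, on the assumption that FP32 is trtexec's default build precision. `--best` enables all precisions (FP32, FP16 and INT8) and lets the builder choose per layer.

```python
import shlex
from typing import List

# Path taken from the commands above; adjust for your install.
TRTEXEC = "/usr/src/tensorrt/bin/trtexec"

# Extra flags per build. FP32 gets none here (assumed to be trtexec's default);
# --best enables FP32, FP16 and INT8 together. --allowGPUFallback is kept
# from the original INT8 command.
PRECISION_FLAGS = {
    "fp32": [],
    "fp16": ["--fp16"],
    "best": ["--best", "--allowGPUFallback"],
}


def trtexec_cmd(onnx_path: str, precision: str, workspace_mb: int = 5000) -> List[str]:
    """Build one trtexec argument list; each precision gets its own engine file."""
    stem = onnx_path.rsplit(".", 1)[0]
    return [
        TRTEXEC,
        f"--onnx={onnx_path}",
        *PRECISION_FLAGS[precision],
        f"--workspace={workspace_mb}",
        f"--saveEngine={stem}_{precision}.trt",
        "--inputIOFormats=fp32:chw",
        "--outputIOFormats=fp32:chw",
    ]


for precision in PRECISION_FLAGS:
    # Only prints the commands; pass the list to subprocess.run(cmd, check=True) to run it.
    print(shlex.join(trtexec_cmd("resnet.onnx", precision)))
```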
Then I run a speed test in Python. Here are my results on the 2080 Ti (throughput in samples/s at the input sizes above):
On TensorRT 7.1.2 (Docker image 20.06 on nvcr, https://docs.nvidia.com/deeplearning/tensorrt/container-release-notes/rel_20-06.html#rel_20-06):
| Throughput (samples/s) | INT8 | FP16 | FP32 |
|---|---|---|---|
| 2d (resnet101) | 3610 | 2050 | 640 |
| 3d (r2plus1d_18) | 11.0 | 11.0 | 7.05 |
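The ratios in the throughput table above show the gap directly. The snippet below reuses only the reported numbers. INT8 is about 1.76x faster than FP16 for the 2D model and exactly 1.00x for the 3D one.

```python
# Throughput in samples/s on a 2080 Ti with TensorRT 7.1.2, copied from the table above.
THROUGHPUT = {
    "2d": {"INT8": 3610, "FP16": 2050, "FP32": 640},
    "3d": {"INT8": 11.0, "FP16": 11.0, "FP32": 7.05},
}


def speedup(model: str, fast: str, base: str) -> float:
    """Ratio of the two throughputs; > 1 means `fast` beats `base`."""
    return THROUGHPUT[model][fast] / THROUGHPUT[model][base]


for model in THROUGHPUT:
    print(f"{model}: INT8/FP16 = {speedup(model, 'INT8', 'FP16'):.2f}x, "
          f"FP16/FP32 = {speedup(model, 'FP16', 'FP32'):.2f}x")
# 2d: INT8/FP16 = 1.76x, FP16/FP32 = 3.20x
# 3d: INT8/FP16 = 1.00x, FP16/FP32 = 1.56x
```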
And here are the sizes of the saved .trt engines, in MB:
| Engine size (MB) | INT8 | FP16 | FP32 |
|---|---|---|---|
| 2d (resnet101) | 87 | 86 | 295 |
| 3d (r2plus1d_18) | 81 | 81 | 121 |
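The speed-test script itself is not shown in the thread. Below is a minimal sketch of how a samples/s figure like those in the tables above can be measured. It assumes `run_batch` is a callable that runs one batch to completion; the helper name `measure_throughput` is hypothetical. On a GPU, the callable must synchronize before returning (for example, wait on the CUDA stream). Otherwise the timer measures only kernel launches.

```python
import time
from typing import Callable


def measure_throughput(run_batch: Callable[[], None], batch_size: int,
                       warmup: int = 10, iters: int = 100) -> float:
    """Return samples/s for `run_batch`, which must finish one batch before returning."""
    if iters <= 0:
        raise ValueError("iters must be positive")
    for _ in range(warmup):  # untimed runs, e.g. for lazy initialization
        run_batch()
    start = time.perf_counter()
    for _ in range(iters):
        run_batch()
    elapsed = time.perf_counter() - start
    return batch_size * iters / elapsed


if __name__ == "__main__":
    # Stand-in workload: roughly 2 ms per batch of 8 samples.
    rate = measure_throughput(lambda: time.sleep(0.002), batch_size=8, iters=50)
    print(f"{rate:.0f} samples/s")
```

The warmup loop keeps one-time costs out of the timed region. Timing many iterations with a single `perf_counter` pair averages out per-call timer overhead.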
In addition, in the optimization logs I see that some "i8" configurations are tried in both cases but never selected for the 3D models, as if they didn't bring any speedup.
It seems to me that TensorRT 7.x INT8 brings no speed improvement to 3D convolution on the 2080 Ti, which leads me to believe that either quantization doesn't happen or it doesn't actually bring a speedup. Please tell me if I did something wrong.
OK. I see a proper speedup with TensorRT 7.2.2.3 (available in the 21.03 container) on the 2080 Ti and Titan RTX, but not on any other GPU: on the 2070, 1080 Ti and 1660 Super there is no speedup over FP16 with the same Docker container. What do you think is happening?
PS: I tested TVM quantization and saw a similar speedup over FP16 on all those GPUs, so it seems odd that TensorRT doesn't provide this speedup on all of them.
Hello @hyperfraise
Sorry, typo in my previous comment: we added INT8 3D conv support in TRT 7.2.x.
The 1080 Ti and 1660 Super have no INT8 Tensor Cores, which explains why they are not sped up.
Do you have performance data for the 2070? Thanks.
I disagree: it can't simply be Tensor Cores, since 2D models do get a speedup from INT8 on the 1080 Ti and 1660S! Can you please look into that?
Hello @hyperfraise, we functionally support INT8 on the Pascal architecture, but INT8 Tensor Core support requires Turing or newer for dGPU products. In TRT, we only have INT8 Tensor Core kernels for 3D conv.
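The maintainer's explanation can be summarized as a small lookup over the GPUs named in this thread. This is a sketch only. The table is hand-written from public specs and is not queried from the driver, and the helper names are hypothetical. The DP4A part is an added assumption: GPUs with compute capability 6.1 or higher have a DP4A INT8 dot-product instruction that runs on regular CUDA cores. That would plausibly explain the 2D INT8 speedups on cards without Tensor Cores, but this thread does not confirm that TensorRT's 2D kernels use it.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Gpu:
    name: str
    arch: str
    compute_capability: Tuple[int, int]
    int8_tensor_cores: bool


# Hand-written table of the cards mentioned in this thread (not queried at runtime).
GPUS = [
    Gpu("GTX 1080 Ti", "Pascal", (6, 1), False),
    Gpu("GTX 1650", "Turing", (7, 5), False),
    Gpu("GTX 1660 Super", "Turing", (7, 5), False),
    Gpu("GTX 1660 Ti", "Turing", (7, 5), False),
    Gpu("RTX 2070", "Turing", (7, 5), True),
    Gpu("RTX 2080 Ti", "Turing", (7, 5), True),
    Gpu("Titan RTX", "Turing", (7, 5), True),
]


def has_dp4a(gpu: Gpu) -> bool:
    # Assumption: DP4A (4-way INT8 dot product on CUDA cores) exists from compute capability 6.1.
    return gpu.compute_capability >= (6, 1)


def int8_conv3d_can_speed_up(gpu: Gpu) -> bool:
    # Per the maintainer, TRT only ships INT8 Tensor Core kernels for 3D conv,
    # so DP4A alone does not help here.
    return gpu.int8_tensor_cores


for gpu in GPUS:
    print(f"{gpu.name:<15} dp4a={str(has_dp4a(gpu)):<5} "
          f"int8_conv3d={int8_conv3d_can_speed_up(gpu)}")
```

The table shows that compute capability alone does not tell you whether a card has Tensor Cores. The GTX 16-series cards report 7.5 like the RTX cards but lack Tensor Cores, which matches the 1660 results in this thread.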
Then this is a feature request: could you please provide TRT INT8 kernels for 3D conv that run on regular (non-Tensor) cores?
The fact is that going INT8 speeds up 2D conv on all GPUs, Tensor Cores or not, so in my humble opinion there should be roughly the same speedup for 3D conv. Could you please look into that?
@hyperfraise I will create an internal feature request to track this, thanks.
Thank you
My tests on the 2070 do in fact show a speedup for this 3D architecture, so I retract that part. (I thought there was none because there wasn't with another 3D architecture. I'll double-check and maybe open another issue.)
Sorry @hyperfraise, given that we have a long backlog of RFCs, management sees little value in supporting 3D conv acceleration on the Pascal generation of GPUs. So we will not support INT8 3D conv on Pascal.
Thanks for the answer, but this doesn't seem to be limited to Pascal GPUs. The 1650 through 1660 Ti are Turing GPUs and, as noted, show no speedup either. I believe the issue is that only Tensor Cores give a speedup, while regular cores, which are present not only in Pascal but in every GPU, don't. So it is a general issue when you think about it.
1660 GPUs do not have Tensor Cores, so they won't give any speedup for INT8.
Closing this issue for now. Please feel free to reopen if you still have questions. Thanks.
But the 1660 does show a speedup for Conv2d. I'm pointing out that it would be nice if it showed some speedup for Conv3d as well.
It's surprising to me that there is a speedup on the 1660. If you can share the trtexec logs with the --verbose --profilingVerbosity=detailed --dumpLayerInfo --dumpProfile --separateProfileRun --noDataTransfers --useCudaGraph --useSpinWait flags, I can take a look at why that's the case.
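Earlier in the thread, the INT8 tactics appeared in the build log but were apparently never selected. The per-layer information in the final engine shows which layers actually run in INT8. Below is a minimal sketch for that. It assumes the layer info was exported as JSON with trtexec's --exportLayerInfo=<file> option (available in newer TensorRT versions), together with --profilingVerbosity=detailed. It also assumes the JSON has a top-level "Layers" list whose entries carry "Inputs"/"Outputs" tensors with a "Format/Datatype" string. Verify that schema against your TensorRT version's output. The helper name `split_layers_by_int8` is hypothetical.

```python
import json
from typing import Dict, List


def split_layers_by_int8(layer_info: dict) -> Dict[str, List[str]]:
    """Group layer names by whether any input/output tensor format mentions INT8.

    ASSUMED schema: {"Layers": [{"Name": ..., "Inputs": [{"Format/Datatype": ...}],
    "Outputs": [...]}, ...]}. Without detailed profiling verbosity, layers may be
    plain name strings; those are reported as "unknown".
    """
    result: Dict[str, List[str]] = {"int8": [], "other": [], "unknown": []}
    for layer in layer_info.get("Layers", []):
        if not isinstance(layer, dict):
            result["unknown"].append(str(layer))
            continue
        tensors = layer.get("Inputs", []) + layer.get("Outputs", [])
        formats = [t.get("Format/Datatype", "") for t in tensors]
        key = "int8" if any("int8" in f.lower() for f in formats) else "other"
        result[key].append(layer.get("Name", "?"))
    return result


if __name__ == "__main__":
    # Synthetic example in the assumed schema; a real file would be loaded with json.load().
    sample = json.loads("""{"Layers": [
        {"Name": "conv3d_a", "Inputs": [{"Format/Datatype": "Channel major FP16 format"}],
         "Outputs": [{"Format/Datatype": "Channel major FP16 format"}]},
        {"Name": "conv2d_b", "Inputs": [{"Format/Datatype": "Row major Int8 format"}],
         "Outputs": []},
        "reformat_node"]}""")
    print(split_layers_by_int8(sample))
```

If every 3D convolution lands in "other" in an engine built with --best, the builder did not pick an INT8 kernel for it. That matches the log observation above.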
Hi. I'm not able to quantize 3D convolution layers. Is there any plan to add support for 3D layers to TensorRT quantization?