dmenig closed this issue 3 years ago.
I see no speedup between FP16 and INT8 on a grouped-convolution model.
TensorRT Version: 7.2.2.3
NVIDIA GPU: 2080 Ti
NVIDIA Driver Version: 460.37
CUDA Version: 11.2
CUDNN Version: 8.1.0
Operating System: Ubuntu 20.04
Python Version (if applicable): 3.8
PyTorch Version (if applicable): 1.8.1
I use this Python 3.8 code snippet to save the ONNX models. The resnext file comes from https://github.com/kenshohara/3D-ResNets-PyTorch/blob/master/models/resnext.py
import torch
import torchvision
from resnext import generate_model

dummy_input = torch.randn(8, 3, 35, 224, 224).float().cuda()

## Regular 3d model
model = torchvision.models.video.r2plus1d_18().eval().cuda()

## Resnext 3d model
## (this assignment overrides the one above; comment out one of the two to pick the model to export)
model = generate_model(
    model_depth=101,
    sample_size=224,
    sample_duration=35,
    num_classes=24,
    input_channels=3,
).eval().cuda()  # same mode and device as the dummy input

with torch.no_grad():
    torch.onnx.export(
        model,
        dummy_input,
        "resnet.onnx",
        verbose=True,
    )
Use the script above to generate an ONNX model. Then optimize it with one of these commands:
# FP16 optimization:
/usr/src/tensorrt/bin/trtexec --onnx=resnet.onnx --fp16 --workspace=5000 --saveEngine=resnet.trt --inputIOFormats=fp32:chw --outputIOFormats=fp32:chw

# INT8 (quantization) optimization:
/usr/src/tensorrt/bin/trtexec --onnx=resnet.onnx --best --workspace=5000 --saveEngine=resnet.trt --inputIOFormats=fp32:chw --allowGPUFallback --outputIOFormats=fp32:chw
I use TensorRT from NVIDIA's 21.03 container: https://docs.nvidia.com/deeplearning/tensorrt/container-release-notes/rel_21-03.html#rel_21-03
The speed test results are as follows (speeds in samples/s):

"Regular" 3d model:
  FP16: 60.3
  INT8: 115.6

Resnext 3d model:
  FP16: 111.2
  INT8: 111.1
The expected INT8 speedup over FP16 should be about 2x, am I right?
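To make the comparison concrete, here is a minimal sketch that turns the throughput figures reported above into INT8-over-FP16 speedup ratios. The dictionary keys and the `int8_speedup` helper are illustrative names, not part of any library; the numbers are the ones quoted in this issue.

```python
# Throughput in samples/s, copied from the results reported in this issue.
# The model labels are illustrative names chosen for this sketch.
throughput = {
    "regular_3d": {"fp16": 60.3, "int8": 115.6},
    "resnext_3d": {"fp16": 111.2, "int8": 111.1},
}


def int8_speedup(fp16: float, int8: float) -> float:
    """Ratio of INT8 throughput to FP16 throughput (1.0 means no gain)."""
    return int8 / fp16


speedups = {name: int8_speedup(**rates) for name, rates in throughput.items()}

for name, ratio in speedups.items():
    print(f"{name}: INT8 is {ratio:.2f}x the FP16 throughput")
```

On these numbers the regular model lands close to the expected 2x, while the ResNeXt model shows essentially no change.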
I believe that this kind of convolution might still not be supported in TensorRT, according to https://forums.developer.nvidia.com/t/does-tensorrt-support-conv3d-with-tensor-core/113193/9. If this is what is causing the problem, could you please open a feature request so that it can be addressed when you have the time?
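One way to check how much of the exported network is actually affected is to count the grouped convolutions in the ONNX file. The sketch below assumes the `onnx` Python package is installed; `count_grouped_convs` and `count_in_file` are hypothetical helper names. It only reports how many `Conv` nodes have `group > 1`; it does not show which kernels TensorRT selects for them.

```python
def count_grouped_convs(nodes):
    """Return (total_conv, grouped_conv) for an iterable of ONNX graph nodes."""
    total = grouped = 0
    for node in nodes:
        if node.op_type != "Conv":
            continue
        total += 1
        # The ONNX Conv operator's `group` attribute defaults to 1 when absent.
        group = next((a.i for a in node.attribute if a.name == "group"), 1)
        if group > 1:
            grouped += 1
    return total, grouped


def count_in_file(path):
    """Load an ONNX file and count its Conv nodes (requires the onnx package)."""
    import onnx  # imported lazily so the counting helper has no hard dependency

    return count_grouped_convs(onnx.load(path).graph.node)
```

For example, `count_in_file("resnet.onnx")` on the file exported by the script above would return the number of `Conv` nodes and how many of them are grouped.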
Closing due to inactivity. I realize this is not easily reproducible. I'll repost with more reproducible code.