NVIDIA-AI-IOT / torch2trt

An easy to use PyTorch to TensorRT converter
MIT License

Correct prediction only for the second sample in batch #582

Closed snazau closed 2 years ago

snazau commented 3 years ago

Goal: I want to speed up a segmentation model from the segmentation_models.pytorch library. I would like to use the model with a fixed batch size greater than 1.

Describe the bug: I use the torch2trt function with use_onnx=True. It converts the model successfully, but during inference the TRT model matches the PyTorch model only for the second sample in the batch. This happens when batch_size > 1; with batch_size=1 it works well.

System information

To Reproduce: My conda environment is listed in torch2trt.txt. I noticed that something was wrong while testing with random inputs of different shapes (code: main.txt). My suspicion is supported by the strange distributions of the predictions (I uploaded screenshots of the distributions to Google Drive).

Then I integrated the TRT model into the existing pipeline and looked at its predictions on real data with different batch sizes. It became obvious that the TRT model predicts only one example from each batch correctly (image predictions are in the "image_prediciton" folder on Google Drive).

My attempts to fix: Since I'm using use_onnx=True, I replaced

torch.onnx.export(module, inputs, f, input_names=input_names, output_names=output_names)

with

torch.onnx.export(module, inputs, f, input_names=input_names, output_names=output_names, dynamic_axes={'input_0': {0: 'batch_size'}, 'output_0': {0: 'batch_size'}})

in the torch2trt function to enable an arbitrary batch size, but this caused an error:

[TensorRT] ERROR: Network has dynamic or shape inputs, but no optimization profile has been defined.
[TensorRT] ERROR: Network validation failed.
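The first error message says that a network with a dynamic input dimension needs an optimization profile before it can be built. A minimal sketch of registering one is shown here, assuming the TensorRT 7+ Python API (create_optimization_profile, set_shape, add_optimization_profile). The helper name add_batch_profile and its parameters are hypothetical and not part of torch2trt; where the builder and config objects are created inside torch2trt may differ between versions.

```python
def add_batch_profile(builder, config, input_name, chw, min_batch, opt_batch, max_batch):
    """Register one optimization profile covering a range of batch sizes.

    builder and config are expected to be a tensorrt.Builder and the
    tensorrt.IBuilderConfig returned by builder.create_builder_config().
    chw is the fixed (channels, height, width) part of the input shape.
    This helper is a hypothetical illustration, not a torch2trt function.
    """
    if not (1 <= min_batch <= opt_batch <= max_batch):
        raise ValueError("expected 1 <= min_batch <= opt_batch <= max_batch")

    profile = builder.create_optimization_profile()
    # Only the batch dimension varies; the remaining dimensions stay fixed.
    profile.set_shape(
        input_name,
        (min_batch, *chw),  # smallest shape the engine must accept
        (opt_batch, *chw),  # shape the engine is tuned for
        (max_batch, *chw),  # largest shape the engine must accept
    )
    config.add_optimization_profile(profile)
    return profile
```

With the input name from the export call above, a call could look like add_batch_profile(builder, config, 'input_0', (3, 512, 512), 1, 4, 8), made before the engine is built. The image size and batch range here are placeholders.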

I also tried use_onnx=False, but then the model failed to convert, with the following error:

Traceback (most recent call last):
  File "/media/tower/nvme/nn-speedup-test/torch2trt/test_image.py", line 154, in <module>
    mask_trt = doSegmentation(image, checkpoint_path, piece_size=piece_size, step=step, batch_size=batch_size, device=device, use_trt=True)
  File "/media/tower/nvme/nn-speedup-test/torch2trt/test_image.py", line 33, in doSegmentation
    masks = getMasks(pieces=pieces, bad_mask_pieces=bad_mask_pieces, checkpoint_path=checkpoint_path, batch_size=batch_size, device=device, use_trt=use_trt)
  File "/media/tower/nvme/nn-speedup-test/torch2trt/test_image.py", line 76, in getMasks
    model = torch2trt.torch2trt(model, [x], max_batch_size=batch_size, use_onnx=False, log_level=trt.Logger.ERROR)
  File "/home/tower/anaconda3/envs/torch2trt/lib/python3.7/site-packages/torch2trt-0.2.0-py3.7.egg/torch2trt/torch2trt.py", line 554, in torch2trt
  File "/home/tower/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/tower/anaconda3/envs/torch2trt/lib/python3.7/site-packages/segmentation_models_pytorch/base/model.py", line 15, in forward
    features = self.encoder(x)
  File "/home/tower/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/tower/anaconda3/envs/torch2trt/lib/python3.7/site-packages/segmentation_models_pytorch/encoders/efficientnet.py", line 73, in forward
    x = module(x, drop_connect)
  File "/home/tower/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/tower/anaconda3/envs/torch2trt/lib/python3.7/site-packages/efficientnet_pytorch/model.py", line 78, in forward
    x = self._swish(self._bn1(self._depthwise_conv(x)))
  File "/home/tower/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/tower/anaconda3/envs/torch2trt/lib/python3.7/site-packages/efficientnet_pytorch/utils.py", line 144, in forward
    x = F.conv2d(x, self.weight, self.bias, self.stride, self.padding, self.dilation, self.groups)
  File "/home/tower/anaconda3/envs/torch2trt/lib/python3.7/site-packages/torch2trt-0.2.0-py3.7.egg/torch2trt/torch2trt.py", line 300, in wrapper
  File "/home/tower/anaconda3/envs/torch2trt/lib/python3.7/site-packages/torch2trt-0.2.0-py3.7.egg/torch2trt/converters/conv_functional.py", line 46, in convert_Conv_trt7_functional
TypeError: (): incompatible function arguments. The following argument types are supported:

  1. (arg0: tensorrt.tensorrt.IConvolutionLayer, arg1: tensorrt.tensorrt.Dims) -> None

Invoked with: <tensorrt.tensorrt.IConvolutionLayer object at 0x7fd4227fa570>, ([1, 1], [1, 1])

Process finished with exit code 1

If you need any additional information, feel free to ask.

snazau commented 3 years ago

I solved the issue by changing the source code for my specific model. The problem was that, for some reason, the library takes only a single sample as input:

inputs = [tensor.clone()[0:1] for tensor in inputs]

and then, during inference, it prepends an additional dimension equal to the batch size:

shape = (batch_size,) + tuple(self.engine.get_binding_shape(idx))

Here is the code that I'm using now: trt_functions.txt. As a result, inference is about 2x faster than with PyTorch.
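The shape computation quoted above can be illustrated without TensorRT. This is a sketch under one assumption: that the engine built through the ONNX path reports a binding shape that already starts with a batch dimension of 1, because only one sample was exported. The function names and the example shape are hypothetical and are not taken from trt_functions.txt.

```python
def implicit_batch_shape(batch_size, binding_shape):
    """Buffer shape when the binding shape omits the batch dimension."""
    return (batch_size,) + tuple(binding_shape)


def explicit_batch_shape(batch_size, binding_shape):
    """Buffer shape when the binding shape already starts with a batch dimension."""
    return (batch_size,) + tuple(binding_shape)[1:]


# Hypothetical binding shape of an engine built from a one-sample export.
binding_shape = (1, 1, 256, 256)
batch_size = 4

# Prepending the batch size adds an extra axis to the buffer.
print(implicit_batch_shape(batch_size, binding_shape))  # (4, 1, 1, 256, 256)

# Replacing the leading dimension gives the expected 4-D shape.
print(explicit_batch_shape(batch_size, binding_shape))  # (4, 1, 256, 256)
```

Correcting only the buffer shape would presumably not be enough if the engine itself was built for a single sample, which may be why the fix also had to stop slicing the inputs to [0:1] before conversion.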