Correct prediction only for the second sample in batch

Goal I want to speed up a segmentation model from the segmentation_models.pytorch library. I would like to run the model with a fixed batch size > 1.

Describe the bug I use the torch2trt function with use_onnx=True. The conversion succeeds, but at inference the TRT model matches the PyTorch model only for the second sample in the batch. This happens whenever batch_size > 1; with batch_size=1 it works well.

System information

OS Platform and Distribution: Ubuntu 18.04.5 LTS
ONNX Runtime version: 1.7.0
nvidia-tensorrt version: 7.2.2.3
conda version: 4.9.2
Python version: 3.7
PyTorch version: 1.7.0 (py3.7_cuda11.0.221_cudnn8.0.3_0)
GPU model and memory: 1080Ti 11Gb
PyCharm CE version: 2021.1.2

To Reproduce My conda environment is listed in torch2trt.txt. I noticed that something was wrong while testing the converted model with random inputs of different shapes (code: main.txt). The strange distributions of the predictions also support this suspicion (I uploaded screenshots of the distributions to Google Drive).

Then I built the TRT model into the existing pipeline and looked at the predictions on real data with different batch sizes. It became obvious that the TRT model predicts only one example from the batch correctly (image predictions are in the "image_prediciton" folder on Google Drive).
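A per-sample check of this kind can be sketched as follows. This is not the code from main.txt, only a minimal illustration: the helper names and the tolerance of 1e-3 are assumptions, and the outputs are expected as NumPy arrays with the batch on axis 0 (for torch tensors, something like y.detach().cpu().numpy()).

```python
import numpy as np


def per_sample_max_abs_diff(reference, candidate):
    """Return the largest absolute difference for each sample in the batch."""
    reference = np.asarray(reference, dtype=np.float64)
    candidate = np.asarray(candidate, dtype=np.float64)
    if reference.shape != candidate.shape:
        raise ValueError(
            f"shape mismatch: {reference.shape} vs {candidate.shape}"
        )
    # Flatten everything except the batch axis, then reduce per sample.
    diff = np.abs(reference - candidate).reshape(reference.shape[0], -1)
    return diff.max(axis=1)


def mismatched_samples(reference, candidate, atol=1e-3):
    """Return the batch indices whose outputs differ by more than atol.

    atol is a hypothetical tolerance; pick one that suits the model.
    """
    diffs = per_sample_max_abs_diff(reference, candidate)
    return [i for i, d in enumerate(diffs) if d > atol]
```

Calling mismatched_samples(y_torch, y_trt) on the two model outputs for the same batch lists the sample indices where the TRT prediction diverges, which makes it easy to see whether the problem is tied to a position in the batch.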

My attempts to fix Since I'm using use_onnx=True, I replaced torch.onnx.export(module, inputs, f, input_names=input_names, output_names=output_names) with torch.onnx.export(module, inputs, f, input_names=input_names, output_names=output_names, dynamic_axes={'input_0': {0: 'batch_size'}, 'output_0': {0: 'batch_size'}}) inside the torch2trt function to enable an arbitrary batch size, but this caused an error:

[TensorRT] ERROR: Network has dynamic or shape inputs, but no optimization profile has been defined.
[TensorRT] ERROR: Network validation failed.
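The message refers to TensorRT optimization profiles, which give the builder a min/opt/max shape for every dynamic input. Below is a minimal sketch of registering one, assuming it could be called inside torch2trt at the point where the builder and its config exist. The helper name add_batch_profile and its parameters are hypothetical; the three TensorRT calls it makes (create_optimization_profile, set_shape, add_optimization_profile) are from the TensorRT Python API. I have not verified that this resolves the problem.

```python
def add_batch_profile(builder, config, input_name, chw_shape,
                      min_batch, opt_batch, max_batch):
    """Register one optimization profile covering min_batch..max_batch.

    builder: a tensorrt.Builder
    config: the tensorrt.IBuilderConfig used to build the engine
    input_name: name of the dynamic input, e.g. 'input_0'
    chw_shape: the fixed (channels, height, width) part of the input shape
    """
    if not (1 <= min_batch <= opt_batch <= max_batch):
        raise ValueError("expected 1 <= min_batch <= opt_batch <= max_batch")

    profile = builder.create_optimization_profile()
    # Only the batch dimension varies; the remaining dimensions stay fixed.
    profile.set_shape(
        input_name,
        (min_batch, *chw_shape),
        (opt_batch, *chw_shape),
        (max_batch, *chw_shape),
    )
    config.add_optimization_profile(profile)
    return profile
```

For example, add_batch_profile(builder, config, 'input_0', (3, 256, 256), 1, 4, 8) would cover batch sizes 1 to 8; the input shape here is only illustrative.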

Also, I tried use_onnx=False, but then the conversion failed with the following error:

Traceback (most recent call last):
  File "/media/tower/nvme/nn-speedup-test/torch2trt/test_image.py", line 154, in <module>
    mask_trt = doSegmentation(image, checkpoint_path, piece_size=piece_size, step=step, batch_size=batch_size, device=device, use_trt=True)
  File "/media/tower/nvme/nn-speedup-test/torch2trt/test_image.py", line 33, in doSegmentation
    masks = getMasks(pieces=pieces, bad_mask_pieces=bad_mask_pieces, checkpoint_path=checkpoint_path, batch_size=batch_size, device=device, use_trt=use_trt)
  File "/media/tower/nvme/nn-speedup-test/torch2trt/test_image.py", line 76, in getMasks
    model = torch2trt.torch2trt(model, [x], max_batch_size=batch_size, use_onnx=False, log_level=trt.Logger.ERROR)
  File "/home/tower/anaconda3/envs/torch2trt/lib/python3.7/site-packages/torch2trt-0.2.0-py3.7.egg/torch2trt/torch2trt.py", line 554, in torch2trt
  File "/home/tower/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/tower/anaconda3/envs/torch2trt/lib/python3.7/site-packages/segmentation_models_pytorch/base/model.py", line 15, in forward
    features = self.encoder(x)
  File "/home/tower/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/tower/anaconda3/envs/torch2trt/lib/python3.7/site-packages/segmentation_models_pytorch/encoders/efficientnet.py", line 73, in forward
    x = module(x, drop_connect)
  File "/home/tower/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/tower/anaconda3/envs/torch2trt/lib/python3.7/site-packages/efficientnet_pytorch/model.py", line 78, in forward
    x = self._swish(self._bn1(self._depthwise_conv(x)))
  File "/home/tower/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/tower/anaconda3/envs/torch2trt/lib/python3.7/site-packages/efficientnet_pytorch/utils.py", line 144, in forward
    x = F.conv2d(x, self.weight, self.bias, self.stride, self.padding, self.dilation, self.groups)
  File "/home/tower/anaconda3/envs/torch2trt/lib/python3.7/site-packages/torch2trt-0.2.0-py3.7.egg/torch2trt/torch2trt.py", line 300, in wrapper
  File "/home/tower/anaconda3/envs/torch2trt/lib/python3.7/site-packages/torch2trt-0.2.0-py3.7.egg/torch2trt/converters/conv_functional.py", line 46, in convert_Conv_trt7_functional
TypeError: (): incompatible function arguments. The following argument types are supported:

(arg0: tensorrt.tensorrt.IConvolutionLayer, arg1: tensorrt.tensorrt.Dims) -> None

Invoked with: <tensorrt.tensorrt.IConvolutionLayer object at 0x7fd4227fa570>, ([1, 1], [1, 1])

Process finished with exit code 1

If you need any additional information, feel free to ask.

NVIDIA-AI-IOT / torch2trt

Correct prediction only for the second sample in batch #582