NVIDIA-AI-IOT / torch2trt

An easy to use PyTorch to TensorRT converter
MIT License

Problems converting different networks. #200

Open simoneluetto opened 4 years ago

simoneluetto commented 4 years ago

I successfully tried the repo on some of the networks in the examples, but I encountered some issues trying to convert different networks. I am using a Jetson Nano with TensorRT 5.0.6 and torch 1.2.0. First I tried to convert Inception v3 from the torchvision models; the network is not converted, and the error during conversion is: [TensorRT] ERROR: Unused Input: i. Then during inference: AttributeError: 'NoneType' object has no attribute 'get_binding_index'

I also tried to convert an implementation of SSD-MobileNet-v2 that can be found at https://github.com/qfgaohao/pytorch-ssd. This time the conversion script runs without any problem, but the resulting network gives completely wrong score predictions. What could be the reason?

In both cases the unsupported operation was torch.unsqueeze, which, if I understood it correctly, simply adds a singleton dimension; for this reason I implemented it using the same converter as the view function.
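A converter along those lines might look like the following minimal sketch, which reuses the shuffle-layer pattern of this repo's view converter (it assumes the tensorrt_converter/trt_ API from torch2trt and implicit batch mode, so the batch dimension is excluded from reshape_dims):

from torch2trt.torch2trt import tensorrt_converter, trt_

@tensorrt_converter('torch.unsqueeze')
@tensorrt_converter('torch.Tensor.unsqueeze')
def convert_unsqueeze(ctx):
    # the tensor being unsqueezed is the first method argument
    input = ctx.method_args[0]
    output = ctx.method_return
    input_trt = trt_(ctx.network, input)
    # as in the view converter, a shuffle layer reshapes the input
    # to the output's shape recorded during tracing
    layer = ctx.network.add_shuffle(input_trt)
    layer.reshape_dims = tuple(output.shape[1:])  # implicit batch: skip dim 0
    output._trt = layer.get_output(0)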

jaybdub commented 4 years ago

Hi simoneluetto,

Thanks for reaching out!

Yes, it is likely the unsupported operations causing problems.

Currently, the conversion can produce false positives: an unsupported operation's output is baked into the network as a constant layer, so the conversion appears to succeed even though the result is wrong. If all layers are supported this shouldn't happen.

I would like to consider how to make these failed conversions explicit, but for now it may be best to implement a converter for unsqueeze, or use a different operation.

Please let me know if you have any questions.

Best, John

simoneluetto commented 4 years ago

But I should have solved that by adding the converter for the unsqueeze operation; now no unsupported operations are detected during conversion. Is it possible that there are other unsupported operations that go undetected? Or what else could be the cause of the errors?

jaybdub commented 4 years ago

It's possible that an unsupported operation was not detected. Currently, the detection isn't perfect (it primarily looks for functions under torch.xxx, torch.nn.xxx, and torch.nn.functional.xxx).

For the SSD model in particular, there are likely operations related to anchor box parsing that are not supported. Typically though, object detection models contain a backbone CNN, which likely is supported.

I haven't investigated the model you sent in particular, but it may look something like

backbone_trt = torch2trt(model.backbone, [data])  # may vary, for illustrative purposes
model.backbone = backbone_trt

Are you able to determine if this method matches your use case?

Best, John

joefutrelle commented 4 years ago

I get the error that unsqueeze is unsupported when I try to convert an inception_v3 model, but when I convert resnet models it works, even though I'm using unsqueeze in my transform in both cases.

joefutrelle commented 4 years ago

Here's what I get when I try to convert inception_v3 to trt:

Warning: Encountered known unsupported method torch.unsqueeze
Warning: Encountered known unsupported method torch.unsqueeze
Warning: Encountered known unsupported method torch.unsqueeze
[TensorRT] ERROR: (Unnamed Layer* 22) [Convolution]: kernel weights has count 864 but 288 was expected
[TensorRT] ERROR: (Unnamed Layer* 22) [Convolution]: count of 864 weights in kernel, but kernel dimensions (3,3) with 1 input channels, 32 output channels and 1 groups were specified. Expected Weights count is 1 * 3*3 * 32 / 1 = 288
Traceback (most recent call last):
  File "convert_inception_trt.py", line 50, in <module>
    model_trt = torch2trt(model, [image], fp16_mode=True, max_batch_size=1)
  File "/usr/local/lib/python3.6/dist-packages/torch2trt/torch2trt.py", line 377, in torch2trt
    outputs = module(*inputs)
  File "/home/jfutrelle/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torchvision-0.6.0a0+40c99ea-py3.6-linux-aarch64.egg/torchvision/models/inception.py", line 192, in forward
    x, aux = self._forward(x)
  File "/usr/local/lib/python3.6/dist-packages/torchvision-0.6.0a0+40c99ea-py3.6-linux-aarch64.egg/torchvision/models/inception.py", line 129, in _forward
    x = self.Conv2d_1a_3x3(x)
  File "/home/jfutrelle/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torchvision-0.6.0a0+40c99ea-py3.6-linux-aarch64.egg/torchvision/models/inception.py", line 433, in forward
    x = self.bn(x)
  File "/home/jfutrelle/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch2trt/torch2trt.py", line 202, in wrapper
    converter['converter'](ctx)
  File "/usr/local/lib/python3.6/dist-packages/torch2trt/converters/BatchNorm2d.py", line 8, in convert_BatchNorm2d
    input_trt = trt_(ctx.network, input)
  File "/usr/local/lib/python3.6/dist-packages/torch2trt/torch2trt.py", line 116, in trt_
    num_dim = len(t._trt.shape) # non-leaf tensors must already have _trt, get shape from that
ValueError: __len__() should return >= 0

It seems to me that someone must have successfully converted inception_v3 to trt, but I can't find a code snippet.

joefutrelle commented 4 years ago

I figured out my issue.

Torch's inception_v3 has a constructor keyword transform_input that needs to be set to False. That removes input transformations that are not needed for my application and that use the unsqueeze operation.

https://github.com/pytorch/vision/blob/0156d58ec867590b1c78fe1bc834c7da9afdf46a/torchvision/models/inception.py#L119
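In code, the fix looks something like this minimal sketch (pretrained=True and the 299x299 input size follow torchvision's conventions for this model; adjust for your own checkpoint):

import torch
from torch2trt import torch2trt
from torchvision.models import inception_v3

# transform_input=False skips the internal input rescaling that
# calls torch.unsqueeze, which torch2trt could not convert here
model = inception_v3(pretrained=True, transform_input=False).eval().cuda()

x = torch.rand((1, 3, 299, 299)).cuda()
model_trt = torch2trt(model, [x], fp16_mode=True)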

redzhepdx commented 3 years ago

You can create your own wrapper for the torchvision models, or for any model whose output is a dictionary (OrderedDict etc.). Here is my solution for the backbones of the torchvision detection models:

[Tested with retinanet_resnet50_fpn]

from typing import List

import torch
from torch import nn


class BackboneWrapper(nn.Module):
    # unpacks the backbone's dict output into a plain list of tensors,
    # which torch2trt can trace
    def __init__(self, model: nn.Module):
        super(BackboneWrapper, self).__init__()
        self.model = model

    def forward(self, x) -> List[torch.Tensor]:
        outputs = self.model(x)
        return [output for name, output in outputs.items()]

Usage:

backbone = BackboneWrapper(retina_fpn_model.backbone)
trt_backbone = torch2trt(backbone, [x])

Also, to be able to re-create the model from the saved backbone, you will need another wrapper that maps these output tensors back into a dictionary with the corresponding tags (the model's main forward function probably expects one).

from collections import OrderedDict
from typing import List

from torch import nn


class BackboneDeWrapper(nn.Module):
    # re-packs the converted backbone's list of tensors into the
    # OrderedDict the detection model's forward expects
    def __init__(self, model: nn.Module, return_names: List[int]):
        super(BackboneDeWrapper, self).__init__()
        self.model = model
        self.return_names = return_names

    def forward(self, x) -> OrderedDict:
        outputs = self.model(x)
        return OrderedDict({str(name): output for output, name in zip(outputs, self.return_names)})

Usage:

backbone = torch.load("saved_backbone")
returned_layers = [2, 3, 4, 6, 7] # P2, P3, P4, P6 and P7 for fpn
model.backbone = BackboneDeWrapper(backbone, return_names=returned_layers)

zhLawliet commented 3 years ago

import torch
from torch2trt import torch2trt
from torchvision.models.resnet import resnet50

model = resnet50(pretrained=False).eval()#.cuda()

# create example data
x = torch.rand((1, 3, 224, 224))#.cuda()

# convert to TensorRT feeding sample data as input
model_trt = torch2trt(model, [x])
y = model(x)
y_trt = model_trt(x)

# check the output against PyTorch
print(torch.max(torch.abs(y - y_trt)))

AttributeError: 'NoneType' object has no attribute 'get_binding_index'
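One thing worth checking, assuming this is the same silent-failure mode discussed earlier in this thread: the .cuda() calls above are commented out, and torch2trt expects both the model and the example input on the GPU. If the engine fails to build, model_trt ends up without a TensorRT engine and raises exactly this get_binding_index error at inference time. A sketch of the GPU variant:

model = resnet50(pretrained=False).eval().cuda()
x = torch.rand((1, 3, 224, 224)).cuda()
model_trt = torch2trt(model, [x])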
