Custom training of segmentation model using jetson-inference

bhusalsantosh commented 4 years ago

I am having trouble converting a custom-trained segmentation network (PyTorch, ResNet-101) to ONNX format with the onnx_export.py utility. The call `models.segmentation.__dict__[arch](num_classes=num_classes, aux_loss=None, pretrained=False, export_onnx=True)` does not accept the `export_onnx=True` argument. Is it safe to remove this parameter? Also, how do I load the custom-trained ONNX file for model inference? @dusty-nv

dusty-nv commented 4 years ago

Hi @bhusalsantosh, try using the v0.3.0 branch of my torchvision fork here: https://github.com/dusty-nv/vision/commits/v0.3.0

It contains some patches that make the FCN_ResNet ONNX models able to be exported to TensorRT.

bhusalsantosh commented 4 years ago

@dusty-nv, thanks for getting back to me. I did the following steps with Torch 1.1.0 and Python 3.6, as mentioned in https://github.com/dusty-nv/jetson-inference/issues/370#issuecomment-514285463:

    $ sudo pip3 uninstall torchvision
    $ python3 -c "import torchvision"   # should raise an error if successfully uninstalled
    $ git clone -b v0.3.0 https://github.com/dusty-nv/vision
    $ cd vision
    $ sudo python3 setup.py install

I get the following error whenever I call any ResNet segmentation model:

    $torchvision.models.segmentation.fcn_resnet18(num_classes=21, pretrained=False, export_onnx=True)
    torchvision.models.segmentation.fcn_resnet18()
    Traceback (most recent call last):
      File "", line 1, in
      File "/usr/local/lib/python3.6/dist-packages/torchvision-0.3.0-py3.6-linux-aarch64.egg/torchvision/models/segmentation/segmentation.py", line 70, in fcn_resnet18
      File "/usr/local/lib/python3.6/dist-packages/torchvision-0.3.0-py3.6-linux-aarch64.egg/torchvision/models/segmentation/segmentation.py", line 50, in _segm_resnet
      File "/usr/local/lib/python3.6/dist-packages/torchvision-0.3.0-py3.6-linux-aarch64.egg/torchvision/models/segmentation/fcn.py", line 29, in __init__
      File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py", line 327, in __init__
        False, _pair(0), groups, bias, padding_mode)
      File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py", line 40, in __init__
        out_channels, in_channels // groups, *kernel_size))
    TypeError: new() received an invalid combination of arguments - got (float, float, int, int), but expected one of:
     (torch.device device)
     (torch.Storage storage)
     (Tensor other)
     (tuple of ints size, torch.device device)
     (object data, torch.device device)

How do I get past this error?
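For anyone repeating the uninstall/reinstall steps, here is a minimal sketch of a check that a package is (or is not) visible to the interpreter before and after installing the fork. The helper name `is_installed` is made up for illustration; it only uses the standard library's `importlib.util.find_spec`, so it does not import the package itself.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found on the import path.

    Hypothetical helper for illustration; find_spec() returns None when
    no module with that name is found.
    """
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Expect torchvision to be reported missing right after the uninstall
    # step and installed again after `setup.py install`.
    for name in ("torch", "torchvision"):
        print(name, "installed" if is_installed(name) else "missing")
```

Running it once after `pip3 uninstall` and again after `setup.py install` should show the torchvision status flip.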

bhusalsantosh commented 4 years ago

The error was caused by a float value produced at line 50 of torchvision/models/segmentation/segmentation.py. I converted it to an integer, reinstalled the torchvision 0.3.0 library, and everything worked well until testing.
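To illustrate why a float at that line produces `got (float, float, int, int)`, here is a small sketch in plain Python. It does not reproduce the fork's actual code: the helper `conv_weight_shape` and the `2048 / 4` expression are assumptions chosen only to show the mechanism. In Python 3, `/` always returns a float, and floor-dividing a float still gives a float, so the float carries through to both channel arguments of the convolution's weight shape.

```python
def conv_weight_shape(inplanes, groups=1, kernel_size=(3, 3)):
    """Hypothetical stand-in for how an FCN head sizes its first conv.

    The weight shape follows the pattern seen in the traceback:
    (out_channels, in_channels // groups, *kernel_size).
    """
    inter_channels = inplanes // 4  # stays a float if inplanes is a float
    return (inter_channels, inplanes // groups, *kernel_size)


# Assumed example values: true division yields a float channel count.
bad = conv_weight_shape(2048 / 4)
# Floor division (or wrapping the value in int()) keeps it an integer.
good = conv_weight_shape(2048 // 4)

print([type(v).__name__ for v in bad])   # ['float', 'float', 'int', 'int']
print([type(v).__name__ for v in good])  # ['int', 'int', 'int', 'int']
```

Either `//` or an explicit `int(...)` around the computed value gives the integer channel count that the convolution constructor expects.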
