dusty-nv / jetson-inference

Hello AI World guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson.
https://developer.nvidia.com/embedded/twodaystoademo
MIT License

Custom model architecture #896

Closed: RGring closed this issue 1 year ago

RGring commented 3 years ago

Hi @dusty-nv, Thanks for the nice repository!

Is it possible to load any model architecture (trained in PyTorch) as long as it is converted to the ONNX format? If not, could you give me some pointers on what must be adapted to get it working?

Thanks in advance for your time!

dusty-nv commented 3 years ago

Hi @RGring, there is also pre/post-processing code that goes along with the models. You can find this code in imageNet.cpp, detectNet.cpp, and segNet.cpp (under the jetson-inference/c/ directory).

For example, pre-processing applies mean pixel subtraction, standard-deviation normalization, and NCHW format conversion. This is often similar for PyTorch models; for example, here is the pre-processing for an ONNX ssd-mobilenet model trained with PyTorch:

https://github.com/dusty-nv/jetson-inference/blob/5718df1495859808d5d076efea597b7e641bcb96/c/detectNet.cpp#L729
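
For reference, here is a rough NumPy sketch of that style of pre-processing (the mean/std values and function name are illustrative placeholders, not necessarily the exact ones used in detectNet.cpp):

import numpy as np

def preprocess_onnx(image_rgb, mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)):
    # image_rgb: HxWx3 uint8 array, already resized to the network's input resolution
    x = image_rgb.astype(np.float32) / 255.0                           # scale pixels to [0, 1]
    x = (x - np.array(mean, np.float32)) / np.array(std, np.float32)   # mean subtraction / std normalization
    x = np.transpose(x, (2, 0, 1))                                     # HWC -> CHW
    return np.expand_dims(x, axis=0)                                   # add batch dimension -> NCHW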

TensorFlow and Caffe typically do their pre-processing slightly differently so different pre-processing functions are called.
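
For contrast, a Caffe-style path commonly looks more like the sketch below (BGR channel order and mean subtraction in the raw 0-255 range, no std division; the mean values here are just ImageNet-style placeholders):

import numpy as np

def preprocess_caffe_style(image_rgb, mean_bgr=(104.0, 117.0, 123.0)):
    x = image_rgb[..., ::-1].astype(np.float32)                 # RGB -> BGR
    x -= np.array(mean_bgr, np.float32)                         # per-channel mean pixel subtraction
    return np.expand_dims(np.transpose(x, (2, 0, 1)), axis=0)   # HWC -> NCHW with batch dimension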

Regarding the post-processing, that can vary more based on the model/network. With detection, for example, the bounding boxes and confidences need to be interpreted, and different detection networks output their detections in different structures. Here is the post-processing for ssd-mobilenet:

https://github.com/dusty-nv/jetson-inference/blob/5718df1495859808d5d076efea597b7e641bcb96/c/detectNet.cpp#L815
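
The general idea looks something like this simplified sketch (the actual output layout, thresholds, and overlap clustering are model-specific; see the linked code for the real ssd-mobilenet version):

import numpy as np

def interpret_detections(scores, boxes, conf_threshold=0.5):
    # scores: (num_boxes, num_classes), boxes: (num_boxes, 4) for a single image
    detections = []
    for i in range(scores.shape[0]):
        class_id = int(np.argmax(scores[i]))                 # most confident class for this box
        confidence = float(scores[i, class_id])
        if class_id != 0 and confidence >= conf_threshold:   # class 0 assumed to be background
            detections.append((class_id, confidence, boxes[i].tolist()))
    return detections                                        # clustering/NMS of overlapping boxes would follow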

Also, before spending much time supporting a new model, you'll want to check that the ONNX model can be loaded by TensorRT using the trtexec tool (found under /usr/src/tensorrt/bin).

t-T-s commented 3 years ago

Hi @dusty-nv, First of all this repository is really awesome. I can't wait to use it for all my upcoming projects.

Prerequisites:

Jetson: Jetson Nano (JetPack 4.4.1)

I tried to run a model I trained using TensorFlow, converting it to ONNX with tf2onnx, which went smoothly. Here's the command I used for that. I had to use opset 11 since my model, ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8 from the TensorFlow model zoo, has a NonMaxSuppression layer in it (which contributes to my problem).

python -m tf2onnx.convert --saved-model "<saved-model-base>\saved" --output "<output-model-base>\ssd-mobilenet.onnx" --opset 11

Then came the problem of the unsupported dtype UINT8, which I was also able to resolve using onnx-graphsurgeon. Here's the code for that.

import onnx_graphsurgeon as gs
import onnx
import numpy as np

graph = gs.import_onnx(onnx.load("model/onnxmodel/ssd-mobilenet-v2-fpnlite.onnx"))

# Change the declared input dtype from UINT8 to float32 so the TensorRT ONNX parser accepts it
for inp in graph.inputs:
    inp.dtype = np.float32

onnx.save(gs.export_onnx(graph), "model/onnxmodel/ssd-mobilenet-v2-fpnlite-up.onnx")

Problem:

I also had to replace the NMS layer, and this is where I'm stuck. Here is the code I used to do that. (Sorry about the last edit; I was able to overcome the connection issue, but the problem still persists. Please read on.)

import onnx_graphsurgeon as gs
import onnx
import numpy as np

input_model_path = "model/onnxmodel/ssd-mobilenet-v2-fpnlite_updated.onnx"
output_model_path = "model/onnxmodel/ssd-mobilenet-v2-fpnlite_updated_nms3.onnx"

# Register a helper on gs.Graph that inserts a BatchedNMS_TRT plugin node
@gs.Graph.register()
def trt_batched_nms(self, boxes_input, scores_input, nms_output,
                    share_location, num_classes):
    attrs = {
        "shareLocation": share_location,
        "numClasses": num_classes,
        "backgroundLabelId": -1,
        "topK": 1024,
        "keepTopK": 100,
        "scoreThreshold": 0.0001,
        "iouThreshold": 0.6,
        "isNormalized": True,
        "clipBoxes": True
    }
    return self.layer(op="BatchedNMS_TRT", attrs=attrs,
                      inputs=[boxes_input, scores_input],
                      outputs=[nms_output])

graph = gs.import_onnx(onnx.load(input_model_path))
#graph.inputs[0].shape=[1,320,320,3]
print(graph.inputs[0].shape)

for inp in graph.inputs:
    inp.dtype = np.float32

#input = graph.inputs[0]

tmap = graph.tensors()   # map of tensor name -> tensor object

# Tensor names taken from the original graph: the unsqueezed box tensor,
# the per-branch score tensors, and the NonMaxSuppression outputs to replace
boxt = "Unsqueeze__695:0"
scores_list = ["Unsqueeze__798:0", "Unsqueeze__764:0", "Unsqueeze__730:0", "Unsqueeze__696:0", "Unsqueeze__662:0", "Unsqueeze__628:0"]
nms_list = ["NonMaxSuppression__800:0", "NonMaxSuppression__766:0", "NonMaxSuppression__732:0", "NonMaxSuppression__698:0", "NonMaxSuppression__664:0", "NonMaxSuppression__630:0"]

def clear_tensors(boxt, scores_list, nms_list):
    # Drop the old consumers of the box/score tensors and detach each
    # original NonMaxSuppression output tensor from the node that produced it
    tmap[boxt].outputs.clear()
    for score, nms in zip(scores_list, nms_list):
        tmap[score].outputs.clear()
        tmap[nms].inputs.clear()

def replace_op(boxt, scores_list, nms_list):
    # Insert a BatchedNMS_TRT node per score tensor, reusing the original
    # NMS output tensor as the new plugin node's output
    for score, nms in zip(scores_list, nms_list):
        graph.trt_batched_nms(tmap[boxt],
                              tmap[score],
                              tmap[nms],
                              share_location=False,
                              num_classes=6)

clear_tensors(boxt, scores_list, nms_list)

replace_op(boxt, scores_list, nms_list)

# Remove unused nodes, and topologically sort the graph.
graph.cleanup()
graph.toposort()
# graph.fold_constants().cleanup()

# Export the ONNX graph from graphsurgeon
#onnx.checker.check_model(gs.export_onnx(graph))
onnx.save_model(gs.export_onnx(graph), output_model_path)

print("Saving the ONNX model to {}".format(output_model_path))

This code runs without any errors. The following image shows the original model:

(image)

And this image shows the model after adding the BatchedNMS_TRT layer:

(image)
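
One quick way to double-check what feeds the new plugin nodes is to reload the rewritten model and print their input tensors; here is a minimal sketch reusing the onnx-graphsurgeon imports and output_model_path from the script above:

check = gs.import_onnx(onnx.load(output_model_path))
for node in check.nodes:
    if node.op == "BatchedNMS_TRT":
        for tensor in node.inputs:
            print(node.name, tensor.name, tensor.shape, tensor.dtype)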

However, when I try to run the model through trtexec as you mentioned in the comment above, it aborts with a core dump at the stage of registering the BatchedNMS_TRT layer:

[01/25/2021-15:10:56] [V] [TRT] ImporterContext.hpp:141: Registering layer: onnx_graphsurgeon_node_0 for ONNX node: onnx_graphsurgeon_node_0
#assertionbatchedNMSPlugin.cpp,70

Here is the command used to run TensorRT:

/usr/src/tensorrt/bin/trtexec --onnx=/home/mar2/model/onnxmodel/ssd-mobilenet-v2-fpnlite_updated_nms.onnx --verbose

Here is the final part of the log. (The full log is attached along with all the models.)

[01/25/2021-15:10:56] [V] [TRT] ModelImporter.cpp:103: Parsing node: StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/unstack__626 [Squeeze]
[01/25/2021-15:10:56] [V] [TRT] ModelImporter.cpp:119: Searching for input: StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/unstack:0
[01/25/2021-15:10:56] [V] [TRT] ModelImporter.cpp:125: StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/unstack__626 [Squeeze] inputs: [StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/unstack:0 -> (59752, 1, 4)], 
[01/25/2021-15:10:56] [V] [TRT] onnx2trt_utils.cpp:1641: Original shape: (59752, 1, 4), squeezing to: (59752, 4)
[01/25/2021-15:10:56] [V] [TRT] ImporterContext.hpp:141: Registering layer: StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/unstack__626 for ONNX node: StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/unstack__626
[01/25/2021-15:10:56] [V] [TRT] ImporterContext.hpp:116: Registering tensor: StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/unstack__626:0 for ONNX tensor: StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/unstack__626:0
[01/25/2021-15:10:56] [V] [TRT] ModelImporter.cpp:179: StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/unstack__626 [Squeeze] outputs: [StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/unstack__626:0 -> (59752, 4)], 
[01/25/2021-15:10:56] [V] [TRT] ModelImporter.cpp:103: Parsing node: Unsqueeze__695 [Unsqueeze]
[01/25/2021-15:10:56] [V] [TRT] ModelImporter.cpp:119: Searching for input: StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/unstack__626:0
[01/25/2021-15:10:56] [V] [TRT] ModelImporter.cpp:125: Unsqueeze__695 [Unsqueeze] inputs: [StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/unstack__626:0 -> (59752, 4)], 
[01/25/2021-15:10:56] [V] [TRT] onnx2trt_utils.cpp:1793: Original shape: (59752, 4), unsqueezing to: (1, 59752, 4)
[01/25/2021-15:10:56] [V] [TRT] ImporterContext.hpp:141: Registering layer: Unsqueeze__695 for ONNX node: Unsqueeze__695
[01/25/2021-15:10:56] [V] [TRT] ImporterContext.hpp:116: Registering tensor: Unsqueeze__695:0 for ONNX tensor: Unsqueeze__695:0
[01/25/2021-15:10:56] [V] [TRT] ModelImporter.cpp:179: Unsqueeze__695 [Unsqueeze] outputs: [Unsqueeze__695:0 -> (1, 59752, 4)], 
[01/25/2021-15:10:56] [V] [TRT] ModelImporter.cpp:103: Parsing node: onnx_graphsurgeon_node_0 [BatchedNMS_TRT]
[01/25/2021-15:10:56] [V] [TRT] ModelImporter.cpp:119: Searching for input: Unsqueeze__695:0
[01/25/2021-15:10:56] [V] [TRT] ModelImporter.cpp:119: Searching for input: Unsqueeze__798:0
[01/25/2021-15:10:56] [V] [TRT] ModelImporter.cpp:125: onnx_graphsurgeon_node_0 [BatchedNMS_TRT] inputs: [Unsqueeze__695:0 -> (1, 59752, 4)], [Unsqueeze__798:0 -> (1, 1, 59752)], 
[01/25/2021-15:10:56] [I] [TRT] ModelImporter.cpp:135: No importer registered for op: BatchedNMS_TRT. Attempting to import as plugin.
[01/25/2021-15:10:56] [I] [TRT] builtin_op_importers.cpp:3659: Searching for plugin: BatchedNMS_TRT, plugin_version: 1, plugin_namespace: 
[01/25/2021-15:10:56] [I] [TRT] builtin_op_importers.cpp:3676: Successfully created plugin: BatchedNMS_TRT
[01/25/2021-15:10:56] [V] [TRT] ImporterContext.hpp:141: Registering layer: onnx_graphsurgeon_node_0 for ONNX node: onnx_graphsurgeon_node_0
#assertionbatchedNMSPlugin.cpp,70
Aborted (core dumped)

Full log text: log.txt. Models (original, and after replacing NMS): tf-onnx-trt-mob-fpn3.zip

Can you please help me with this? It's such an optimized way to deploy the model, and it would also be a great learning opportunity. With this working, I could use detectnet to do the inference. Thank you in advance.

dusty-nv commented 3 years ago

Hi @t-T-s, unfortunately I'm not very familiar with converting detection models from TensorFlow to TensorRT. This is the tool I used previously: https://github.com/AastaNV/TRT_object_detection

However, these days I train SSD-Mobilenet in PyTorch (as shown in this repo), then convert it to ONNX and load it with TensorRT. That seems to work more smoothly. Sorry I can't be of more help with TensorFlow.

t-T-s commented 3 years ago

Oh, that's alright @dusty-nv. I was also thinking about going with either TF-TRT or PyTorch as you mentioned. Thank you for the repo, it's amazing. I'll try AastaNV's tool. Happy training.

VeeranjaneyuluToka commented 3 years ago

@t-T-s, were you able to convert your TensorFlow ssd-mobilenet-v2-fpnlite model to a TRT engine?

t-T-s commented 3 years ago

@VeeranjaneyuluToka No. I had to use TF-TRT and move on. Still looking for a solution though.