galliot-us / neuralet

Neuralet is an open-source platform for edge deep learning models on edge TPU, Jetson Nano, and more.
https://neuralet.com
Apache License 2.0

Object detector retraining and deployment for Jetson using TLT & TensorRT #89

Closed mhejrati closed 4 years ago

undefined-references commented 4 years ago

I tried to generate a TensorRT engine from a retrained SSD-MobileNet-v2 using the UFF parser, in order to deploy the model on the Jetson Nano with the Smart Social Distancing app in this repo. The model was retrained with TensorFlow using the latest version of the Object Detection API. I ran build_engine.py (hash: 2ea5c04) to construct a TensorRT engine, but several errors were raised during the conversion because my model was retrained with a newer version of TensorFlow and the Object Detection API: some of the newer operations that replaced the older ones are not supported by the UFF parser and TensorRT 6.

It was suggested to retrain the model using TensorFlow 1.14 and a specific Object Detection API commit hash from about two years ago. However, I preferred to solve these issues so I could keep using the newer versions of TensorFlow and the Object Detection API.

To solve the issues, I used the software versions below:

First, I followed the tensorrt_demos repo installation script, install.sh, to patch graphsurgeon and install PyCUDA.

[TensorRT] ERROR: UffParser: Validator error: FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_4_3x3_s2_256/BatchNorm/FusedBatchNormV3: Unsupported operation _FusedBatchNormV3

[TensorRT] ERROR: UffParser: Validator error: FeatureExtractor/MobilenetV2/expanded_conv_15/add: Unsupported operation _AddV2

I replaced the FusedBatchNormV3 nodes with FusedBatchNorm nodes and the AddV2 nodes with Add nodes using graphsurgeon's update_node function.
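
A minimal sketch of this kind of op substitution, assuming the frozen graph has been loaded as a graphsurgeon DynamicGraph (here the op name is rewritten directly on the underlying TF NodeDef, which is one way to achieve the same replacement; the .pb filename is only a placeholder):

    import graphsurgeon as gs

    def replace_unsupported_ops(graph):
        # Rewrite FusedBatchNormV3 -> FusedBatchNorm and AddV2 -> Add so that
        # the UFF parser / TensorRT 6 can consume the graph.
        for node in graph.find_nodes_by_op("FusedBatchNormV3"):
            node.op = "FusedBatchNorm"
        for node in graph.find_nodes_by_op("AddV2"):
            node.op = "Add"
        return graph

    # Usage (placeholder path to the retrained frozen graph):
    # graph = gs.DynamicGraph("frozen_inference_graph.pb")
    # graph = replace_unsupported_ops(graph)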

[TensorRT] ERROR: UffParser: Validator error: Cast: Unsupported operation _Cast

I found out that the ToFloat input operation has been replaced by Cast in the newer exports, so I mapped Cast to the Input plugin in the namespace_plugin_map as shown below:

    namespace_plugin_map = {
        "MultipleGridAnchorGenerator": PriorBox,
        "Postprocessor": NMS,
        "Preprocessor": Input,
        "Cast": Input,
        "image_tensor": Input,
        "MultipleGridAnchorGenerator/Concatenate": concat_priorbox,  # for 'ssd_mobilenet_v1_coco'
        "Concatenate": concat_priorbox,  # for other models
        "concat": concat_box_loc,
        "concat_1": concat_box_conf
    }
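
For context, a hedged sketch of how a map like this is typically applied: the names on the right-hand side (PriorBox, NMS, Input, concat_priorbox, and so on) are plugin nodes created earlier in build_engine.py with gs.create_plugin_node, and the map is handed to graphsurgeon to collapse the matching TF namespaces into those plugins (the variable name graph follows the other snippets here):

    # Collapse each mapped TF namespace/node into the corresponding plugin node
    # (PriorBox, NMS, Input, ... were created earlier with gs.create_plugin_node).
    graph.collapse_namespaces(namespace_plugin_map)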

[libprotobuf FATAL /externals/protobuf/aarch64/10.0/include/google/protobuf/repeated_field.h:1408] CHECK failed: (index) < (current_size_):

The last problem was due to a missing input element for the GridAnchor node: after collapsing the namespaces, the GridAnchor_TRT node is left with no inputs, so the UFF converter fails when it tries to index the first element of an empty repeated field. I worked around this by defining a constant input tensor and setting it as the input of that node:

import numpy as np
import graphsurgeon as gs

def parse_gridAnchor(graph):
    # Create a dummy constant node and wire it up as the first input of the
    # GridAnchor_TRT node, so the UFF converter finds the input element it expects.
    data = np.array([1, 1], dtype=np.float32)
    anchor_input = gs.create_node("AnchorInput", "Const", value=data)
    graph.append(anchor_input)
    graph.find_nodes_by_op("GridAnchor_TRT")[0].input.insert(0, "AnchorInput")
    return graph

So, I figured out how to generate a TensorRT engine with the UFF parser from a custom SSD-MobileNet-v2 retrained with the latest version of the Object Detection API. I have written a blog post explaining the steps in more detail, which will be published soon, and I will send a PR for this conversion here shortly.
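
For reference, a condensed sketch of the final UFF-to-engine step, roughly in the spirit of build_engine.py; the node names ("Input", "NMS", "MarkOutput_0"), the 300x300 input shape, and the file names are assumptions taken from typical SSD UFF conversion scripts, not necessarily the exact values used in this repo:

    import tensorrt as trt
    import uff

    TRT_LOGGER = trt.Logger(trt.Logger.INFO)

    # Serialize the modified graphsurgeon graph to UFF, then parse it and build the engine.
    uff.from_tensorflow(graph.as_graph_def(),
                        output_nodes=["NMS"],
                        output_filename="ssd_mobilenet_v2.uff")

    with trt.Builder(TRT_LOGGER) as builder, \
         builder.create_network() as network, \
         trt.UffParser() as parser:
        builder.max_workspace_size = 1 << 28
        builder.fp16_mode = True
        parser.register_input("Input", (3, 300, 300))
        parser.register_output("MarkOutput_0")
        parser.parse("ssd_mobilenet_v2.uff", network)
        engine = builder.build_cuda_engine(network)
        with open("ssd_mobilenet_v2.engine", "wb") as f:
            f.write(engine.serialize())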

alpha-carinae29 commented 4 years ago

Thank you @mhejrati for opening this issue so I can share my experience of using TLT, TensorRT, and DeepStream with the community. My idea was to train an SSD MobileNet V2 network on the Oxford Town Center dataset with Nvidia's TLT platform. As a first step I pulled the TLT docker container from here, then followed the sample Jupyter notebook for training SSD models and Nvidia's documentation to train the detector. Note that to use this platform and its pre-trained models you need an Nvidia NGC API key, which is free.

To feed the data to TLT you have to convert your ground-truth annotations to a set of KITTI-format text files and create a config file that stores the dataset's specifications, such as the images and labels directories and how you want to split your data into train/eval subsets. Note that TLT does not support dynamic image resizing during training, so you have to resize the images offline. Also, SSD models only accept images whose width and height are multiples of 32. So I resized every image to 320 by 320, converted each XML annotation to a KITTI-format text file (see the sketch after the command below), and configured the dataset spec file. Then I ran the following command to create TFRecord files from the resized images and KITTI annotations:

tlt-dataset-convert -d $SPECS_DIR/ssd_tfrecords_kitti_trainval.txt \
                     -o $DATA_DOWNLOAD_DIR/tfrecords/
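
A minimal sketch of the offline resize and XML-to-KITTI conversion mentioned above; it assumes Pascal-VOC-style XML annotations, a single "pedestrian" class, and .jpg frames, and the directory names are placeholders (the KITTI label line uses the usual 15-field layout with the unused 3D fields zeroed):

    import os
    import xml.etree.ElementTree as ET
    from PIL import Image

    SRC_IMAGES, SRC_XML = "images_raw", "annotations_xml"
    DST_IMAGES, DST_LABELS = "images", "labels"
    W, H = 320, 320  # SSD in TLT needs width/height that are multiples of 32

    os.makedirs(DST_IMAGES, exist_ok=True)
    os.makedirs(DST_LABELS, exist_ok=True)

    for xml_file in os.listdir(SRC_XML):
        root = ET.parse(os.path.join(SRC_XML, xml_file)).getroot()
        name = os.path.splitext(xml_file)[0]

        # Resize the image offline and keep the scale factors for the boxes.
        img = Image.open(os.path.join(SRC_IMAGES, name + ".jpg"))
        sx, sy = W / img.width, H / img.height
        img.resize((W, H)).save(os.path.join(DST_IMAGES, name + ".jpg"))

        lines = []
        for obj in root.findall("object"):
            box = obj.find("bndbox")
            xmin = float(box.find("xmin").text) * sx
            ymin = float(box.find("ymin").text) * sy
            xmax = float(box.find("xmax").text) * sx
            ymax = float(box.find("ymax").text) * sy
            # KITTI: class, truncated, occluded, alpha, bbox(4), dims(3), loc(3), rot_y
            lines.append("pedestrian 0.00 0 0.00 %.2f %.2f %.2f %.2f 0.00 0.00 0.00 0.00 0.00 0.00 0.00"
                         % (xmin, ymin, xmax, ymax))

        with open(os.path.join(DST_LABELS, name + ".txt"), "w") as f:
            f.write("\n".join(lines))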

Now it was time to download a pre-trained feature extractor. You can list all of the available TLT feature extractors by running this command:

ngc registry model list nvidia/tlt_pretrained_object_detection:*

Since I wanted to train an SSD MobileNet V2 model, I chose the MobileNet V2 feature extractor and downloaded its weights by running:

ngc registry model download-version nvidia/tlt_pretrained_object_detection:mobilenet_v2 --dest $USER_EXPERIMENT_DIR/pretrained_mobilenet_v2

The only remaining piece to configure was the training config file, samples of which you can find in the TLT container. You can specify all of the training parameters, such as batch size or number of epochs, in this config file. Then I ran the following command to start training:

tlt-train ssd -e $SPECS_DIR/ped_ssd_mobilenet_v2_train.txt \
               -r $USER_EXPERIMENT_DIR/experiment_dir_unpruned \
               -k $KEY \
               -m $USER_EXPERIMENT_DIR/pretrained_mobilenet_v2/tlt_pretrained_object_detection_vmobilenet_v2/mobilenet_v2.hdf5 \
               --gpus 1

where the -e argument specifies the training config file, -r specifies the directory where the training weights will be stored, and -m specifies the pre-trained model path.

At the end I used the pruning feature of TLT, which significantly decreases the number of parameters in the network. Pruning is especially useful if you want to deploy your model to an edge device like the Jetson Nano, since such devices have constraints on bandwidth and memory. Running this command will prune the model:

tlt-prune -m $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/ssd_mobilenet_v2_epoch_$EPOCH.tlt \
           -o $USER_EXPERIMENT_DIR/experiment_dir_pruned/ssd_mobilenet_v2_pruned.tlt \
           -eq intersection \
           -pth 0.5 \
           -k $KEY

where -m specifies one of the trained checkpoints (.tlt files) and -o specifies the export path of the pruned model. The other arguments are pruning parameters; you can learn more about them in the TLT documentation.

Although pruning makes the model more efficient, it can reduce the accuracy, so I retrained the pruned network to recover the lost accuracy.

In the end, my SSD MobileNet V2 model reached 89% mAP.

Some results on the evaluation set: (sample detection images attached)

To integrate the trained model with DeepStream and deploy it, you should first export the model to a .etlt file by running:

tlt-export ssd -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/ssd_mobilenet_v2_epoch_$EPOCH.tlt \
                -k $KEY \
                -o $USER_EXPERIMENT_DIR/export/ped_ssd_mobilenet_v2_epoch_$EPOCH.etlt \
                -e $SPECS_DIR/ped_ssd_mobilenet_v2_retrain.txt \
                --batch_size 1 \
                --data_type fp16

The remaining integration and deployment steps have to be done on the deployment machine. DeepStream accepts two types of models for inference: you can feed it either the .etlt file or a TensorRT engine. For either option you should install TensorRT and the TensorRT OSS plugins required for SSD models. In the next step I exported the .etlt model to a TensorRT engine and configured the DeepStream config files to run the inference stream. However, as I described in #77, I got poor results, and so far I haven't been able to fix the issues with DeepStream since I have very little knowledge of GStreamer and DeepStream. It would make me happy if someone could solve the DeepStream issues and finally deploy the trained model to various devices.

mhejrati commented 4 years ago

@alpha-carinae29 thanks for sharing the details here, do you mind sharing the code to reproduce what you have done in a separate branch so we can all take a look?

mhejrati commented 4 years ago

@emma-w-dev looking forward to seeing your blog and the PR.

alpha-carinae29 commented 4 years ago

@mhejrati sure, I opened #97 to keep everything there.