NVIDIA-AI-IOT / redtail

Perception and AI components for autonomous mobile robotics.
BSD 3-Clause "New" or "Revised" License
1.01k stars · 344 forks

Running this on Cuda 9 gives errors (L4T 28.2.1+ OR host Ubuntu 16.04 with Cuda 9) #80

Closed Shreeyak closed 6 years ago

Shreeyak commented 6 years ago

I tried running the caffe_ros node on my host Ubuntu 16.04 machine with CUDA 9, as well as on a Jetson with CUDA 9. The model does not execute or pass tests on either. How do I update the code for use with CUDA 9?

Here is the error from Jetson TX2 (flashed with Jetpack 3.2.1):

$ rostest caffe_ros tests_basic.launch test_data_dir:=$REDTAIL_TEST_DIR model_dir:=$REDTAIL_MODEL_DIR
... logging to /home/nvidia/.ros/log/rostest-tegra-ubuntu-9855.log
[ROSUNIT] Outputting test results to /home/nvidia/.ros/test_results/caffe_ros/rostest-tests_tests_basic.xml
caffe_ros_node: cudnnEngine.cpp:640: bool nvinfer1::cudnn::Engine::deserialize(const void*, std::size_t, nvinfer1::IPluginFactory*): Assertion `size >= bsize && "Mismatch between allocated memory size and expected size of serialized engine."' failed.
caffe_ros_tests: /usr/include/boost/smart_ptr/shared_ptr.hpp:641: typename boost::detail::sp_dereference<T>::type boost::shared_ptr<T>::operator*() const [with T = const sensor_msgs::Image_<std::allocator<void> >; typename boost::detail::sp_dereference<T>::type = const sensor_msgs::Image_<std::allocator<void> >&]: Assertion `px != 0' failed.
[Testcase: testCaffeRosTests] ... ERROR!
ERROR: max time [300.0s] allotted for test [CaffeRosTests] of type [caffe_ros/caffe_ros_tests]
  File "/usr/lib/python2.7/unittest/case.py", line 329, in run
    testMethod()
  File "/opt/ros/kinetic/lib/python2.7/dist-packages/rostest/runner.py", line 148, in fn
    self.test_parent.run_test(test)
  File "/opt/ros/kinetic/lib/python2.7/dist-packages/rostest/rostest_parent.py", line 132, in run_test
    return self.runner.run_test(test)
  File "/opt/ros/kinetic/lib/python2.7/dist-packages/roslaunch/launch.py", line 684, in run_test
    (test.time_limit, test.test_name, test.package, test.type))
--------------------------------------------------------------------------------

[ROSTEST]-----------------------------------------------------------------------

SUMMARY
 * RESULT: FAIL
 * TESTS: 0
 * ERRORS: 1
 * FAILURES: 0
$ rosrun caffe_ros caffe_ros_node __name:=yolo_dnn _prototxt_path:=$REDTAIL_MODEL_DIR/yolo-relu.prototxt _model_path:=$REDTAIL_MODEL_DIR/yolo-relu.caffemodel _output_layer:=fc25
[ INFO] [1534431180.311209944]: Starting Caffe ROS node...
[ INFO] [1534431180.340910862]: Camera: /camera/image_raw
[ INFO] [1534431180.341055437]: Proto : /home/nvidia/redtail/models/pretrained/yolo-relu.prototxt
[ INFO] [1534431180.341120908]: Model : /home/nvidia/redtail/models/pretrained/yolo-relu.caffemodel
[ INFO] [1534431180.341162348]: Input : data
[ INFO] [1534431180.341220076]: Output: prob
[ INFO] [1534431180.341271627]: In Fmt: BGR
[ INFO] [1534431180.341370283]: DType : fp16
[ INFO] [1534431180.341422378]: Scale : 1.0000
[ INFO] [1534431180.341467754]: Shift : 0.00
[ INFO] [1534431180.341523018]: Cam Q : 1
[ INFO] [1534431180.341576841]: DNN Q : 1
[ INFO] [1534431180.341621449]: Post P: none
[ INFO] [1534431180.341692873]: Obj T : 0.15
[ INFO] [1534431180.341735624]: IOU T : 0.20
[ INFO] [1534431180.341774312]: Rate  : 30.0
[ INFO] [1534431180.341854759]: Debug : no
[ INFO] [1534431180.341901863]: INT8 calib src  : 
[ INFO] [1534431180.341946311]: INT8 calib cache: 
[ WARN] [1534431180.341995238]: The use_FP16 parameter is deprecated though still supported. Please use data_type instead as use_FP16 will be removed in future release.
[ INFO] [1534431180.369515179]: Hardware support of fast FP16: yes.
[ INFO] [1534431180.369587562]: Hardware support of fast INT8: no.
[ INFO] [1534431180.369645962]: Using FP16 model data type.
[ INFO] [1534432023.466124098]: Loaded model from: /home/nvidia/redtail/models/pretrained/yolo-relu.prototxt, /home/nvidia/redtail/models/pretrained/yolo-relu.caffemodel
[ INFO] [1534432023.466265536]: Building CUDA engine...
<stuck here>

This is the error output from my host (Ubuntu 16.04, TensorRT 4.0, CUDA 9):

$ rosrun caffe_ros caffe_ros_node __name:=yolo_dnn _prototxt_path:=$REDTAIL_MODEL_DIR/yolo-relu.prototxt _model_path:=$REDTAIL_MODEL_DIR/yolo-relu.caffemodel _output_layer:fc25
[ INFO] [1534431180.311209944]: Starting Caffe ROS node...
[ INFO] [1534431180.340910862]: Camera: /camera/image_raw
[ INFO] [1534431180.341055437]: Proto : /home/nvidia/redtail/models/pretrained/yolo-relu.prototxt
[ INFO] [1534431180.341120908]: Model : /home/nvidia/redtail/models/pretrained/yolo-relu.caffemodel
[ INFO] [1534431180.341162348]: Input : data
[ INFO] [1534431180.341220076]: Output: prob
[ INFO] [1534431180.341271627]: In Fmt: BGR
[ INFO] [1534431180.341370283]: DType : fp16
[ INFO] [1534431180.341422378]: Scale : 1.0000
[ INFO] [1534431180.341467754]: Shift : 0.00
[ INFO] [1534431180.341523018]: Cam Q : 1
[ INFO] [1534431180.341576841]: DNN Q : 1
[ INFO] [1534431180.341621449]: Post P: none
[ INFO] [1534431180.341692873]: Obj T : 0.15
[ INFO] [1534431180.341735624]: IOU T : 0.20
[ INFO] [1534431180.341774312]: Rate  : 30.0
[ INFO] [1534431180.341854759]: Debug : no
[ INFO] [1534431180.341901863]: INT8 calib src  : 
[ INFO] [1534431180.341946311]: INT8 calib cache: 
[ WARN] [1534431180.341995238]: The use_FP16 parameter is deprecated though still supported. Please use data_type instead as use_FP16 will be removed in future release.
[ INFO] [1534431180.369515179]: Hardware support of fast FP16: yes.
[ INFO] [1534431180.369587562]: Hardware support of fast INT8: no.
[ INFO] [1534431180.369645962]: Using FP16 model data type.
[ INFO] [1534431181.729222169]: Loaded model from: /home/nvidia/redtail/models/pretrained/yolo-relu.prototxt, /home/nvidia/redtail/models/pretrained/yolo-relu.caffemodel
[FATAL] [1534431181.729329304]: Could not find output blob: prob
Segmentation fault (core dumped)
$ rosrun caffe_ros caffe_ros_node __name:=yolo_dnn _prototxt_path:=$REDTAIL_MODEL_DIR/yolo-relu.prototxt _model_path:=$REDTAIL_MODEL_DIR/yolo-relu.caffemodel _output_layer:fc25 _input_layer:=data
[ INFO] [1534421903.901773214]: Starting Caffe ROS node...
[ INFO] [1534421903.911961376]: Camera: /camera/image_raw
[ INFO] [1534421903.911987688]: Proto : home/redtail/models/pretrained/yolo-relu.prototxt
[ INFO] [1534421903.911993552]: Model : home/redtail/models/pretrained/yolo-relu.caffemodel
[ INFO] [1534421903.911998951]: Input : data
[ INFO] [1534421903.912005368]: Output: prob
[ INFO] [1534421903.912010269]: In Fmt: BGR
[ INFO] [1534421903.912028457]: DType : fp16
[ INFO] [1534421903.912041636]: Scale : 1.0000
[ INFO] [1534421903.912047490]: Shift : 0.00
[ INFO] [1534421903.912053366]: Cam Q : 1
[ INFO] [1534421903.912059122]: DNN Q : 1
[ INFO] [1534421903.912066203]: Post P: none
[ INFO] [1534421903.912072897]: Obj T : 0.15
[ INFO] [1534421903.912078844]: IOU T : 0.20
[ INFO] [1534421903.912085568]: Rate  : 30.0
[ INFO] [1534421903.912090249]: Debug : no
[ INFO] [1534421903.912095297]: INT8 calib src  : 
[ INFO] [1534421903.912101412]: INT8 calib cache: 
[ WARN] [1534421903.912107801]: The use_FP16 parameter is deprecated though still supported. Please use data_type instead as use_FP16 will be removed in future release.
[ INFO] [1534421904.064800437]: Hardware support of fast FP16: no.
[ INFO] [1534421904.064828065]: Hardware support of fast INT8: yes.
[ INFO] [1534421904.064834117]: ... however, INT8 will not be used for this model.
[ INFO] [1534421904.064839374]: Using FP32 model data type.
[ INFO] [1534421904.064864704]: [TensorRT] CaffeParser: Could not open file home/redtail/models/pretrained/yolo-relu.caffemodel
[ INFO] [1534421904.064879920]: [TensorRT] CaffeParser: Could not parse model file
[FATAL] [1534421904.064888080]: Failed to parse network: home/redtail/models/pretrained/yolo-relu.prototxt, home/redtail/models/pretrained/yolo-relu.caffemodel
Alexey-Kamenev commented 6 years ago

For the first rosrun command: if you are running it on Jetson, the first run can take 3-5 minutes while TensorRT compiles the model. Just wait a bit longer. The second rosrun command contains a typo at the end: `_output_layer:fc25` is missing the `=`, so the argument is ignored and the default output layer name, `prob`, is used, as reported by the node. In the third rosrun command the `REDTAIL_MODEL_DIR` environment variable has no leading `/`, so the paths are resolved relative to the current directory, which is probably not what you want.
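Both mistakes fail silently or confusingly at launch time, so a small pre-launch sanity check can help. This is a hypothetical helper, not part of redtail; `check_ros_args` and its two rules are illustrative only:

```python
def check_ros_args(args):
    """Sanity-check rosrun-style arguments before launching a node.

    Illustrative only (not part of redtail): flags private parameters
    written with ':' instead of ':=' (which ROS silently ignores) and
    *_path parameters whose values are not absolute paths.
    """
    problems = []
    for arg in args:
        if arg.startswith("_") and ":=" not in arg:
            problems.append("missing ':=' in %r" % arg)
        elif ":=" in arg:
            name, value = arg.split(":=", 1)
            if name.endswith("_path") and not value.startswith("/"):
                problems.append("relative path in %r" % arg)
    return problems

# The two mistakes from the commands above:
print(check_ros_args([
    "_output_layer:fc25",                               # ':' instead of ':='
    "_model_path:=home/redtail/models/yolo.caffemodel", # no leading '/'
]))
```

Running this on a correct command line returns an empty list, so it could be wired into a launch wrapper script.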

As for the tests - check your paths etc.

Shreeyak commented 6 years ago

Aah, how did I miss those! Thank you for the quick response, @Alexey-Kamenev! I just made the changes and they get past the error stage. This is what I'm getting on my PC (CUDA 9, TensorRT 4.0):

$ ./launch_caffe_ros.sh
...
[ INFO] [1534447173.129896414]: Loaded model from: /home/shrek/stroll-e/redtail/models/pretrained/yolo-relu.prototxt, /home/shrek/stroll-e/redtail/models/pretrained/yolo-relu.caffemodel
[ INFO] [1534447173.129936623]: Building CUDA engine...
[ INFO] [1534447211.052500550]: Done building.
[ INFO] [1534447211.065483520]: Saving cached model to: /home/shrek/stroll-e/redtail/models/pretrained/yolo-relu.caffemodel.cache
[ INFO] [1534447211.483300622]: Created CUDA engine and context.
[ INFO] [1534447211.483328679]: Input : (W: 448, H: 448, C:   3).
[ INFO] [1534447211.483883560]: Output: (W:   1, H:   1, C:1470).

$ rostopic list
/camera/image_raw
/rosout
/rosout_agg
/yolo_dnn/network/output

This is the script I'm using to launch:

rosrun caffe_ros caffe_ros_node __name:=yolo_dnn \
_prototxt_path:=/home/shrek/redtail/models/pretrained/yolo-relu.prototxt \
_model_path:=/home/shrek/redtail/models/pretrained/yolo-relu.caffemodel \
_input_layer:=data \
_output_layer:=fc25 

Thank you! It's running on both the TX2 and on my PC; the TX2 took a few minutes to create the CUDA engine, as you said. Any idea whether the model is supposed to work on CUDA 9 with TensorRT 4.0? I've been thinking of going with TensorFlow for simplicity of development, but maybe I just need to learn DIGITS/Caffe.
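On the long first-time build: the log above shows the node saving the serialized engine to `yolo-relu.caffemodel.cache`, and later launches load that cache instead of rebuilding. A rough sketch of the reuse condition, assuming (based only on the log line) that the cache sits next to the caffemodel; `engine_cache_usable` is a made-up helper name:

```python
import os

def engine_cache_usable(model_path, cache_path=None):
    """Return True if a serialized engine cache can be reused.

    Sketch only: the node appears to write '<model>.cache' next to the
    caffemodel (see the 'Saving cached model to' log line above); reusing
    it is only safe while the cache is at least as new as the model file,
    otherwise a retrained model would be served from a stale engine.
    """
    if cache_path is None:
        cache_path = model_path + ".cache"
    return (os.path.exists(cache_path)
            and os.path.getmtime(cache_path) >= os.path.getmtime(model_path))
```

If this returns False, expect the multi-minute "Building CUDA engine..." step again on the next launch.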

Alexey-Kamenev commented 6 years ago

The current redtail code is tested and confirmed to work in the following configurations:

  1. JetPack 3.2 which contains CUDA 9.0 and TensorRT 3.0.
  2. JetPack 3.3 which contains CUDA 9.0 and TensorRT 4.0.

As for Caffe vs TensorFlow: we had to use Caffe because, back in 2016 when we started the project, it was the only framework supported by TensorRT. These days you can load pretty much any model into TRT (e.g. via the ONNX parser, TF-TRT, etc.). I would definitely use TensorFlow or PyTorch for any new project.

Shreeyak commented 6 years ago

Thank you, this is great news! I also appreciate the input on TensorFlow. I will try to get something up in TensorFlow soon and share what I find.

At this point, would you have any suggestions for segmentation models? I'm considering SegNet or U-Net at the moment. Are you aware of any other TensorFlow implementations of semantic segmentation that would be suited to the TX2?

Alexey-Kamenev commented 6 years ago

For semantic segmentation, I would recommend checking out MobileNet-based DeepLab nets - they should work well on TX2. However, I haven't tried that so I cannot say for sure. Our team is also working on semantic/instance segmentation optimized for Jetson but we are not ready to release our work yet.

Shreeyak commented 6 years ago

Thank you, I truly appreciate your inputs!

Alexey-Kamenev commented 6 years ago

Closing; feel free to re-open or create a new issue.

vxgu86 commented 5 years ago

I have the same problem. What the earlier posts did not highlight is that there is an assertion error (`px != 0`), shown in the output below.

I have set the time limit to 1000, but still no results come out.

rostest caffe_ros tests_basic.launch test_data_dir:=$REDTAIL_TEST_DIR model_dir:=$REDTAIL_MODEL_DIR trail_prototxt_path:=$REDTAIL_MODEL_DIR/TrailNet_SResNet-18.prototxt trail_model_path:=$REDTAIL_MODEL_DIR/TrailNet_SResNet-18.caffemodel object_prototxt_path:=$REDTAIL_MODEL_DIR/yolo-relu.prototxt object_model_path:=$REDTAIL_MODEL_DIR/yolo-relu.caffemodel
... logging to /home/nvidia/.ros/log/rostest-tegra-ubuntu-6698.log
[ROSUNIT] Outputting test results to /home/nvidia/.ros/test_results/caffe_ros/rostest-tests_tests_basic.xml
caffe_ros_tests: /usr/include/boost/smart_ptr/shared_ptr.hpp:641: typename boost::detail::sp_dereference<T>::type boost::shared_ptr<T>::operator*() const [with T = const sensor_msgs::Image_<std::allocator<void> >; typename boost::detail::sp_dereference<T>::type = const sensor_msgs::Image_<std::allocator<void> >&]: Assertion `px != 0' failed.

[Testcase: testCaffeRosTests] ... ERROR!
ERROR: max time [600.0s] allotted for test [CaffeRosTests] of type [caffe_ros/caffe_ros_tests]
  File "/usr/lib/python2.7/unittest/case.py", line 329, in run
    testMethod()
  File "/opt/ros/kinetic/lib/python2.7/dist-packages/rostest/runner.py", line 148, in fn
    self.test_parent.run_test(test)
  File "/opt/ros/kinetic/lib/python2.7/dist-packages/rostest/rostest_parent.py", line 132, in run_test
    return self.runner.run_test(test)
  File "/opt/ros/kinetic/lib/python2.7/dist-packages/roslaunch/launch.py", line 684, in run_test
    (test.time_limit, test.test_name, test.package, test.type))

[ROSTEST]-----------------------------------------------------------------------

SUMMARY

rostest log file is in /home/nvidia/.ros/log/rostest-tegra-ubuntu-6698.log

vxgu86 commented 5 years ago

@Alexey-Kamenev any suggestions for my question? Thanks.