Daniil-Osokin / lightweight-human-pose-estimation-3d-demo.pytorch

Real-time 3D multi-person pose estimation demo in PyTorch. OpenVINO backend can be used for fast inference on CPU.
Apache License 2.0

Implement on raspberry pi camera #41

Closed Fan-loewe closed 3 years ago

Fan-loewe commented 3 years ago

Hello,

first of all, thank you for sharing the code. I would like to run it on a Jetson Nano with a Raspberry Pi camera. However, I get the following errors:

$ python demo.py --model human-pose-estimation-3d.pth --video 0
[ WARN:0] global /home/nvidia/host/build_opencv/nv_opencv/modules/videoio/src/cap_gstreamer.cpp (1757) handleMessage OpenCV | GStreamer warning: Embedded video playback halted; module v4l2src0 reported: Internal data stream error.
[ WARN:0] global /home/nvidia/host/build_opencv/nv_opencv/modules/videoio/src/cap_gstreamer.cpp (886) open OpenCV | GStreamer warning: unable to start pipeline
[ WARN:0] global /home/nvidia/host/build_opencv/nv_opencv/modules/videoio/src/cap_gstreamer.cpp (480) isPipelinePlaying OpenCV | GStreamer warning: GStreamer: pipeline have not been created

I suppose the "video" is for webcam, do you have any idea how I could implement on raspberry pi camera? Thank you in advance, Fan

Daniil-Osokin commented 3 years ago

Hi! I believe you should use a non-numeric id to access the camera on this device, as described here. So just try to open VideoCapture with the suggested parameters.
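
For what it's worth, a minimal sketch of what such a non-numeric capture source could look like for a Raspberry Pi CSI camera on a Jetson Nano, assuming OpenCV was built with GStreamer support and that the JetPack image provides the nvarguscamerasrc element (both are assumptions, not part of the original answer):

import cv2

# Hypothetical GStreamer pipeline for a Raspberry Pi CSI camera on a Jetson Nano.
# Element names and caps depend on the JetPack/OpenCV build and may need adjusting.
gst_pipeline = (
    'nvarguscamerasrc ! '
    'video/x-raw(memory:NVMM), width=1280, height=720, framerate=30/1 ! '
    'nvvidconv ! video/x-raw, format=BGRx ! '
    'videoconvert ! video/x-raw, format=BGR ! appsink'
)

cap = cv2.VideoCapture(gst_pipeline, cv2.CAP_GSTREAMER)
ret, frame = cap.read()  # quick sanity check that frames arrive
print(ret, None if frame is None else frame.shape)

The same pipeline string can then be passed as the demo's video source instead of the numeric id.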

Fan-loewe commented 3 years ago

Hi Daniil, thank you for your reply! I changed modules/input_reader.py to read from GStreamer, and now it works. But there is still a problem: the delay is extremely large and the FPS is around 3.4, so it cannot capture human motion very well. Do you have any idea whether the problem is the Jetson Nano hardware or whether I need a faster network? Thanks a lot!

Daniil-Osokin commented 3 years ago

Great that it started to work! Did you try the demo on a PC or another device? If you did not build pose_extractor, the post-processing may be slow in scenes with multiple persons. I would expect better FPS on the Jetson Nano; what utilization does nvidia-smi show (3rd column)?

Fan-loewe commented 3 years ago

Hi Daniil, thank you for the message. I have not tried it on other devices yet. I built the pose extractor, but it still has a high delay. I suppose nvidia-smi cannot be used on the Jetson Nano, so I used jtop to monitor GPU usage (screenshot attached). As you can see, GPU utilization is quite high; does that mean this FPS is already the limit of the Jetson Nano?

Daniil-Osokin commented 3 years ago

You can try to check the load from just the camera pipeline. So comment out lines 96-97 and set instead:

poses_3d, poses_2d = [], []
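
In context, the suggested change could look roughly like the sketch below. This is an assumption about what lines 96-97 of demo.py contain (namely the network inference and pose-parsing calls), so adjust it to the actual code:

# demo.py, around lines 96-97 (a guess at the original calls, for context only):
# inference_result = net.infer(scaled_img)
# poses_3d, poses_2d = parse_poses(inference_result, input_scale, stride, fx, is_video)
poses_3d, poses_2d = [], []  # skip inference/post-processing to time the camera pipeline alone
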
Fan-loewe commented 3 years ago

Hi Daniil, thank you for your advice! It runs much faster, with an FPS of 180, but it cannot detect people anymore. What are these lines used for? Also, TensorRT is available on the Jetson Nano; I think it makes sense to convert the PyTorch model to TensorRT with torch2trt. Do you think that is feasible? Thanks a lot!

Daniil-Osokin commented 3 years ago

This experiment showed that the camera pipeline works fast, so it looks like the network inference is what mostly loads the GPU. Using TensorRT may help it run faster. You can also try to run the network on a smaller image: just pass --height-size 192 to demo.py.
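
For example, the command from the first post with the extra flag added (the video source being whichever camera pipeline already works on the Nano):

python demo.py --model human-pose-estimation-3d.pth --video 0 --height-size 192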

Fan-loewe commented 3 years ago

Hi Daniil, sorry for my late response. I tried changing the image size, but it does not help much. I suppose it makes sense to combine your library with TensorRT.

According to the demo in trt_pose https://github.com/NVIDIA-AI-IOT/trt_pose/blob/master/tasks/human_pose/live_demo.ipynb

I added the following code at line 58 of demo.py:

WIDTH = 224
HEIGHT = 224
data = torch.zeros((1, 3, HEIGHT, WIDTH)).cuda()
model_trt = torch2trt.torch2trt(net.net, [data], fp16_mode=True, max_workspace_size=1<<25)

It shows me the following error:

Traceback (most recent call last):
  File "demo_1.py", line 69, in <module>
    model_trt = torch2trt.torch2trt(net.net, [data], fp16_mode=True, max_workspace_size=1<<25)
  File "/home/aegis/.virtualenvs/py3cv4/lib/python3.6/site-packages/torch2trt-0.1.0-py3.6-linux-aarch64.egg/torch2trt/torch2trt.py", line 540, in torch2trt
    ctx.mark_outputs(outputs, output_names)
  File "/home/aegis/.virtualenvs/py3cv4/lib/python3.6/site-packages/torch2trt-0.1.0-py3.6-linux-aarch64.egg/torch2trt/torch2trt.py", line 406, in mark_outputs
    trt_tensor = torch_output._trt
AttributeError: 'list' object has no attribute '_trt'

I think the error appears because torch2trt does not support mobilenet.

Thus, could you tell me how to use a ResNet backbone with your library? An alternative would be to combine your 3D parts with the trt_pose library, but I am not sure whether that is feasible, since the 2D and 3D joints are trained jointly.

Thanks a lot for the help!

Daniil-Osokin commented 3 years ago

Hi! Please check the TensorRT support added in #43 (observed ~10x speedup for network inference :fire:).

Fan-loewe commented 3 years ago

Hi Daniil, thank you so much for your support!! The converter works, and the speed on the Nano almost doubles. But it is not able to detect persons anymore. Do you have any idea why?

(Screenshot from 2020-10-27 11-54-35 attached.)

Are you still able to detect persons with TensorRT?

Fan-loewe commented 3 years ago

Just an update: I changed line 12 of convert_to_trt.py to

input = torch.randn(1, 3, 192, 192).cuda()

and now it works.

I noticed the size might be the problem, because: when not using TensorRT, it is able to estimate the human pose, and after calling net.infer the sizes of the features, heatmaps and PAFs are [1, 57, 32, 32], [1, 19, 32, 32], [1, 38, 32, 32] with height-size 256. When using TensorRT, after net.infer the sizes of the features, heatmaps and PAFs are [1, 57, 32, 56], [1, 19, 32, 56], [1, 38, 32, 56].

The original input in the converter is input = torch.randn(1, 3, 256, 448).cuda(), which means 32 and 56 are obtained after 1/8 downsampling. If we want the 3rd and 4th dimensions of the features to be the same, the 3rd and 4th dimensions of the input should also be the same. Since I want to use height-size 192, I set both to 192.

I still have a doubt: why should the size of the heatmaps be [1, 57, 32, 32]? I mean, only in this way is it able to detect people. However, from reading the OpenPose paper I thought the size would be [1, 57, 32, 56]. Did you do some processing after getting the raw network output?

Thanks a lot!

Daniil-Osokin commented 3 years ago

Hi, you have debugged it right! The issue is that TensorRT does not support dynamic reshaping of the network input size. That is why the tensor shapes differ between the PyTorch and TensorRT inference engines. So you should set the proper network input size at conversion time (in the conversion script). The default values work for input with a 16:9 aspect ratio. To get the network input size, just print(scaled_img.shape) at line 95 in demo.py. I have added a clarification message to the PR.
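
Putting the two pieces together, a minimal sketch of the workflow described above (the exact variable names and line numbers in demo.py and convert_to_trt.py are taken from this thread and may differ in other versions):

# 1) In demo.py, around line 95, print the shape of the image actually fed to the network:
print(scaled_img.shape)  # e.g. (192, 192, 3) in this thread's setup

# 2) In convert_to_trt.py, around line 12, make the conversion input match that shape,
#    as (batch, channels, height, width), since TensorRT fixes the input size at conversion:
input = torch.randn(1, 3, 192, 192).cuda()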

Fan-loewe commented 3 years ago

Hi Daniil, thanks for the clarification!! I appreciate your help a lot. So I am closing the issue.

Daniil-Osokin commented 3 years ago

Great that it works!