fandulu / DD-Net

A lightweight network for body/hand action recognition
MIT License

Frame Rate - Real Time Prediction #10

Closed leviethung2103 closed 4 years ago

leviethung2103 commented 4 years ago

Hi @fandulu,

I would like to ask you about my problem.

I've got a camera that only produces 5 FPS video files, whereas the clips in the JHMDB dataset are about 32 frames long. Based on your broad experience, is this a serious problem when applying your model to my data?

I've tried using OpenPose to extract the joint information and preprocess it as input to your model. However, I got really poor performance, around 50% accuracy.
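
To be concrete, what I mean by preprocessing is something like the sketch below: read the per-frame OpenPose JSON output, keep the 2-D joints of the first detected person, and resample each clip to a fixed length. The field names assume OpenPose's default 2-D body output, and the 32-frame target length is just my assumption based on JHMDB, not necessarily what your model expects.

```python
# Minimal sketch: read per-frame OpenPose JSON files (2-D body keypoints),
# keep (x, y) for the first detected person, and resample the sequence to a
# fixed length so a 5 FPS clip matches the ~32-frame clips used for JHMDB.
import glob
import json
import numpy as np

def load_openpose_sequence(json_dir, num_joints=25):
    frames = []
    for path in sorted(glob.glob(f"{json_dir}/*.json")):
        with open(path) as f:
            data = json.load(f)
        if not data["people"]:
            continue  # skip frames where no person was detected
        kp = np.array(data["people"][0]["pose_keypoints_2d"]).reshape(-1, 3)
        frames.append(kp[:num_joints, :2])  # drop the confidence column
    return np.stack(frames)  # shape: (num_frames, num_joints, 2)

def resample_sequence(seq, target_len=32):
    # Linear interpolation along the time axis to a fixed number of frames.
    src = np.linspace(0, len(seq) - 1, num=len(seq))
    dst = np.linspace(0, len(seq) - 1, num=target_len)
    out = np.empty((target_len,) + seq.shape[1:])
    for j in range(seq.shape[1]):
        for d in range(seq.shape[2]):
            out[:, j, d] = np.interp(dst, src, seq[:, j, d])
    return out

seq = resample_sequence(load_openpose_sequence("clip_01/"), target_len=32)
```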

Furthermore, I would like to run your model as a real-time predictor with a window of 8 frames, but I don't know what values would be suitable for setting this up. Do you have any advice?
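
What I have in mind for the real-time setup is roughly the loop below: keep a rolling buffer of the last 8 skeleton frames and re-predict whenever a new frame arrives. Here `extract_skeleton`, `preprocess`, and `model` are placeholders for the pose estimator, the DD-Net input preparation, and the trained network; it's only a sketch of the idea.

```python
# Minimal sketch of a real-time loop: keep a rolling buffer of the most
# recent skeleton frames and run a prediction once the buffer is full.
from collections import deque
import numpy as np

WINDOW = 8  # number of most recent frames fed to the model

buffer = deque(maxlen=WINDOW)

def on_new_frame(frame, model, extract_skeleton, preprocess):
    joints = extract_skeleton(frame)  # e.g. a (num_joints, 2) array, or None
    if joints is None:
        return None                   # pose estimator found nobody this frame
    buffer.append(joints)
    if len(buffer) < WINDOW:
        return None                   # wait until the window is full
    window = np.stack(buffer)         # (WINDOW, num_joints, 2)
    # preprocess() should produce whatever inputs (and batch dimension)
    # the trained model expects.
    scores = model.predict(preprocess(window))
    return int(np.argmax(scores))     # index of the predicted action class
```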

Thank you so much for reading my questions. I hope to receive some valuable suggestions from you.

fandulu commented 4 years ago

Hi, I think there could be three reasons for the bad performance:

1. Good action prediction relies on good pose estimation, but pose estimation usually contains a lot of noise.
2. The actions defined by others (e.g., JHMDB) may not be suitable for the actions you want to recognize.
3. For in-the-wild body (not hand) action recognition, RGB information is quite useful and significantly improves prediction performance, so it may be better to depend only partially on the body skeleton (for example, by fusing it with an RGB stream, as in the sketch below).
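
For example, one simple way to depend only partially on the skeleton is late fusion: train an RGB classifier and a skeleton classifier separately and combine their per-class scores. A rough sketch (the weighting, and the assumption that both models output per-class probabilities, are only illustrative):

```python
# Minimal late-fusion sketch: combine per-class probabilities from an RGB
# classifier and a skeleton classifier with a weighted average, so the
# final prediction only partially depends on the body skeleton.
import numpy as np

def fuse_scores(rgb_probs, skeleton_probs, skeleton_weight=0.4):
    # Both inputs: arrays of shape (num_classes,) that each sum to 1.
    fused = skeleton_weight * skeleton_probs + (1.0 - skeleton_weight) * rgb_probs
    return int(np.argmax(fused))  # index of the fused prediction
```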

If you only have one person in the video, you may find some ideas at https://www.pyimagesearch.com/2019/07/15/video-classification-with-keras-and-deep-learning/. If you have multiple people, I personally think it might be better to follow AVA-detection-related work at https://research.google.com/ava/. If you are working on hand gestures, I think DD-Net should be helpful; even if the camera only produces 5 FPS video, you can use the frames from the past few seconds and accept a small prediction delay.