carstenschwede opened 7 years ago
(1) Using the MPI model instead of the COCO model and (2) using a single scale for testing can both reduce processing time. Restricting the detections does not help, because the CNN still has to run the same trained model, so the forward-pass time stays the same.
Another option is to modify text.prototxt and reduce the number of stages from 6 to 3.
Thanks, I will try both. Any idea of what kind of speedup I could expect?
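As a very rough answer, here is a back-of-the-envelope sketch (not a benchmark) under a simple assumed cost model: forward-pass time grows with the pixel count at each test scale (scale squared) times the cost of the backbone plus one unit per refinement stage. The scale list 0.5, 1, 1.5, 2 and the equal backbone/stage weights are hypothetical; real gains will be smaller because image resizing and part grouping are not included.

```python
def relative_cost(scales, n_stages, backbone_cost=1.0, stage_cost=1.0):
    """Relative forward-pass cost under an assumed model:
    pixels (sum of scale**2) x (backbone + n_stages * per-stage cost)."""
    pixels = sum(s * s for s in scales)
    return pixels * (backbone_cost + n_stages * stage_cost)


def estimated_speedup(before, after, **weights):
    """before/after are (scales, n_stages) tuples; the weights are hypothetical."""
    return relative_cost(*before, **weights) / relative_cost(*after, **weights)


# Hypothetical multi-scale setting vs. a single scale, 6 stages each.
multi = ([0.5, 1.0, 1.5, 2.0], 6)
print(estimated_speedup(multi, ([1.0], 6)))  # 7.5
# Single scale, 6 stages vs. 3 stages (equal backbone/stage weights assumed).
print(estimated_speedup(([1.0], 6), ([1.0], 3)))  # 1.75
```

Changing `backbone_cost` relative to `stage_cost` shows how sensitive the stage-reduction estimate is to the assumed weights; the scale reduction, by contrast, does not depend on them.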
Hi, I have a question about fps: I run the rtpose demo on an AWS p2.large instance (with one K80 GPU, 24 GB), but it takes 1.1 s to process a frame. Is this because the K80 has a compute capability of 3.7, lower than the 6.1 of the GTX 1080?
This is a preliminary benchmark we made with the new version we are working on (it will be released in about a month). The version you are currently using should be around 25-30% slower. Let me know if you are using the same flags. If so, are you using cuDNN 5.1? Older versions of cuDNN might also slow down the program. Thanks!
Current benchmark: https://docs.google.com/spreadsheets/d/1-DynFGvoScvfWDA1P4jDInCkbD4lg0IKOYbXgEq0sK0/edit#gid=0
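@Warden7 Compute capability describes the supported feature set, not speed. In raw throughput the two are similar (K80: 8.73 TFLOPS, GTX 1080: 9 TFLOPS), so the low fps probably has other causes.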
Thanks for the detailed analysis. I'm using cuDNN 5.0 and CUDA 7.5. The "Volatile GPU-Util" field in nvidia-smi always shows 99%, even when nothing is running on the GPU. Maybe this needs further debugging.
@Warden7 Kill the other processes running on the GPU.
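A small sketch for spotting leftover processes: it calls `nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv,noheader` and parses the CSV rows, so stale jobs can then be stopped with `kill <pid>`. The helper names are hypothetical, and this only lists compute processes.

```python
import subprocess

# nvidia-smi query for processes currently using the GPU for compute.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader",
]


def parse_compute_apps(csv_text):
    """Turn rows like '1234, python, 812 MiB' into (pid, name, memory) tuples."""
    apps = []
    for row in csv_text.strip().splitlines():
        pid, rest = row.split(",", 1)
        # Split memory off the right so commas inside a process name survive.
        name, mem = rest.rsplit(",", 1)
        apps.append((int(pid), name.strip(), mem.strip()))
    return apps


def list_gpu_processes():
    """Run nvidia-smi and return the parsed list of GPU compute processes."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_compute_apps(result.stdout)


if __name__ == "__main__":
    for pid, name, mem in list_gpu_processes():
        print(pid, name, mem)
```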
Another way to speed it up is by using the new version (~25% faster): https://github.com/CMU-Perceptual-Computing-Lab/openpose
Another way is to reduce the number of feature maps. I changed the number of outputs of the conv layers in stages 3-6 from 128 to 64. The result is as good as the original version, with a 25% speedup!
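The change described above can be sketched as a text rewrite of a deploy prototxt. This assumes refinement-stage layers are named like `Mconv1_stage3_L1` and that each `layer {` opener sits on its own line; the function name and matching rules are hypothetical, not the shared model's actual procedure. Only `Convolution` layers whose `num_output` is exactly 128 are touched, so the final per-stage outputs (heatmaps and PAFs) keep their channel counts.

```python
import re

NAME_RE = re.compile(r'name:\s*"([^"]*)"')
STAGE_RE = re.compile(r"_stage(\d+)_")


def _rewrite_block(block, stages, old, new):
    """Shrink num_output in one layer block if it is a conv in a chosen stage."""
    name = NAME_RE.search(block)
    if not name:
        return block
    stage = STAGE_RE.search(name.group(1))
    if not stage or int(stage.group(1)) not in stages:
        return block
    if '"Convolution"' not in block:
        return block
    return re.sub(rf"num_output:\s*{old}\b", f"num_output: {new}", block)


def shrink_stage_convs(prototxt, stages=(3, 4, 5, 6), old=128, new=64):
    """Return prototxt text with matching conv layers narrowed from old to new."""
    out, block, depth = [], [], 0
    for line in prototxt.splitlines(keepends=True):
        if not block and not line.strip().startswith("layer"):
            out.append(line)  # outside any layer block (e.g. input definitions)
            continue
        block.append(line)
        depth += line.count("{") - line.count("}")
        if depth == 0:  # closing brace of the top-level layer block
            out.append(_rewrite_block("".join(block), stages, old, new))
            block = []
    out.extend(block)  # keep an unterminated trailing block unchanged
    return "".join(out)
```

Because the weight shapes change, the original `.caffemodel` weights no longer fit these layers, so the modified network has to be retrained or fine-tuned. When a halved layer feeds another halved layer, its multiply-adds drop by roughly 4x (input and output channels both halve), which helps explain a sizeable overall saving.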
@wangzhangup Thanks, can you try your modification also on the newer version at https://github.com/CMU-Perceptual-Computing-Lab/openpose? Would be interesting to see what overall speedup you are able to get.
@gineshidalgo99 Thanks for the update!
@wangzhangup Thank you so much for your idea! Could you email me at gines@cmu.edu to discuss in more detail how you did it? We are interested in adding it to our system if that is OK with you!
@gineshidalgo99 OK!
@gineshidalgo99 @carstenschwede Here is the sped-up model: https://drive.google.com/open?id=0B-SxboVJxF-WNmtpWGc5emZrRDg
@wangzhangup The speedup is impressive. The accuracy does decrease a bit, but that is fine given such a large speedup (I went from 14 to 20 fps on my desktop, and from 30 to 22 mAP). Do you mind if I add it to the new OpenPose? Or you can make a pull request with your new prototxt, and I will fix the other details (so you would appear as a contributor to OpenPose). Thanks!
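The reported figures (14 to 20 fps, 30 to 22 mAP) can be turned into per-frame latency and relative accuracy loss with a few lines of arithmetic; the helper name is illustrative only.

```python
def tradeoff(fps_before, fps_after, map_before, map_after):
    """Summarize a speed/accuracy trade-off from reported fps and mAP values."""
    return {
        "speedup": fps_after / fps_before,
        "ms_per_frame": (1000 / fps_before, 1000 / fps_after),
        "map_drop_points": map_before - map_after,
        "map_drop_relative": (map_before - map_after) / map_before,
    }


# Figures from the comment above: 14 -> 20 fps, 30 -> 22 mAP.
for key, value in tradeoff(14, 20, 30, 22).items():
    print(key, value)
```

That works out to about 1.43x faster (roughly 71 ms to 50 ms per frame) for a loss of 8 mAP points, about 27% relative.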
@wangzhangup thanks for the model, impressive speedup!
@gineshidalgo99 is a similar speedup expected for the upcoming "extended" models at OpenPose (e.g. finger tracking)?
@carstenschwede The speedup applies to the body pose, but finger tracking is built on top of it (you need to know the body location to detect the hand), so it will benefit too if this model is used. I have not measured the accuracy impact yet, though; I guess I will add both models: 1 for better accuracy and 1 for speed.
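To illustrate why hand detection depends on the body result, here is a hypothetical sketch that derives a hand search box from the elbow and wrist keypoints. The extension ratio and box size are assumed values, not OpenPose's actual rule; the point is only that without body keypoints there is no box to run a hand network on.

```python
import math


def hand_box_from_arm(elbow, wrist, extend=0.33, size_factor=1.5):
    """Hypothetical heuristic: place a square hand box just past the wrist.

    elbow/wrist are (x, y) body keypoints, or None if not detected.
    Returns (x, y, width, height), or None when the arm is missing.
    """
    if elbow is None or wrist is None:
        return None  # no body keypoints -> nowhere to look for the hand
    dx, dy = wrist[0] - elbow[0], wrist[1] - elbow[1]
    forearm = math.hypot(dx, dy)
    # Continue along the forearm direction a fraction past the wrist.
    cx, cy = wrist[0] + extend * dx, wrist[1] + extend * dy
    side = size_factor * forearm
    return (cx - side / 2, cy - side / 2, side, side)


print(hand_box_from_arm((100, 100), (100, 200)))  # box centred near (100, 233)
print(hand_box_from_arm(None, (100, 200)))        # None
```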
I guess I will add both models: 1 for better accuracy and 1 for speed
Sounds perfect. Can't wait to try out the finger detection.
@gineshidalgo99 Could you share your measurement code?
It is still quite messy, it uses Matlab and C++, and it is not completely finished. I prefer to wait until I actually finish it properly... sorry!
I just tried finger tracking at 640x480, also with tracking set to 5, but I only get around 10 fps. Could you give me some advice?
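Comparing settings like these (resolution, tracking, model) is easier with a consistent timing harness. The sketch below is a hypothetical helper, not OpenPose's own measurement tool: it assumes `process_frame` is any callable that runs the full pipeline on one frame and returns only when that frame is finished, and it skips a few warm-up frames before averaging.

```python
import time


def measure_fps(process_frame, frames, warmup=5, clock=time.perf_counter):
    """Average fps of process_frame over frames, excluding warm-up frames.

    The first calls often include one-off costs (memory allocation,
    lazy initialization), so they are run but not timed.
    """
    frames = list(frames)
    if len(frames) <= warmup:
        raise ValueError("need more frames than warm-up iterations")
    for frame in frames[:warmup]:
        process_frame(frame)
    start = clock()
    for frame in frames[warmup:]:
        process_frame(frame)
    elapsed = clock() - start
    return (len(frames) - warmup) / elapsed


if __name__ == "__main__":
    # Stand-in workload instead of a real network (hypothetical).
    print(round(measure_fps(lambda f: time.sleep(0.01), range(20)), 1))
```

Running it once per configuration on the same frames keeps the comparison fair; if the pipeline queues GPU work asynchronously, `process_frame` must wait for the results before returning.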
Are there any options to increase fps besides reducing the resolution or adding GPUs? Is it possible to restrict detection to certain joints (e.g. heads) to speed up processing?