PINTO0309 / MobileNet-SSD-RealSense

[High Performance / MAX 30 FPS] RaspberryPi3(RaspberryPi/Raspbian Stretch) or Ubuntu + Multi Neural Compute Stick(NCS/NCS2) + RealSense D435(or USB Camera or PiCamera) + MobileNet-SSD(MobileNetSSD) + Background Multi-transparent(Simple multi-class segmentation) + FaceDetection + MultiGraph + MultiProcessing + MultiClustering
https://qiita.com/PINTO
MIT License
365 stars, 127 forks

About performance #5

Closed: bsense-rius closed this issue 6 years ago

bsense-rius commented 6 years ago

Dear "demigod" Pinto,

We recently implemented our own multiprocessing version, quite similar to yours, but we ran into some stability issues. In fact, we were somewhat surprised by the lack of multiprocessing examples in the zoo collection. There are only a few multithreading examples for multiple sticks, and in those the GIL is a bottleneck, so multiple sticks bring no big advantage at all.

Our approach was based on API v1 with multiprocessing in fork (default) mode. Why did you choose forkserver + daemon?

What is the actual increase in detection FPS (not screen rendering) that you achieved with each extra stick added: from 1 to 2, then 3, and finally 4?

One final question about performance. The allocate_with_fifos command defaults to 2 elements for each input/output FIFO. Does that ensure the stick is busy all the time, with no waiting? I mean, does it allow you to push another item to the graph queue while the previous one is being processed?

Anyway, excellent job on your side!

PINTO0309 commented 6 years ago

@bsense-rius

> why did you choose forkserver + daemon?

According to the official documentation, "fork" makes the child process inherit all of the parent's resources, which is wasteful, and the Raspberry Pi has very limited resources. I have not verified the stability issues you are worried about. This is just a hobby project. https://docs.python.org/3.6/library/multiprocessing.html#contexts-and-start-methods
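For readers unfamiliar with the forkserver + daemon combination, here is a minimal sketch using only the Python standard library. The names `inference_worker` and `run` are hypothetical, and the "inference" is a placeholder multiplication; no NCS API calls are made and this is not the code from the repository.

```python
import multiprocessing as mp


def inference_worker(device_index, frames, results):
    """Hypothetical per-stick loop; the multiplication stands in for inference."""
    while True:
        frame = frames.get()
        if frame is None:  # sentinel: stop this worker
            break
        results.put((device_index, frame * 2))


def run(num_workers=2, num_frames=6):
    # forkserver: children are forked from a small server process, so they
    # do not inherit the parent's full set of resources as with plain fork.
    ctx = mp.get_context("forkserver")
    frames, results = ctx.Queue(), ctx.Queue()
    workers = [
        # daemon=True: workers are terminated automatically when the parent exits.
        ctx.Process(target=inference_worker, args=(i, frames, results), daemon=True)
        for i in range(num_workers)
    ]
    for w in workers:
        w.start()
    for n in range(num_frames):
        frames.put(n)
    outputs = [results.get() for _ in range(num_frames)]
    for _ in workers:
        frames.put(None)  # one sentinel per worker
    for w in workers:
        w.join()
    return sorted(value for _, value in outputs)
```

With forkserver (as with spawn), `run()` should be called from under an `if __name__ == "__main__":` guard in a script file, because the child processes import the main module again. The forkserver start method is only available on Unix platforms.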

> Which is the actual increase in detection fps (not screen rendering) that you achieved by each extra stick added? from 1 to 2, then 3 and finally 4

Sorry, I have not measured it rigorously yet. However, the YouTube videos show that it is clearly faster (1 stick --> 4 sticks). [1 stick] https://youtu.be/_Cbt0gI8niQ [4 sticks] https://youtu.be/GedDpAc0JyQ
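One possible way to measure detection FPS separately from screen rendering is to record a timestamp each time an inference result arrives and compute the rate from those timestamps alone. The sketch below is an assumption about how this could be done, not the method used in the repository; `detection_fps` is a hypothetical helper.

```python
def detection_fps(timestamps):
    """Return completed detections per second from result arrival times.

    `timestamps` is a list of arrival times in seconds (for example from
    time.perf_counter()), one per finished inference, in increasing order.
    """
    if len(timestamps) < 2:
        return 0.0
    elapsed = timestamps[-1] - timestamps[0]
    if elapsed <= 0:
        return 0.0
    # N timestamps span N - 1 intervals between consecutive detections.
    return (len(timestamps) - 1) / elapsed
```

Because only result arrival times are counted, the figure is independent of how often frames are drawn on screen.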

> does it ensure that stick will be all time busy without waiting times?

Please see the following. Someone has verified this in detail. https://ncsforum.movidius.com/discussion/comment/2733/#Comment_2733

> i mean if it allows you to push another item to the graph queue while the previous one is being processed.

https://ncsforum.movidius.com/discussion/comment/2733/#Comment_2733
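As a generic illustration of the idea behind the question (not of the NCSDK itself), a bounded FIFO of depth 2 lets a producer enqueue the next item while the consumer is still working on the previous one. The sketch below uses Python's standard `queue` and `threading` modules as an analogy; `pipeline` is a hypothetical name, and the actual device behavior is what the forum post above describes.

```python
import queue
import threading


def pipeline(frames, depth=2):
    """Feed frames through a bounded FIFO to a single consumer thread."""
    fifo = queue.Queue(maxsize=depth)  # depth 2, as in the default discussed above
    results = []

    def consumer():
        while True:
            item = fifo.get()
            if item is None:  # sentinel: no more frames
                break
            results.append(item * item)  # placeholder for inference

    worker = threading.Thread(target=consumer)
    worker.start()
    for frame in frames:
        # put() returns immediately while fewer than `depth` items are waiting,
        # so the next frame can be queued during the previous one's processing.
        fifo.put(frame)
    fifo.put(None)
    worker.join()
    return results
```

In this analogy the producer only blocks when `depth` items are already queued, so the consumer does not sit idle as long as the producer keeps up.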

bsense-rius commented 6 years ago

Great!

I'll close the issue!

(I was already aware of those forum comments, but yours was the only implementation I saw working.)

PINTO0309 commented 6 years ago

@bsense-rius

1 Stick = 6 FPS
2 Sticks = 12 FPS
3 Sticks = 16.5 FPS
4 Sticks = 16.5 FPS
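To make the scaling in these figures explicit, the short sketch below computes the speedup relative to one stick and the per-stick efficiency. The FPS values are the ones reported above; `scaling_table` is a hypothetical helper added for illustration.

```python
# FPS figures reported in the comment above, keyed by number of sticks.
measured_fps = {1: 6.0, 2: 12.0, 3: 16.5, 4: 16.5}


def scaling_table(fps_by_sticks):
    """Return (sticks, fps, speedup, efficiency) rows relative to one stick."""
    base = fps_by_sticks[1]
    rows = []
    for sticks in sorted(fps_by_sticks):
        fps = fps_by_sticks[sticks]
        speedup = fps / base           # how many times faster than 1 stick
        efficiency = speedup / sticks  # 1.0 means perfectly linear scaling
        rows.append((sticks, fps, speedup, efficiency))
    return rows


for sticks, fps, speedup, efficiency in scaling_table(measured_fps):
    print(f"{sticks} stick(s): {fps:4.1f} FPS, x{speedup:.2f}, {efficiency:.0%} efficiency")
```

By these numbers, scaling is linear up to 2 sticks (x2.00), reaches x2.75 at 3 sticks, and does not improve with a fourth stick. The thread does not state the cause of this plateau.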