Open HotelMoted opened 5 years ago
dlib seems to be compiled with CUDA, so there's no problem there.
Python 3.6.7 (default, Oct 22 2018, 11:32:17)
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import dlib.cuda as cuda
>>> print(cuda.get_num_devices())
1
What does this return?
>>> import dlib
>>> dlib.DLIB_USE_CUDA
peter@peter-desktop:~$ python3
Python 3.6.7 (default, Oct 22 2018, 11:32:17)
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import dlib
>>> dlib.DLIB_USE_CUDA
True
>>>
Face detection (face_recognition.face_locations) doesn't use the GPU automatically; you have to pass model="cnn". Besides, there is a performance issue with dlib and concurrency in face detection:
https://github.com/ageitgey/face_recognition/issues/354#issuecomment-366383957
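To illustrate the point above, here is a minimal sketch that picks the `model` argument explicitly instead of relying on the default. `pick_detection_model` and `detect_model_from_dlib` are hypothetical helper names, not part of face_recognition; `dlib.DLIB_USE_CUDA` and `dlib.cuda.get_num_devices()` are the same calls shown earlier in this thread.

```python
def pick_detection_model(cuda_enabled: bool, num_devices: int) -> str:
    """Return a value for the `model` argument of face_recognition.face_locations.

    Hypothetical helper: "cnn" is the detector that can run on the GPU,
    "hog" is the CPU-only default.
    """
    return "cnn" if cuda_enabled and num_devices > 0 else "hog"


def detect_model_from_dlib() -> str:
    """Ask dlib whether CUDA is usable; fall back to "hog" if dlib is missing."""
    try:
        import dlib
    except ImportError:
        return "hog"
    # Check the build flag first, then the number of visible CUDA devices.
    if not dlib.DLIB_USE_CUDA:
        return "hog"
    return pick_detection_model(True, dlib.cuda.get_num_devices())
```

It could then be used as `face_recognition.face_locations(frame, model=detect_model_from_dlib())`. Note that choosing "cnn" only moves the work to the GPU; as the following comments show, it does not by itself guarantee a higher frame rate.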
Just curious if anyone has managed to improve performance on the Jetson Nano. I am seeing the same performance here and looking for a workaround.
I am facing the same performance issue on Jetson Nano and looking for a solution.
After switching to the CNN model, GPU usage is 100% and CPU is around 20%, but, surprisingly, overall FPS is even worse now: around 1 fps instead of the 5 fps I had before for 320x240 frames.
face_locations = face_recognition.face_locations(rgb_small_frame, model="cnn")
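For comparing the two models on the same frames, a small timing helper can make numbers like the ones above reproducible. This is a sketch only: `measure_fps` is a hypothetical name, and the detector is passed in as a plain callable so the helper does not depend on any particular library.

```python
import time
from typing import Any, Callable, Iterable


def measure_fps(detect: Callable[[Any], Any], frames: Iterable[Any]) -> float:
    """Run `detect` on every frame and return frames processed per second."""
    count = 0
    start = time.perf_counter()
    for frame in frames:
        detect(frame)
        count += 1
    elapsed = time.perf_counter() - start
    # Guard against a zero-length interval on very coarse timers.
    return count / elapsed if elapsed > 0 else float("inf")
```

For example, `measure_fps(lambda f: face_recognition.face_locations(f, model="cnn"), frames)` could be compared against the same call with `model="hog"`. The first GPU call typically includes one-time initialization, so it may be worth running one warm-up frame before timing.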
I tried doorbell_camera.py and I get really bad performance, around 6 fps. Is this normal? I was looking at jtop (https://github.com/rbonghi/jetson_stats) and I barely see this thing hit any kind of limit. The main thing I see is one ARM core pegged at 98-100% and the other 3 cores at 8-20%; other than that, it's barely touching any of the real power of the Jetson Nano. The program spends most of its time at https://gist.github.com/ageitgey/84943a12dd0d9f54e90f824b94e4c2a9#file-doorbell_camera-py-L137, which is where it populates the face locations in the current frame. Could that not be done with a bit more performance using CUDA?
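One possible workaround, which nobody in this thread has confirmed, is to run the expensive detection call only on every n-th frame and reuse the last result in between. The sketch below assumes that slightly stale boxes are acceptable; `EveryNthFrameDetector` is a hypothetical wrapper, and the detector is injected as a callable that returns boxes in the `(top, right, bottom, left)` order that `face_locations` uses.

```python
from typing import Any, Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (top, right, bottom, left)


class EveryNthFrameDetector:
    """Run an expensive detector on every n-th frame; reuse the last result otherwise."""

    def __init__(self, detect: Callable[[Any], List[Box]], every_n: int = 3) -> None:
        if every_n < 1:
            raise ValueError("every_n must be >= 1")
        self._detect = detect
        self._every_n = every_n
        self._frame_index = 0
        self._last: List[Box] = []

    def __call__(self, frame: Any) -> List[Box]:
        # Only frames 0, n, 2n, ... trigger a real detection.
        if self._frame_index % self._every_n == 0:
            self._last = self._detect(frame)
        self._frame_index += 1
        return self._last
```

In the doorbell script this might wrap the call at the line linked above, for example `detector = EveryNthFrameDetector(face_recognition.face_locations, every_n=3)`. It raises the displayed frame rate by skipping work and does not address the single busy core itself.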