dlib is using GPU but the cnn model is still taking too much, something is off somewhere on my setup.

drakorg commented 4 years ago

face_recognition version: 1.2.3
Python version: 3.6.9
Operating System: Ubuntu 18.04.4 LTS
Kernel Version: 4.9.140-tegra
CUDA: 10.2.89

Description

Hi. I've been using the face_recognition library for some time on a Jetson Nano with Jetpack 4.3. With the cnn model and CUDA support enabled in dlib, I was getting around 500 ms per frame on a 1280 x 720 image, and everything was working great.

Last night I decided to try Jetpack 4.4 and everything went fine until I saw the performance of the running process. It was around 2000 ms per frame, with the very same setup as before.

The first thing I suspected was that dlib, for some reason, might not have been compiled with CUDA support, but that was not the problem, as you can see below.

>>> import face_recognition
>>> face_recognition.__version__
'1.2.3'
>>> import dlib
>>> dlib.DLIB_USE_CUDA
True
>>> dlib.cuda.get_num_devices()
1
>>> dlib.__version__
'19.19.0'
>>>

Not only that, using jtop I can verify that when running the model GPU usage jumps to almost 100% instantly, meaning that the GPU is actually being used. However, the time it takes to process every frame is around 2 full seconds, a lot compared to the 500 ms I was getting just yesterday when running on Jetpack 4.3.
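To make the 500 ms versus 2000 ms comparison reproducible, the per-frame time can be measured the same way on both Jetpack versions. The following is a minimal sketch, not the code used in this report: `time_per_call` and `benchmark_cnn` are hypothetical helper names, and the warm-up count is an assumption. The warm-up calls are left untimed because the first CUDA call typically includes one-time initialization that would otherwise inflate the result.

```python
import statistics
import time


def time_per_call(fn, warmup=2, runs=10):
    """Return (median_ms, max_ms) over `runs` calls, after `warmup` untimed calls."""
    # Untimed warm-up calls absorb one-time setup cost (e.g. CUDA initialization).
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples), max(samples)


def benchmark_cnn(image_path, **kwargs):
    """Time face_recognition's cnn detector on a single image (hypothetical helper)."""
    # Imported lazily so time_per_call stays usable without face_recognition installed.
    import face_recognition

    image = face_recognition.load_image_file(image_path)
    return time_per_call(
        lambda: face_recognition.face_locations(image, model="cnn"), **kwargs
    )
```

Calling `benchmark_cnn` with the same 1280 x 720 image on each Jetpack install would give two medians that can be compared directly.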

I've run out of ideas on where to look for the problem. Any ideas?

Thank you.

Latestion commented 4 years ago

You can set the model to 'hog'. It might be faster, but I am not entirely sure.

drakorg commented 4 years ago

Hi, no, you missed the point.

I'm not looking for alternatives to the cnn detector. I'm trying to figure out why the face_recognition lib takes 2 full seconds per frame on Jetpack 4.4 when it takes 500 ms on Jetpack 4.3 for the same input and, as far as I know, the same setup. Under the hood it uses dlib, which was compiled with CUDA support. CUDA is present and enabled, and according to the GPU monitoring tool jtop it is even being used when my app runs.

Since I posted the original question I went back to jetpack 4.3, and as I was saying, throughput is exactly as expected, 500ms per frame. Same input image, same application, same configuration.
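One way to check the "same configuration" claim is to record a small snapshot of the environment on each Jetpack install and diff the two. The sketch below is an assumption about what is worth recording, not an exhaustive list: `environment_snapshot` is a hypothetical helper, and it only reads the same dlib attributes shown in the interpreter session above plus basic platform details.

```python
import platform


def environment_snapshot():
    """Collect version details that may differ between two installs (hypothetical helper)."""
    snapshot = {
        "python": platform.python_version(),
        "kernel": platform.release(),
        "machine": platform.machine(),
    }
    try:
        import dlib
    except ImportError:
        # dlib is not installed; record that rather than failing.
        snapshot["dlib"] = None
        return snapshot
    snapshot["dlib"] = dlib.__version__
    snapshot["dlib_use_cuda"] = bool(dlib.DLIB_USE_CUDA)
    # Only query the device count when dlib reports a CUDA-enabled build.
    snapshot["cuda_devices"] = (
        dlib.cuda.get_num_devices() if dlib.DLIB_USE_CUDA else 0
    )
    return snapshot
```

Printing the returned dictionary on Jetpack 4.3 and on 4.4 and comparing the output shows whether the kernel, Python, or dlib build differs between the two.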

If there were no GPU activity on 4.4, that would explain why it takes longer. But there is GPU activity (100%), and it still takes 4x as long as on 4.3. I'm just puzzled and trying to find an explanation for it.

On Sat, 2 May 2020 at 01:38, Udit Bansal notifications@github.com wrote:

You can use the model as 'hog'. It might be faster but I am not entirely sure.


mariusmotea commented 4 years ago

Same issue for me

drakorg commented 4 years ago

Hi, @mariusmotea, did you have any luck identifying the cause of the increase in the processing times?

mariusmotea commented 4 years ago

I just followed the tutorial from here. Although the tutorial is not that old, it recommends JetPack 4.2.

ageitgey / face_recognition

dlib is using GPU but the cnn model is still taking too much, something is off somewhere on my setup. #1130
