ageitgey / face_recognition

The world's simplest facial recognition api for Python and the command line
MIT License

Using the GPU, but CPU usage goes up to 100% #1216

Open · blinkbink opened this issue 4 years ago

blinkbink commented 4 years ago

Description

I'm running this with Flask on a machine with a GPU, and dlib is installed with GPU support: dlib.DLIB_USE_CUDA returns True.
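A slightly fuller check than DLIB_USE_CUDA alone also asks dlib how many CUDA devices it can see. dlib.DLIB_USE_CUDA and dlib.cuda.get_num_devices() are real dlib attributes; the helper name cuda_status is hypothetical, and it accepts a module argument only so the logic can be exercised on a machine without dlib or a GPU.

```python
from typing import Any, Optional


def cuda_status(dlib_module: Optional[Any] = None) -> dict:
    """Report whether dlib was compiled with CUDA and how many GPUs it sees.

    cuda_status is a hypothetical helper; pass a stand-in module only for testing.
    """
    if dlib_module is None:
        import dlib as dlib_module  # the real library, imported lazily
    compiled_with_cuda = bool(dlib_module.DLIB_USE_CUDA)
    # Only query devices when dlib was actually built with CUDA support
    num_devices = dlib_module.cuda.get_num_devices() if compiled_with_cuda else 0
    return {"DLIB_USE_CUDA": compiled_with_cuda, "num_devices": num_devices}
```

Calling cuda_status() with no argument inside the Flask process would show whether that process sees the Tesla V100 (num_devices should be at least 1).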

In the Flask app I run this code: if len(face_recognition.face_locations(img, model="cnn")) == 0:
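That check can be wrapped in a small helper. This is a sketch assuming img is an RGB numpy array (for example from face_recognition.load_image_file). face_recognition.face_locations(img, model="cnn") is the library's real call; contains_face and its injectable locate parameter are hypothetical, added so the logic can be tested without dlib installed.

```python
from typing import Callable, List, Optional, Tuple

# face_recognition returns boxes as (top, right, bottom, left)
Box = Tuple[int, int, int, int]


def contains_face(img, locate: Optional[Callable[..., List[Box]]] = None) -> bool:
    """Return True if the CNN detector finds at least one face in img."""
    if locate is None:
        import face_recognition  # imported lazily; wraps dlib's detectors
        locate = face_recognition.face_locations
    # model="cnn" selects dlib's CNN face detector, which runs on the GPU
    # when dlib was compiled with CUDA support
    return len(locate(img, model="cnn")) > 0
```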

Then I checked with nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  Off  | 00000000:86:00.0 Off |                    0 |
| N/A   46C    P0    39W / 250W |  11602MiB / 32510MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|    0      9735      C   python3                                     5082MiB |
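nvidia-smi's GPU-Util is the share of a short recent sample period during which a kernel was running, so a single snapshot taken between requests can read 0% even while the process keeps memory allocated. A sketch for sampling it repeatedly while the requests are in flight, using nvidia-smi's --query-gpu and --format=csv,noheader,nounits options; the helper names are hypothetical, and only the parser can be exercised without a GPU.

```python
import subprocess
import time
from typing import List, Tuple

QUERY = ["nvidia-smi",
         "--query-gpu=utilization.gpu,memory.used,memory.total",
         "--format=csv,noheader,nounits"]


def parse_gpu_csv(text: str) -> List[Tuple[int, int, int]]:
    """Parse 'util_percent, used_mib, total_mib' lines, one line per GPU."""
    rows = []
    for line in text.strip().splitlines():
        util, used, total = (int(field.strip()) for field in line.split(","))
        rows.append((util, used, total))
    return rows


def sample_gpu(samples: int = 10, interval_s: float = 0.5) -> List[List[Tuple[int, int, int]]]:
    """Run nvidia-smi several times so short bursts of GPU work are not missed."""
    readings = []
    for _ in range(samples):
        out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
        readings.append(parse_gpu_csv(out))
        time.sleep(interval_s)
    return readings
```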

When I hit the Flask endpoint three times at the same time to run face detection, CPU usage gets high. This is what I see with top:

 PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
9735 root      20   0 47.004g 0.011t 2.630g S 116.1 18.4   5:12.98 python3

Why does this happen? Is the process using the CPU instead of running entirely on the GPU? The load on the CPU is heavy: with three simultaneous requests it goes up to and past 100%. Any clue?
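One way to see how much of the work lands on the CPU is to run the same call three times concurrently, as in the report, and compare the process's CPU time (time.process_time, which sums all threads of the process) with the wall-clock time. A ratio near 1.0 means roughly one core was kept busy on average. This is a diagnostic sketch only; measure_cpu_load is hypothetical and work stands in for the face detection call.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Sequence


def measure_cpu_load(work: Callable[[Any], Any], inputs: Sequence[Any]) -> Dict[str, float]:
    """Run work on all inputs concurrently; report wall time, CPU time, busy cores."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # CPU time of this whole process, all threads
    with ThreadPoolExecutor(max_workers=max(1, len(inputs))) as pool:
        list(pool.map(work, inputs))  # one thread per input, all started together
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    # Average number of cores busy during the run (top would show about 100x this)
    return {"wall_s": wall, "cpu_s": cpu, "avg_cores": cpu / wall if wall > 0 else 0.0}
```

For example, measure_cpu_load(contains_face_or_similar, [img, img, img]) inside the server process would give a number directly comparable to top's %CPU column divided by 100.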

Running cat /proc/cpuinfo | grep processor | wc -l returns 40.
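The same count can be taken from Python. This sketch counts the "processor" entries in /proc/cpuinfo (one per logical CPU on Linux, which is what the grep | wc -l pipeline approximates) and falls back to os.cpu_count() elsewhere; count_processors and logical_cpus are hypothetical helpers.

```python
import os


def count_processors(cpuinfo_text: str) -> int:
    """Count logical CPUs in /proc/cpuinfo-style text (one 'processor' key each)."""
    return sum(1 for line in cpuinfo_text.splitlines()
               if line.split(":")[0].strip() == "processor")


def logical_cpus() -> int:
    """Read /proc/cpuinfo on Linux; fall back to os.cpu_count() otherwise."""
    try:
        with open("/proc/cpuinfo") as f:
            return count_processors(f.read()) or os.cpu_count() or 1
    except OSError:
        return os.cpu_count() or 1
```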

Here are my CPU specs:

lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              40
On-line CPU(s) list: 0-39
Thread(s) per core:  2
Core(s) per socket:  10
Socket(s):           2
NUMA node(s):        2
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz
Stepping:            4
CPU MHz:             800.018
CPU max MHz:         3000.0000
CPU min MHz:         800.0000
BogoMIPS:            4400.00
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            14080K
NUMA node0 CPU(s):   0-9,20-29
NUMA node1 CPU(s):   10-19,30-39
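The lscpu output is self-consistent: 2 sockets × 10 cores per socket × 2 threads per core = 40 logical CPUs, matching the /proc/cpuinfo count. A small parser for lscpu's "Key: value" layout makes that check explicit; parse_lscpu and logical_cpu_total are hypothetical names.

```python
from typing import Dict


def parse_lscpu(text: str) -> Dict[str, str]:
    """Turn lscpu's 'Key:   value' lines into a dict (split at the first colon)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def logical_cpu_total(fields: Dict[str, str]) -> int:
    """Sockets x cores per socket x threads per core."""
    return (int(fields["Socket(s)"])
            * int(fields["Core(s) per socket"])
            * int(fields["Thread(s) per core"]))
```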
kevdagoat commented 4 years ago

9735 root 20 0 47.004g 0.011t 2.630g S 116.1 18.4 5:12.98 python3

top shows the python3 process at 116.1% CPU, which is only a little more than one core out of the 40 available. That makes sense.
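By default (Irix mode) top reports a process's %CPU relative to a single logical CPU, so 100% is one fully busy CPU and values above 100% mean more than one. A quick conversion of the figure above, assuming that default mode; top_percent_to_cpus is a hypothetical helper:

```python
def top_percent_to_cpus(percent_cpu: float, logical_cpus: int) -> tuple:
    """Convert top's per-process %CPU (Irix mode) to CPUs used and share of the machine."""
    cpus_used = percent_cpu / 100.0
    share_of_machine = cpus_used / logical_cpus
    return cpus_used, share_of_machine


cpus, share = top_percent_to_cpus(116.1, 40)
print(f"{cpus:.2f} CPUs busy, {share:.1%} of the machine")  # prints "1.16 CPUs busy, 2.9% of the machine"
```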

diamondbarcode commented 4 years ago

Did you try running cuda-memcheck filename.py? I ran into poor performance with my program (fewer than 50 lines), and it also used the CPU rather than the GPU, just like in your case.