Face alignment speed up and GPU usage

Wenlong0913 commented 5 years ago

To whom it may concern,

This repo provides really amazing tools. Thanks for the great work. I tried face alignment and feature extraction with this library, and found that face alignment can take 1.3 s per image. After reading the code, I realized that MTCNN is not running on the GPU. I made a few small changes, e.g., torch.FloatTensor => torch.cuda.FloatTensor, Pnet() => Pnet().cuda(), etc.

This reduced the face alignment time per image from 1.3 s to 0.8 s. It works, but I am still not satisfied with the result. Is there a way to make face detection/alignment run faster?

Another thing confuses me: the GPU usage is very low, 1%~2%. Please see the attachments.

I'm not sure whether this is because I didn't configure the GPU properly or whether it is just one of the advantages of this library. The installed CUDA version is 9.2, the cuDNN version is 7.4, and the graphics card is an RTX 2070. It reports an error after I run the Python code. Can anyone tell me how to fix it?

Again, many thanks for the great work!

jolinlinlin commented 5 years ago

@Wenlong0913 Hello, I also want to improve the speed, but there is little improvement after using CUDA. Have you found a better way to accelerate it? If so, could you please tell me? Thank you very much.

Wenlong0913 commented 5 years ago

> @Wenlong0913 Hello, I also want to improve the speed, but there is little improvement after using CUDA. Have you found a better way to accelerate it? If so, could you please tell me? Thank you very much.

Nope. Perhaps you can try this implementation. It's faster. https://github.com/Seanlinx/mtcnn

flyingmrwang commented 4 years ago

@Wenlong0913 I am facing the same problem. Could you give some hints on how much faster it is than this one? And do you think the cause of the low GPU usage is MTCNN itself or an inefficient implementation? Also, the MTCNN implementation you linked does not predict landmarks. How do you use it to align faces?

innerlee commented 4 years ago

Check out this fork: https://github.com/innerlee/face.evoLVe.PyTorch

Speed Comparison

original

In [1]: from PIL import Image 
   ...: from detector import detect_faces                                                                

In [2]: img = Image.open('../disp/Fig1.png').convert('RGB')                                                             

In [3]: %time detect_faces(img)                                                                                         
CPU times: user 2.85 s, sys: 172 ms, total: 3.02 s
Wall time: 610 ms

the fork

In [1]: from PIL import Image                                                                                           

In [2]: from evolveface import detect_faces, show_results                                                               

In [3]: img = Image.open('disp/Fig1.png').convert('RGB')

In [4]: %time detect_faces(img)                                                                                         
CPU times: user 255 ms, sys: 6.05 ms, total: 261 ms
Wall time: 42.3 ms
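
In both runs above the CPU total is larger than the wall time, which usually means the work was spread over several threads. For anyone who wants to reproduce this kind of measurement outside IPython, here is a minimal sketch using only the standard library. The names `time_call` and `busy` are hypothetical and not part of either repo; `busy` is just a stand-in for `detect_faces(img)`.

```python
import time


def time_call(fn, *args, **kwargs):
    """Return (result, wall_seconds, cpu_seconds) for a single call of fn."""
    wall_start = time.perf_counter()   # elapsed real time
    cpu_start = time.process_time()    # user + sys CPU time of this process
    result = fn(*args, **kwargs)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, wall, cpu


def busy(n):
    # Hypothetical stand-in workload; replace with detect_faces(img).
    return sum(i * i for i in range(n))


result, wall, cpu = time_call(busy, 100_000)
print(f"CPU time: {cpu * 1e3:.1f} ms, wall time: {wall * 1e3:.1f} ms")
```

A single call is noisy, so repeating the call a few times and looking at the spread is probably more trustworthy than one number.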

xxxpsyduck commented 4 years ago

@innerlee so it's all about pillow-simd?

innerlee commented 4 years ago

There are also lots of code optimizations.

xxxpsyduck commented 4 years ago

@innerlee Does it affect model performance?

innerlee commented 4 years ago

Purely speed changes. The bottleneck is not model inference.
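
One way to check where the time goes in your own setup is to profile a single call. Below is a rough sketch with `cProfile` from the standard library. The functions `preprocess`, `infer` and `detect` are hypothetical stand-ins, not the real pipeline; swap in the actual `detect_faces` call to see the real breakdown.

```python
import cProfile
import io
import pstats


def preprocess(n):
    # Hypothetical stand-in for image resizing / pyramid building.
    return sum(i * i for i in range(n))


def infer(n):
    # Hypothetical stand-in for the network forward pass.
    return sum(range(n))


def detect(n=200_000):
    # Hypothetical pipeline: preprocessing followed by inference.
    return preprocess(n) + infer(n // 100)


def profile_call(fn, *args, top=10):
    """Run fn under cProfile; return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(fn, *args)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buf.getvalue()


result, report = profile_call(detect)
print(report)
```

Note that `cProfile` only sees Python-level calls on the CPU, so time spent inside GPU kernels will not be broken down here.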

xxxpsyduck commented 4 years ago

@innerlee I'm checking it out. Great work

YINDAIYING commented 4 years ago

@innerlee Hello, can you share the estimated time for training this repo on the full Celeb-1M dataset? I tried using 4 GPUs, but my estimated time is far too long, like 200 days for a single batch. I cannot believe it. After ten hours of training on only 1/3 of the data, it's still in the first epoch, at batch 1920/411750. Can you share your training status? Thanks.

innerlee commented 4 years ago

I use the provided weights for inference. Haven't tried training :shrug:

PatrickPrakash commented 3 years ago

I also have this issue. I'm running the dataset on a Tesla K80.

ZhaoJ9014 / face.evoLVe