1adrianb / face-alignment

:fire: 2D and 3D Face alignment library build using pytorch
https://www.adrianbulat.com
BSD 3-Clause "New" or "Revised" License

Other models available other than 4-stacked? #4

Open Yozey opened 7 years ago

Yozey commented 7 years ago

Hello Adrian,

Thanks for sharing your PyTorch code. I noticed that you have probably tested several FAN models with fewer stacks here.

However, only the 4-stack pretrained model is available to download from your website. Would it be possible to share the smaller models as well, even if their performance is slightly worse?

Thank you very much in advance.

1adrianb commented 7 years ago

Hi @Yozey ,

Yes, I was in the process of converting/retraining the smaller models, but most of them are not fully retrained yet. I will resume this at some point next week and add them here. Sorry for the delay.

ggsonic commented 6 years ago

Hi @1adrianb, will 2dfan-2.pth.tar be made available? I would like to test it. Otherwise, could you please send a 2-stack 2DFAN model to my Gmail address, ggsonic@gmail.com? Thanks!

korabelnikov commented 6 years ago

Any updates on this? Is a non-stacked model available now?

nightmaredimple commented 6 years ago

I found that face alignment takes 3.5s for 2D alignment and 2.0s for 2D-to-3D alignment, so 3D alignment takes almost 5.5s in total (on a Tesla K40c). However, in CPU mode it takes 2.5s for 2D alignment and 1.4s for 2D-to-3D alignment. Is that normal? PS: would it be possible to share smaller models?

1adrianb commented 6 years ago

Hi @nightmaredimple, the entire process is dominated by the dlib face detector. In CPU mode dlib uses basically a V-J face detector, while in CUDA mode it uses a CNN-based detector. The 2D/2.5D landmark detector alone (ignoring face detection) should be able to run at 20-30 fps. A simple way to speed up the process is to use a faster face detector. Another is to use a batch size greater than 1.
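As an illustration of the "batch > 1" suggestion, here is a minimal sketch of grouping face crops so that the network is called once per batch instead of once per face. It is not code from this repository: `chunked`, `run_batched` and `predict_batch` are hypothetical names, and `predict_batch` stands in for whatever function runs the landmark network on a list of crops.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def chunked(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final, possibly smaller batch
        yield batch


def run_batched(
    crops: Iterable[T],
    predict_batch: Callable[[List[T]], List[R]],  # hypothetical model wrapper
    batch_size: int,
) -> List[R]:
    """Call `predict_batch` once per batch and flatten the results in order."""
    results: List[R] = []
    for batch in chunked(crops, batch_size):
        results.extend(predict_batch(batch))
    return results
```

With a real model, `predict_batch` would stack the crops into a single tensor and run one forward pass, which is where the saving would come from.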

nightmaredimple commented 6 years ago

@1adrianb Thanks for your reply! I ran a simple test (detect_landmarks_in_image.py), with the results below.

In GPU mode:
face detection: 0.46s
2D alignment: 3.22s
2D-to-3D alignment: 2.04s
Total time: 5.8s

Running the alignment a second time on the same detected face:
2D alignment: 0.08s
2D-to-3D alignment: 0.1s

So running the alignment a second time is much faster!

In CPU mode:
face detection: 0.15s
2D alignment: 2.22s
2D-to-3D alignment: 1.10s
Total time: 3.4s

Running the alignment a second time on the same detected face:
2D alignment: 2.34s
2D-to-3D alignment: 1.355s

So the GPU does speed up the alignment process.

Hmm, maybe my GPU (Tesla K40c) can't run it in real time, but I still appreciate your help. By the way, are there any other models you can share besides 2DFAN-4 and 3DFAN-4?

1adrianb commented 6 years ago

Hi @nightmaredimple, that makes sense: the first pass takes longer because of all the initialisation and the copying of data from CPU to CUDA. The 20-30 fps figure was achieved using a Titan X Pascal. A simple way to speed things up is to use a 2-stack network (basically a 2x improvement with almost no loss in performance), although this will require retraining the network for best performance. Another suggestion is to try exporting the network to a static graph (via ONNX), which should hopefully fuse some operations and improve speed. I will probably push this change at a later point.

Not at this time, unfortunately.
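To separate the slow first pass from steady-state speed when benchmarking, a rough sketch such as the following can help. It is an assumption-laden illustration, not part of the library: `time_with_warmup` is a hypothetical helper, and `fn` stands in for one full alignment call.

```python
import time
from statistics import mean
from typing import Callable, Tuple


def time_with_warmup(fn: Callable[[], object], repeats: int = 5) -> Tuple[float, float]:
    """Return (first_call_seconds, mean_seconds_of_later_calls).

    The first call is timed separately because it includes one-off costs
    such as initialisation and host-to-device copies.
    Note: on a GPU, `fn` must block until the work has finished (with
    PyTorch, e.g. by calling torch.cuda.synchronize() inside `fn`);
    otherwise only the kernel launch is timed.
    """
    if repeats < 1:
        raise ValueError("repeats must be >= 1")

    start = time.perf_counter()
    fn()  # warm-up pass, reported on its own
    first = time.perf_counter() - start

    steady = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        steady.append(time.perf_counter() - start)
    return first, mean(steady)
```

Comparing the two returned values shows how much of a measured total comes from the one-off first pass rather than from the network itself.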

njho commented 4 years ago

Hi @1adrianb I'd also be interested in seeing the 2 stack network. Any update? That being said, great work!

moncio commented 3 years ago

Hi @1adrianb, has there been any progress on this? Thank you so much.

moncio commented 3 years ago

Sorry to bother you again, @1adrianb, but I would really like to know whether there has been any progress on this. Thanks in advance.

BigTom007 commented 3 years ago

Hi @1adrianb, I'm also very interested in the 2-stack network, since a 2-stack Hourglass was used in your other paper.