vinayak618 closed this issue 5 years ago
Hi @vinayak618, this is strange indeed. How are you measuring this? Please note that the first image passed will be significantly slower, since the network has to be copied to the device and PyTorch initializes its buffers internally.
Hi @1adrianb,
Once the models are downloaded and setup is complete, I'm using the script and images from your examples folder to get the predictions for both 2D and 3D with the SFD face detector. Am I doing anything wrong?
I was referring to the fact that the initial call to get_landmarks will be slower.
I am afraid I am unable to tell without having a code sample.
Can you also check your GPU usage during the detection/training?
Hi @1adrianb,
I ran the code again and observed up to 6 GiB of GPU usage on my machine with an 8 GiB GeForce GTX 1070. I'm still observing faster prediction on the CPU, no idea why.
Below is the code snippet I used, taken as is from the test script in your examples folder, to get the predictions only.
import time
import face_alignment
from skimage import io  # assumed source of io.imread, not shown in the original snippet

start_time = time.time()
fa = face_alignment.FaceAlignment(face_alignment.LandmarksType._2D, face_detector='sfd', device='cpu')
input = io.imread('../test/assets/aflw-test.jpg')
preds = fa.get_landmarks(input)
print("--- %s seconds ---" % (time.time() - start_time))
I've tested this in real time: resizing my input image (1024, 1024) and changing the face detector makes a really big difference in time.
Time fa.get_landmarks(input) over, for example, 100 runs, excluding the first run, which is "warming up" the network. Also leave fa = face_alignment.FaceAlignment(face_alignment.LandmarksType._2D, face_detector='sfd', device='cpu') out of the timing. This is supposed to be run only once in your application anyway. If you do all of this, I am sure the GPU will be significantly faster.
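The measurement described above could be sketched roughly as follows. This is only an illustration, not code from the repository: `benchmark` is a hypothetical helper name, and the optional `sync` hook is an assumption added so that a GPU synchronization function can be passed in.

```python
import time


def benchmark(fn, runs=100, warmup=1, sync=None):
    """Return the mean time in seconds of one fn() call.

    The first `warmup` calls are executed but not timed, since they
    include one-off costs such as copying the network and allocating
    buffers. `sync`, if given, is called before starting and before
    stopping the clock (hypothetical hook for GPU synchronization).
    """
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    if sync is not None:
        sync()
    return (time.perf_counter() - start) / runs
```

With `fa` constructed once beforehand, a call such as `benchmark(lambda: fa.get_landmarks(input), runs=100, sync=torch.cuda.synchronize)` would then report the per-image time for the GPU, and the same call without `sync` for the CPU.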
@Reddyforcode, yes, the speed of the face detector will depend on the size of the face. The face alignment part, however, is independent of that.
Hi @1adrianb,
Yeah, I understand it now. Thanks for that. So the first call on the GPU takes longer than on the CPU because the data has to be copied and initialized. I ran the detector in a loop of 100, got the predictions, and observed that the GPU is quite a bit faster. Any idea how I can add the synchronize call? (I haven't worked much with CUDA kernels.)
@vinayak618 please see https://pytorch.org/docs/stable/cuda.html#torch.cuda.synchronize
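For illustration, one way the synchronize call might be wrapped around a timed region is sketched below. `cuda_sync` and `timed_call` are hypothetical names, and the fallback for a missing `torch` install is an assumption made so the snippet also runs on a CPU-only machine. CUDA kernels are launched asynchronously, so without the synchronization the clock can stop before the GPU has finished its work.

```python
import time

try:
    import torch
except ImportError:  # assumption: allow running without PyTorch installed
    torch = None


def cuda_sync():
    """Wait for all queued CUDA work to finish; no-op without a GPU."""
    if torch is not None and torch.cuda.is_available():
        torch.cuda.synchronize()


def timed_call(fn):
    """Run fn() once and return (result, elapsed_seconds)."""
    cuda_sync()  # make sure earlier GPU work is not counted
    start = time.perf_counter()
    result = fn()
    cuda_sync()  # make sure fn's GPU work has actually completed
    return result, time.perf_counter() - start
```

A call such as `preds, seconds = timed_call(lambda: fa.get_landmarks(input))` would then time a single prediction.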
Hi @1adrianb,
I was benchmarking your latest PyTorch source code for both 2D and 3D landmark detection with the SFD face detector. I'm observing about 10x faster speed on the CPU than on the GPU, which is strange. Any help here would be appreciated.
CPU - Intel i9, 9th Generation Machine. GPU - GTX GeForce 1070 8GiB.
Thanks and Regards, Vinayak