1adrianb / face-alignment

:fire: 2D and 3D face alignment library built using PyTorch
https://www.adrianbulat.com
BSD 3-Clause "New" or "Revised" License

Processing speed discrepancy #277

Closed (connormeaton closed this issue 3 years ago)

connormeaton commented 3 years ago

Thank you for open sourcing this code. I am interested in the landmark detection. I am using it on an AWS ec2 (p2.xlarge with 8 NVIDIA K80 GPUs). All I'm running is the face detector and landmark function. It looks like this:

import face_alignment
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from skimage import io
import collections
import time

time1 = time.time()

# Optionally set detector and some additional detector parameters
face_detector = 'sfd'
face_detector_kwargs = {
    "filter_threshold" : 0.8
}

# Run the 3D face alignment on a test image, with CUDA.
fa = face_alignment.FaceAlignment(face_alignment.LandmarksType._3D, device='cuda', flip_input=False,
                                  face_detector=face_detector, face_detector_kwargs=face_detector_kwargs)

try:
    input_img = io.imread('5.jpg')
except FileNotFoundError:
    input_img = io.imread('test/assets/aflw-test.jpg')

preds = fa.get_landmarks(input_img)[-1]
print(preds)
time2 = time.time()
print(time2-time1)

When I print the time, I get between 9 and 12 seconds to predict landmarks on one image. I have tried setting the device parameter to both 'cpu' and 'cuda', but it's not going any faster. I see from other issues that you expect to be able to predict landmarks on 30 images per second on GPU. How can I do this?

Thanks.

1adrianb commented 3 years ago

Sorry for the delay. The 30 fps figure did not include the detector, and the way the library is used here is very suboptimal.

The first forward pass will be significantly slower because the network has to initialize, load the models, etc. You should: a) create the fa model a single time, then reuse it in subsequent calls to .get_landmarks(). Example pseudo-code:

fa = face_alignment.FaceAlignment(face_alignment.LandmarksType._3D, device='cuda', flip_input=False,
                                  face_detector=face_detector, face_detector_kwargs=face_detector_kwargs)
# Warm-up: the first call pays the one-time initialization and model-loading cost
_ = fa.get_landmarks(all_images[0])

start = time.time()
for img in all_images:
    preds = fa.get_landmarks(img)
end = time.time()
print((end-start)/len(all_images))

Note that the time library is not ideal for measuring this, but it should be sufficiently accurate in this case.
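The warm-up-then-measure pattern above can be wrapped in a small helper. This is only a minimal sketch: `benchmark` and its `warmup` and `sync` parameters are hypothetical names, not part of face_alignment, and `time.perf_counter` is swapped in for `time.time` because it is a monotonic, higher-resolution clock.

```python
import time


def benchmark(fn, inputs, warmup=1, sync=None):
    """Return the mean seconds per call of fn over inputs, after warm-up calls."""
    inputs = list(inputs)
    if not inputs:
        raise ValueError("inputs must not be empty")

    # Warm-up calls absorb one-time costs (model loading, CUDA initialization)
    # and are excluded from the measurement.
    for item in inputs[:warmup]:
        fn(item)

    if sync is not None:
        sync()  # optional hook: wait for pending GPU work before starting the clock

    start = time.perf_counter()
    for item in inputs:
        fn(item)
    if sync is not None:
        sync()  # wait again so unfinished GPU work is included in the timing
    return (time.perf_counter() - start) / len(inputs)


if __name__ == "__main__":
    # Stand-in workload so the sketch runs without a GPU or the library installed.
    per_call = benchmark(lambda n: sum(range(n)), [10_000] * 20)
    print(f"{per_call:.6f} s per call")
```

With the model from this thread, the call would presumably look like `benchmark(fa.get_landmarks, all_images, sync=torch.cuda.synchronize)`, where `all_images` is the list of already loaded images and `fa` is created once beforehand.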

b) Speed will vary with the resolution of your image and the face_detector used. For example, 'sfd' is relatively slow, so you could try using 'blazeface'.
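Switching detectors only changes the `face_detector` argument used when the model is constructed. A minimal sketch, assuming the same constructor arguments as the snippet in the question: `build_aligner` is a hypothetical helper name, and the sfd-specific `filter_threshold` option is deliberately left out because it may not apply to other detectors.

```python
def build_aligner(detector="blazeface", device="cuda"):
    """Create one FaceAlignment instance that can be reused for every image."""
    # Imported lazily so this sketch can be loaded without the package installed.
    import face_alignment

    return face_alignment.FaceAlignment(
        face_alignment.LandmarksType._3D,  # enum name as used in this thread; may differ in other releases
        device=device,
        flip_input=False,
        face_detector=detector,  # 'sfd' in the original snippet
    )


# Usage (requires face_alignment and, for device='cuda', a working GPU):
#   fa = build_aligner()                  # build once, outside any loop
#   preds = fa.get_landmarks(input_img)   # reuse fa for each image
```

Whether 'blazeface' is accurate enough for a given set of images is something to check on your own data; the thread only says it is worth trying as a faster alternative.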