Problem
Hi, I was applying the FaceAlignment module to pre-process the VoxCeleb2 dataset. However, the GPU utilization rate is low (below 10%), and the estimated running time is extremely long (about a month using 8 3090 GPUs).
Potential cause
After profiling the code using pprofile, I found that the bottleneck lies in the post-processing in the batch_detect function (the code after net(img_batch)). Specifically, it took 51.35% of the total running time, which is detailed below.
Appendix of the profiling results
Profiling results of the get_landmarks_from_batch function in the FaceAlignment module (note that self.face_detector.detect_from_batch(image_batch) and self.face_alignment_net(inp)[-1].detach() are the most time-consuming):
Profiling results of the detect_from_batch function (batch_detect took 52.19% of the total time):
Profiling results of the batch_detect function (note that net(img_batch.float()) takes almost no time; the bottleneck is in the post-processing):
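
To cross-check the split between the forward pass and the post-processing independently of pprofile, here is a minimal sketch of a per-stage wall-clock timer. Everything in it is an assumption for illustration: stage_timer, fake_forward and fake_postprocess are hypothetical names, and the sleeps only stand in for the real net(img_batch.float()) call and the code after it. One caveat that may be worth checking: CUDA kernels are launched asynchronously, so a forward pass can look nearly free unless the device is synchronized before the clock is read. The optional sync hook is where something like torch.cuda.synchronize could be passed.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per stage name.
totals = defaultdict(float)


@contextmanager
def stage_timer(name, sync=None):
    """Add the wall-clock time of the enclosed block to totals[name].

    sync is an optional zero-argument callable run before each clock read.
    For GPU code this could be torch.cuda.synchronize; that is an assumption
    about the setup, not something taken from the library being profiled.
    """
    if sync is not None:
        sync()
    start = time.perf_counter()
    try:
        yield
    finally:
        if sync is not None:
            sync()
        totals[name] += time.perf_counter() - start


# Hypothetical stand-ins for the two stages inside batch_detect.
def fake_forward():
    time.sleep(0.005)


def fake_postprocess():
    time.sleep(0.05)


for _ in range(3):
    with stage_timer("forward"):
        fake_forward()
    with stage_timer("postprocess"):
        fake_postprocess()

# Fraction of the measured time spent in each stage.
grand_total = sum(totals.values())
shares = {name: seconds / grand_total for name, seconds in totals.items()}
for name, share in shares.items():
    print(f"{name}: {share:.1%} of measured time")
```

If the post-processing share stays high even with synchronization in place, that would support the reading that the time is really spent after the network call rather than being GPU work attributed to the first line that waits on the device.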