1adrianb / face-alignment

:fire: 2D and 3D Face alignment library build using pytorch
https://www.adrianbulat.com
BSD 3-Clause "New" or "Revised" License
6.88k stars 1.33k forks

Is there any way to parallelize the post-processing code in `batch_detect`? #343

Closed SCZwangxiao closed 10 months ago

SCZwangxiao commented 1 year ago

Problem

Hi, I am using the FaceAlignment module to pre-process the VoxCeleb2 dataset. However, GPU utilization is low (below 10%), and the estimated running time is extremely long (about a month on eight 3090 GPUs).

Potential cause

After profiling the code with pprofile, I found that the bottleneck is the post-processing in the `batch_detect` function (the code after `net(img_batch)`). It accounts for 51.35% of the total running time, as detailed below.

[image: profiling results for `batch_detect`]

Appendix of the profiling results
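
For anyone who wants to check a similar split on their own pipeline, here is a minimal sketch of how the share of time spent after the forward pass could be measured. It uses the standard-library `cProfile` instead of pprofile, and `fake_forward`, `fake_postprocess`, `fake_batch_detect`, and `profile_share` are hypothetical stand-ins written for this illustration, not functions from face-alignment.

```python
import cProfile
import pstats
import time


def fake_forward(batch):
    # Hypothetical stand-in for the network call `net(img_batch)`.
    time.sleep(0.01)
    return [[float(x) for x in range(200)] for _ in batch]


def fake_postprocess(outputs):
    # Hypothetical stand-in for the CPU-side code that runs after the forward pass.
    return [sorted(row, reverse=True)[:10] for row in outputs]


def fake_batch_detect(batch):
    # Hypothetical stand-in for a detect function: forward pass, then post-processing.
    return fake_postprocess(fake_forward(batch))


def profile_share(func, *args):
    """Profile one call and return {function name: cumulative time / total time}."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()

    stats = pstats.Stats(profiler)
    total = stats.total_tt  # total profiled time in seconds
    shares = {}
    # stats.stats maps (filename, lineno, name) -> (cc, nc, tottime, cumtime, callers)
    for (_filename, _lineno, name), (_cc, _nc, _tt, cumtime, _callers) in stats.stats.items():
        # If two functions share a name, the one seen last wins in this simple sketch.
        shares[name] = cumtime / total if total > 0 else 0.0
    return shares


if __name__ == "__main__":
    shares = profile_share(fake_batch_detect, list(range(8)))
    for name in ("fake_forward", "fake_postprocess"):
        print(f"{name}: {shares[name]:.1%} of profiled time")
```

The numbers from this toy example say nothing about face-alignment itself. The point is only that comparing the cumulative time of the post-processing function against the total makes a CPU-bound bottleneck easy to spot.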

SCZwangxiao commented 11 months ago

I've found a solution and opened a PR: #347.