Closed SCZwangxiao closed 10 months ago
The unit test `test/facealignment_test.py` has failed, but it succeeded in my environment. That's strange.
Hi, @SCZwangxiao I got a 10% boost in performance using V1. V2 throws an error for me about thr.
TypeError: get_predictions() missing 1 required positional argument: 'thr'
Any thoughts on how I can get that working?
Sorry for the typo. I've updated the V2 code to the correct version. `thr` refers to the `0.05` in `poss = zip(*np.where(ocls[:, 1, :, :] > 0.05))`. We use `thr` to filter low-confidence candidates in our private project.
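To illustrate the role of `thr`, here is a minimal sketch, not the code from this PR. The helper name `candidate_positions` is hypothetical, channel 1 of `ocls` is assumed to hold the face score, and the default of `0.05` is an assumption chosen so that callers who omit `thr` would keep the old hard-coded behavior.

```python
import numpy as np

def candidate_positions(ocls, thr=0.05):
    """Return (image, row, col) triples whose score in channel 1 exceeds thr.

    ocls is assumed to have shape (batch, 2, H, W); this helper is
    hypothetical and only mirrors the thresholding expression quoted above.
    """
    hits = zip(*np.where(ocls[:, 1, :, :] > thr))
    return [(int(j), int(h), int(w)) for j, h, w in hits]

# Toy score map: one confident location and one weak one.
ocls = np.zeros((1, 2, 2, 2))
ocls[0, 1, 0, 1] = 0.9
ocls[0, 1, 1, 0] = 0.03

default_hits = candidate_positions(ocls)            # weak location is dropped
loose_hits = candidate_positions(ocls, thr=0.01)    # weak location is kept
```

Raising `thr` discards more low-confidence candidates before any further processing, which is the filtering described above.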
Thanks. It works now! I'm getting slightly faster results with v1 than v2. They're both faster than the original.
Thanks for your contribution, @SCZwangxiao, looks good! I will check what is going on with the test; it seems to be fine locally indeed.
Background
I was processing large-scale human-talking datasets (~10M images) and found that GPU utilization was very low (below 10%), even when using the batch API.
As discussed in #343, after profiling the code I found the bottleneck to be the unparallelized `get_predictions()`. I've solved this issue by proposing a parallelized implementation. A detailed explanation is below.
Explanation
Note that the batch index `j` appears only in the inner loop, so we can exchange the for-loop order to make things clearer. Finally, once the loops are exchanged, it is straightforward to see that the `batch_size` loop can be parallelized.
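The idea can be sketched on a toy example. This is an assumption-laden illustration, not the PR's actual implementation: the function names are hypothetical, `ocls` is assumed to have shape `(batch, 2, H, W)` with the score in channel 1, and box decoding is omitted entirely. The first function loops over the batch index `j` in Python; the second handles the whole batch with a single `np.where` call and then splits the result per image.

```python
import numpy as np

def candidates_per_image(ocls, thr=0.05):
    """Reference version: explicit Python loop over the batch index j."""
    results = []
    for j in range(ocls.shape[0]):
        rows = []
        for h, w in zip(*np.where(ocls[j, 1] > thr)):
            rows.append([h, w, ocls[j, 1, h, w]])
        results.append(np.array(rows, dtype=float).reshape(-1, 3))
    return results

def candidates_batched(ocls, thr=0.05):
    """Batched version: one np.where call covers every image at once."""
    j, h, w = np.where(ocls[:, 1] > thr)
    rows = np.stack([h, w, ocls[j, 1, h, w]], axis=1).astype(float)
    # np.where returns indices in row-major order, so j is already sorted
    # and the rows belonging to each image are contiguous.
    counts = np.bincount(j, minlength=ocls.shape[0])
    return np.split(rows, np.cumsum(counts)[:-1])

# Both versions should yield the same (row, col, score) table per image.
rng = np.random.default_rng(0)
ocls = rng.random((3, 2, 4, 5))
looped = candidates_per_image(ocls, thr=0.5)
batched = candidates_batched(ocls, thr=0.5)
```

Because the per-image work no longer depends on a Python-level loop over `j`, the cost of the thresholding step grows with the number of candidates rather than with the number of interpreter iterations per image.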