Closed bartonp2 closed 2 years ago
I can confirm the speedup for example_oid.py:
285 sec before -> 242 sec after.
Not x100, but something. )
Great work!
I guess it becomes more prominent with more detections per image since it scales quadratically.
For COCO benchmark it's even better: https://github.com/ZFTurbo/Weighted-Boxes-Fusion/tree/master/benchmark
1055 sec vs 643 sec, almost a x2 speedup. Probably because there are more models in the ensemble.
I think more vectorization could lead to a GPU implementation that would be faster than the CPU one. My previous attempt with PyTorch failed.
When using weighted boxes fusion with a couple thousand detections, it quickly becomes very slow, taking significantly longer than the inference itself. The bottleneck turned out to be the `find_matching_box` function, which is called n*n times for n detections. Vectorising this function with numpy speeds it up by a factor of around 100 and makes the weighted boxes fusion time negligible next to the inference time.
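For readers curious what the vectorisation looks like: a sketch of the idea is to compute the IoU of the incoming box against all already-fused boxes in one numpy pass instead of looping. The function name, signature, and `[x1, y1, x2, y2]` box layout below are assumptions for illustration, not the exact code from the PR.

```python
import numpy as np

def find_matching_box_vectorized(boxes, new_box, iou_thr):
    """Return (index, iou) of the box in `boxes` with the highest IoU
    against `new_box`, or (-1, iou) if no IoU exceeds `iou_thr`.

    Sketch only: assumes boxes are rows of [x1, y1, x2, y2].
    """
    if len(boxes) == 0:
        return -1, 0.0
    boxes = np.asarray(boxes, dtype=np.float64)
    # Intersection rectangle against all candidate boxes at once
    xA = np.maximum(boxes[:, 0], new_box[0])
    yA = np.maximum(boxes[:, 1], new_box[1])
    xB = np.minimum(boxes[:, 2], new_box[2])
    yB = np.minimum(boxes[:, 3], new_box[3])
    inter = np.clip(xB - xA, 0, None) * np.clip(yB - yA, 0, None)
    # IoU = intersection / union for every candidate simultaneously
    area_boxes = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    area_new = (new_box[2] - new_box[0]) * (new_box[3] - new_box[1])
    iou = inter / (area_boxes + area_new - inter)
    best = int(np.argmax(iou))
    if iou[best] <= iou_thr:
        return -1, float(iou[best])
    return best, float(iou[best])
```

Replacing a Python-level loop over candidate boxes with one broadcasted computation like this is what turns the per-call cost from O(n) interpreted iterations into a handful of C-level array operations, which is where the ~x100 factor comes from.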