Closed philipp-schmidt closed 5 years ago
Using the pull request in the original repo, which uses numpy to speed up the computation, I now get far better post-processing performance of 18 ms per image.
Still, with inference taking only 5 ms on a 2080 Ti, this is quite slow.
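For reference, the speedup comes from replacing per-cell Python loops with whole-tensor numpy operations. Below is a minimal, generic sketch of that kind of vectorized YOLO decode; the tensor layout, function names, and normalization here are my assumptions for illustration, not the repo's actual data_processing.py code.

```python
import numpy as np

def sigmoid(x):
    # elementwise logistic function
    return 1.0 / (1.0 + np.exp(-x))

def decode_yolo_output(raw, anchors, num_classes, img_size):
    """Decode one raw YOLO output tensor of shape
    (num_anchors, 5 + num_classes, grid_h, grid_w) into normalized boxes
    and per-class scores, using only vectorized numpy ops
    (no Python loop over grid cells or anchors)."""
    num_anchors, _, gh, gw = raw.shape
    # grid cell offsets, broadcast across all anchors at once
    cx, cy = np.meshgrid(np.arange(gw), np.arange(gh))
    bx = (sigmoid(raw[:, 0]) + cx) / gw                       # center x in [0, 1]
    by = (sigmoid(raw[:, 1]) + cy) / gh                       # center y in [0, 1]
    bw = np.exp(raw[:, 2]) * anchors[:, 0, None, None] / img_size
    bh = np.exp(raw[:, 3]) * anchors[:, 1, None, None] / img_size
    conf = sigmoid(raw[:, 4])                                 # objectness
    cls = sigmoid(raw[:, 5:])                                 # per-class scores
    scores = conf[:, None] * cls                              # combined scores
    boxes = np.stack([bx, by, bw, bh], axis=-1)
    return boxes, scores
```

The same math written as a loop over every grid cell and anchor is exactly what makes a naive CPU implementation take tens of milliseconds per image.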
@philipp-schmidt Could you convert the ONNX files to TRT for tiny-yolov3?
As the original repo unfortunately has issues disabled, I'm going to ask here in the hope that someone can share their experience with this code.
While the inference itself is blazing fast, as you would expect from the TensorRT optimizations, the post-processing is awfully slow.
As far as I can tell, this is because the work the darknet implementation does inside the YOLO layers is done on the CPU in data_processing.py in this implementation. So not only do we have to compute confidence thresholds and non-max suppression on the CPU, but also apply all the mask and anchor calculations.
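To illustrate the CPU-side cost, the non-max suppression step alone looks roughly like the greedy numpy sketch below. This is a generic NMS implementation under my own assumptions (corner-format boxes, a single class), not the repo's exact code.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.45):
    """Greedy non-max suppression on the CPU.
    boxes: (N, 4) array of [x1, y1, x2, y2] corners.
    scores: (N,) confidence scores.
    Returns indices of the kept boxes, highest score first."""
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]          # candidates, best first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # IoU of the current best box against all remaining candidates
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # drop everything that overlaps too much with the kept box
        order = order[1:][iou <= iou_thresh]
    return keep
```

Even vectorized like this, NMS plus the per-layer decode adds up across three YOLO output scales, which is why it can dwarf a 5 ms GPU inference.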
This makes the whole implementation basically unusable in any practical application. TRT aims for high throughput and low latency, but with the post-processing this implementation requires, it's probably even faster to use the original darknet Python bindings...
What @faedtodd reports in his readme are inference times only. If you include all the necessary post-processing on the CPU, you end up far south of 500ms.
I would love to hear whether others have stumbled upon this, or whether I'm just overlooking or misunderstanding something crucial.