Hi @Phyrokar. Please refer to this guide: https://docs.deci.ai/super-gradients/documentation/source/BenchmarkingYoloNAS.html
If you want to benchmark a PyTorch model, here are some guidelines: https://deci.ai/blog/measure-inference-time-deep-neural-networks/
Your code measures a single prediction without any warmup, which means you are mostly measuring noise and overhead. Hope that helps.
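For reference, here is a minimal sketch of the warmup-plus-synchronization pattern those guides describe. The model variant, input size, and iteration counts are illustrative assumptions, not values taken from this thread:

```python
import time
import torch
from super_gradients.training import models

# Illustrative setup (assumed): YOLO-NAS-S with COCO-pretrained weights.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = models.get("yolo_nas_s", pretrained_weights="coco").to(device).eval()
dummy_input = torch.randn(1, 3, 640, 640, device=device)

def sync():
    # Make sure queued GPU work is finished before reading the clock.
    if device == "cuda":
        torch.cuda.synchronize()

# Warmup: the first iterations pay for kernel selection, memory allocation, etc.
with torch.no_grad():
    for _ in range(50):
        model(dummy_input)

# Timed runs: average over many iterations instead of timing a single prediction.
n_iters = 200
sync()
start = time.perf_counter()
with torch.no_grad():
    for _ in range(n_iters):
        model(dummy_input)
sync()
print(f"Average forward-pass latency: {(time.perf_counter() - start) / n_iters * 1000:.2f} ms")
```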
Hi @Phyrokar,
To add some extra context on why this happens: we introduced a `fuse_model` option, which defaults to `True`.
It makes inference faster by fusing some of the model's layers, but the fusion itself takes some time to set up. This is done during the first call to `model.predict`, and all following calls benefit from the speedup I mentioned.
You can disable it with `model.predict(image, fuse_model=False)`, but in most cases you will want to keep it, since the slowdown of the first call is quickly outweighed by the speedup of the following calls (see the sketch after this comment).
For proper benchmarking, please refer to what @ofrimasad mentioned. Hoping this helps
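To make the `fuse_model` behaviour concrete, here is a small sketch of the warmup pattern described above; the model variant and the `image.jpg` path are hypothetical placeholders:

```python
import time
from super_gradients.training import models

# Hypothetical setup: any YOLO-NAS checkpoint behaves the same way here.
model = models.get("yolo_nas_s", pretrained_weights="coco").cuda().eval()
image = "image.jpg"  # placeholder input path

# The first predict() call pays the one-time layer-fusion cost (fuse_model defaults to True).
model.predict(image)

# Later calls run on the already-fused model, so time those instead of the first one.
n_iters = 20
start = time.perf_counter()
for _ in range(n_iters):
    model.predict(image)
print(f"Average predict() time: {(time.perf_counter() - start) / n_iters * 1000:.1f} ms")

# If you only need a single one-off prediction, you can skip fusion:
# model.predict(image, fuse_model=False)
```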
Thanks @ofrimasad and @Louis-Dupont for your quick help! Unfortunately, bug https://github.com/Deci-AI/super-gradients/issues/1197 affects quantization. I found a workaround, but the models don't seem to have the same accuracy. The same goes for ONNX.
Of course, my code needs optimization. But I wonder why prediction is 10 times slower in 3.1.3 than in 3.1.1. It would be great to keep the speed of 3.1.1 and fix the postprocessing bug.
Hey @ofrimasad, I went through your guide and benchmarked the PyTorch FP16 and FP32 models. I found that FP32 is faster than FP16 by 20 ms. Any idea why that is? (FP32 and FP16 timing outputs were attached below.)
@Louis-Dupont, I tried your `model.predict()` approach as well, but why does the time for each iteration vary? There is also a sudden increase in time in between some iterations. Any idea why that is happening?
I'm using the latest from head and getting strange results with `fuse_model` on predict. With `fuse_model=True` it runs slower on multiple (batch-of-1) predictions than with `fuse_model=False`. Does fusing happen every time you call `predict` (even if the model was already fused)?
💡 Your Question
I trained a model with super-gradients 3.1.1 and got an inference time of around 10 ms for detection. However, since bug https://github.com/Deci-AI/super-gradients/issues/958 appears there, I decided to switch to super-gradients 3.1.3. Bug https://github.com/Deci-AI/super-gradients/issues/958 has now been fixed there, but inference takes about 100 ms. Can someone help me with this problem?
Here is my relevant code:
Output with 3.1.3:
Versions
super-gradients 3.1.3 and 3.1.1