Deci-AI / super-gradients

Easily train or fine-tune SOTA computer vision models with one open source training library. The home of Yolo-NAS.
https://www.supergradients.com
Apache License 2.0

Inference speed: Huge differences between 3.1.1 and 3.1.3 #1263

Closed. Phyrokar closed this issue 1 year ago.

Phyrokar commented 1 year ago

💡 Your Question

I trained a model with super-gradients 3.1.1 and measured an inference time of around 10 ms for detection. However, since I ran into bug https://github.com/Deci-AI/super-gradients/issues/958 there, I decided to switch to super-gradients 3.1.3. That bug is fixed in 3.1.3, but inference now takes about 100 ms. Can someone help me with this problem?

Here is my relevant code:

import time

from super_gradients.common.object_names import Models
from super_gradients.training import models

net = models.get(Models.YOLO_NAS_M, num_classes=3, checkpoint_path="/home/grip/catkin_ws/src/cv_sensor_weights/weights/yolo_nas/ckpt_best_m.pth").cuda()

# img: the input image; loading is not shown in this snippet
elapsed = 0
elapsed2 = 0
for i in range(50):
    t = time.time()
    prediction = net.predict(img)  # inference
    elapsed += time.time() - t
    print(elapsed)
    t2 = time.time()
    for image_prediction in prediction:  # postprocessing: unpack the results
        class_names = image_prediction.class_names
        labels = image_prediction.prediction.labels
        confidence = image_prediction.prediction.confidence
        bboxes = image_prediction.prediction.bboxes_xyxy
    elapsed2 += time.time() - t2

print("Elapsed  total=", elapsed)
print("Elapsed 2 total=", elapsed2)

Output with 3.1.3:

Elapsed  total= 6.435885906219482
Elapsed 2 total= 0.005202054977416992

Versions

super-gradients 3.1.3 and 3.1.1

ofrimasad commented 1 year ago

Hi @Phyrokar. Please refer to this guide: https://docs.deci.ai/super-gradients/documentation/source/BenchmarkingYoloNAS.html

If you want to benchmark a PyTorch model, here are some guidelines: https://deci.ai/blog/measure-inference-time-deep-neural-networks/

Your code measures predictions without any warmup, which means you are mostly measuring noise and overhead. Hope that helps.
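
For reference, here is a minimal sketch of the warmup-plus-synchronization pattern those guides describe, reusing net and img from the snippet above (the warmup and run counts are arbitrary choices, not values from the guides):

import time

import numpy as np
import torch

n_warmup, n_runs = 10, 50  # arbitrary choices for this sketch

# Warmup: absorbs CUDA context init, cuDNN autotuning and any one-off
# setup inside predict before the timed runs start.
for _ in range(n_warmup):
    net.predict(img)
torch.cuda.synchronize()

timings = []
for _ in range(n_runs):
    start = time.perf_counter()
    net.predict(img)
    torch.cuda.synchronize()  # wait for queued GPU work before stopping the clock
    timings.append((time.perf_counter() - start) * 1000)

print(f"mean {np.mean(timings):.2f} ms, std {np.std(timings):.2f} ms over {n_runs} runs")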

Louis-Dupont commented 1 year ago

Hi @Phyrokar, to add some extra context as to why this happened: we introduced a fuse_model option, which is set to True by default.

It makes inference faster by fusing some of the model's layers, but the fusing itself takes some time to set up. This happens during the first call to model.predict, and all following calls benefit from the speedup I mentioned. You can deactivate it by calling model.predict(image, fuse_model=False), but in most cases you will want to keep it, since the slowdown of the first call is quickly compensated by the speedup of all subsequent calls.
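
In code, the two options look like this (a sketch reusing net and img from the snippet above):

# Option 1: keep fusing (the default) and absorb its one-off cost up front.
net.predict(img)               # first call pays the fusing overhead
prediction = net.predict(img)  # subsequent calls run on the fused model

# Option 2: skip fusing entirely, e.g. for a one-shot prediction.
prediction = net.predict(img, fuse_model=False)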

For proper benchmarking, please refer to what @ofrimasad mentioned. Hope this helps.

Phyrokar commented 1 year ago

Thanks @ofrimasad and @Louis-Dupont for your quick help! Unfortunately, bug https://github.com/Deci-AI/super-gradients/issues/1197 affects quantization. I found a workaround, but the resulting models don't seem to reach the same accuracy. The same goes for ONNX.

Of course, my code needs optimization. But I still wonder why prediction got 10 times slower from 3.1.1 to 3.1.3. It would be great to keep the speed of 3.1.1 and fix the postprocessing bug.

PrajwalCogniac commented 1 year ago

Hey @ofrimasad, I went through your guide and benchmarked the PyTorch FP16 and FP32 models. I found that FP32 is faster than FP16 by 20 ms. Any idea why that is? (Screenshots of the FP32 and FP16 benchmark results were attached.)
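
For what it's worth, FP16 only pays off on GPUs with fast half-precision paths (e.g. tensor cores), and such comparisons are usually made on the bare forward pass with explicit synchronization. A sketch, assuming the net from earlier in the thread and a 640x640 input (the input shape and run counts are assumptions, not values from this comment):

import time

import torch

@torch.no_grad()
def bench(model, x, n_warmup=10, n_runs=100):
    # Returns the mean forward-pass latency in milliseconds.
    for _ in range(n_warmup):
        model(x)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(n_runs):
        model(x)
    torch.cuda.synchronize()
    return (time.perf_counter() - start) * 1000 / n_runs

net = net.cuda().eval()
x = torch.randn(1, 3, 640, 640, device="cuda")  # assumed input shape

fp32_ms = bench(net, x)
fp16_ms = bench(net.half(), x.half())  # .half() converts the model in place
print(f"fp32: {fp32_ms:.2f} ms/iter  fp16: {fp16_ms:.2f} ms/iter")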

PrajwalCogniac commented 1 year ago

@Louis-Dupont, I tried your model.predict() method as well, but why does the time vary from iteration to iteration? There are also sudden spikes in between. Any idea why that is happening? (A screenshot of the per-iteration timings was attached.)

RolandasRazma commented 11 months ago

I'm using the latest from head and getting strange results with fuse_model on predict. With fuse_model=True it runs slower across multiple (batch of 1) predictions than with fuse_model=False. Does fusing happen every time predict is called, even if the model was fused before?
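
One way to check this empirically (a sketch with net and img as earlier in the thread): if fusing is a one-off, only the first call should be slow.

import time

for i in range(5):
    t = time.perf_counter()
    net.predict(img, fuse_model=True)
    # If fusing happened only once, call 0 should be much slower than calls 1-4.
    print(f"call {i}: {(time.perf_counter() - t) * 1000:.1f} ms")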