Hey @niconielsen32, first of all, thanks for the support in your YouTube video (:
Generally, benchmarking prediction on the PyTorch model is bad practice: the model changes substantially when converted to ONNX and then compiled to TRT. For example, the RepVGG blocks and BN layers are folded, which has a huge effect on inference time.
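For intuition, here is a minimal sketch in plain PyTorch (not the actual super-gradients conversion code) of what folding a BatchNorm2d into the preceding Conv2d looks like:

```python
import torch
import torch.nn as nn
from torch.nn.utils.fusion import fuse_conv_bn_eval

# Toy Conv -> BN pair, as found inside many detection backbones.
# Both modules must be in eval mode for fusion.
conv = nn.Conv2d(3, 16, kernel_size=3, padding=1, bias=False).eval()
bn = nn.BatchNorm2d(16).eval()

# Give the BN non-trivial statistics so the equivalence check is meaningful.
with torch.no_grad():
    bn.running_mean.uniform_(-1, 1)
    bn.running_var.uniform_(0.5, 2)
    bn.weight.uniform_(0.5, 2)
    bn.bias.uniform_(-1, 1)

# Fold the BN statistics into the conv weights/bias: one layer instead of two.
fused = fuse_conv_bn_eval(conv, bn)

x = torch.randn(1, 3, 64, 64)
with torch.no_grad():
    out_separate = bn(conv(x))
    out_fused = fused(x)

# The fused conv is numerically equivalent but does a single kernel launch.
print(torch.allclose(out_separate, out_fused, atol=1e-6))
```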
We implemented predict mainly for visual demonstration of the model's capabilities rather than for benchmarking. Nevertheless, there is still work to be done to make predict() run faster, and we will update it soon.
If you want to experience YoloNAS where it really shines, I suggest following our QAT/PTQ tutorial (we will soon also add a notebook for it) and observing its performance on a T4. Let me know if you have any other questions.
Yeah, I'm not looking at benchmarking any models or referring to that. But people are using these functions and the PT models for initial testing, to see how the models work. I will definitely use TensorRT for optimization and utilize this model's quantization, but people looking at these models for smaller projects will most likely not go down that path when other models can be used directly with good performance.
The issue was mainly raised because I can't see how it's possible to only run 20 FPS on a 4090, even with the PT models. So it's just to make sure the functionality around the model and the function is not causing a huge FPS drop. Even though it's just a function, it's still the main entry point for testing out the model when just playing around.
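One way to tell whether the drop comes from the network itself or from the machinery around predict() is to time the bare forward pass; a rough sketch, assuming the standard super-gradients model zoo API:

```python
import time
import torch
from super_gradients.training import models

# Load a pretrained YoloNAS variant (assumes the standard models.get entry point).
model = models.get("yolo_nas_s", pretrained_weights="coco").cuda().eval()
dummy = torch.randn(1, 3, 640, 640, device="cuda")

# Warm-up so CUDA kernel compilation/caching doesn't skew the numbers.
with torch.no_grad():
    for _ in range(10):
        model(dummy)

# Time the bare forward pass; compare this FPS against what predict() reports.
torch.cuda.synchronize()
start = time.perf_counter()
with torch.no_grad():
    for _ in range(100):
        model(dummy)
torch.cuda.synchronize()
print(f"raw forward pass: {100 / (time.perf_counter() - start):.1f} FPS")
```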
Hi @niconielsen32 ,
To add some more information, it seems that YoloV8 fuses its Conv2d and BatchNorm2d by default, while YoloNAS requires it to be done explicitly.
You can fuse some of the blocks yourself by calling model.prep_model_for_conversion(input_size=(640, 640)), which gives a performance boost; (640, 640) is passed because it's the model's input size.
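A minimal sketch of that call, assuming the standard models.get entry point (the image path is just a placeholder):

```python
from super_gradients.training import models

# Load a pretrained YoloNAS and put it in eval mode.
model = models.get("yolo_nas_s", pretrained_weights="coco").eval()

# Fold RepVGG branches / BN layers in place; (640, 640) matches the model's input size.
model.prep_model_for_conversion(input_size=(640, 640))

# Subsequent predict()/forward calls now run on the fused graph.
predictions = model.predict("path/to/image.jpg")
```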
Also note that in our benchmark we fuse more blocks, but we need to update our API to allow users to do it themselves. This would give an extra boost when running predict on the torch model. You can find implementation details here if you are interested.
Eventually, we will do all of this automatically when calling model.predict(...).
I think @shaydeci already covered everything else I had in mind. Feel free to reach out if you have more questions.
Fixed in this PR: https://github.com/Deci-AI/super-gradients/pull/998
One more thing that I think we overlooked here: when you are predicting on a webcam stream, you are bounded by the webcam's FPS. The MacBook Pro M1, for example, limits you to around 25 FPS, and most other laptops give around 20 FPS. The most high-end webcams I could find on Amazon limit you to 30-60 FPS, so I am not sure how other Yolos presented 120 FPS.
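If you want to sanity-check your own camera's cap, a rough sketch with plain OpenCV (independent of any model):

```python
import time
import cv2

cap = cv2.VideoCapture(0)  # default webcam

# What the driver reports (may be 0 or unreliable on some platforms).
print("reported FPS:", cap.get(cv2.CAP_PROP_FPS))

# Measure the actual delivery rate by timing frame grabs.
n = 60
start = time.perf_counter()
for _ in range(n):
    ret, frame = cap.read()
print(f"measured FPS: {n / (time.perf_counter() - start):.1f}")
cap.release()
```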
Describe the bug
I have tried out all the models and can't get over 20 FPS with the predict_webcam function on an RTX 4090 GPU. For comparison, the YoloV8 models run at 100-120 FPS.
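For reference, the call in question is roughly the following (a minimal repro sketch, assuming the standard model zoo entry point; "yolo_nas_l" is just one example variant):

```python
from super_gradients.training import models

# Load any of the YoloNAS variants pretrained on COCO.
model = models.get("yolo_nas_l", pretrained_weights="coco")

# Opens the default webcam and draws predictions until interrupted.
model.predict_webcam()
```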
Video
https://www.youtube.com/watch?v=_ON9oiT_G0w&t=7s