Deci-AI / super-gradients

Easily train or fine-tune SOTA computer vision models with one open source training library. The home of Yolo-NAS.
https://www.supergradients.com
Apache License 2.0

Yolonas can't detect a video #1347

Closed RaffelRavionaldo closed 7 months ago

RaffelRavionaldo commented 1 year ago

💡 Your Question

I want to use the YOLO-NAS small version for object detection in my project. I have already trained the model for 20 epochs, and when I try to predict an image with this code:

from super_gradients.training import models

dataset_params = {
    'classes': ['fire', 'smoke', 'others']
}

best_model = models.get('yolo_nas_s',
                        num_classes=len(dataset_params['classes']),
                        checkpoint_path="train/ckpt_best.pth")

test_image = 'test_img.jpg'
best_model.predict(test_image, conf=0.35).show()

the output shows the image with YOLO-NAS predictions (it has bounding boxes). The image is a screenshot of the video I want to test. But if I load the model with the same code and then run it on a video like this:

input_video_path = "testing.mp4"
output_video_path = "result_1.mp4"

best_model.to("cuda").predict(input_video_path, conf=0.3).save(output_video_path)

The algorithm doesn't make any predictions, and the output is identical to the input video (it does not contain any bounding boxes).

I use Python 3.8.17, an Nvidia GTX 1650 with CUDA 11.7 and cuDNN 8.9.3, super-gradients 3.1.3, and PyTorch 1.13.1 (installed with: conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia)

Versions

No response

RaffelRavionaldo commented 1 year ago

Actually, I just realized the model does make predictions, but only on the last frame of the video. You can see the video via this link: https://drive.google.com/file/d/1dunFF7_DiTogGn6_8Cs7f33fEtnv4aTI/view?usp=sharing

Louis-Dupont commented 1 year ago

Hello @RaffelRavionaldo

I've experimented with the pre-trained weights on Coco (i.e. without fine-tuning), and my findings somewhat align with yours. However, I've noticed additional detections in prior time frames.

First, it's important to note that the video sequence does not influence the individual frame predictions; each frame is processed independently. The last frame managed to detect the fire, but this is by "chance". The main reason for the lack of predictions is probably low prediction confidence.

Why might this be happening?

The root of the issue isn't clear. It's possible the model hasn't effectively learned to recognize a fire.

The dataset might be more representative of larger fires, making it challenging for the model to identify smaller fires or flames. Although both are fires, their shapes, contexts, and visual characteristics might differ significantly. For instance, a big fire might produce more smoke, larger flames, and radiate intense light, while a small fire or a candle might be subtler with smaller, more defined flames and less surrounding visual noise.

Ideas to explore

  1. Adjust the Confidence Threshold: A straightforward solution could be to lower the confidence threshold. You can try using predictions = model.predict("my_video.mp4", conf=0.01).
  2. Review Your Dataset: Examine the types of fire images in your dataset. If the images are vastly different from what you're testing (e.g., a small candle flame), consider expanding your dataset with more diverse fire images or employing image augmentations to improve the model's generalization capabilities.
  3. Test on Dataset Images: You can directly run the predict function on images from your dataset.
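To help with points 1 and 3, you can also run the predictor frame by frame yourself instead of passing the video path, and log how many boxes each frame gets at different thresholds. This is only a minimal diagnostic sketch: the video and checkpoint paths are placeholders, it assumes opencv-python is installed, and the `result.prediction.confidence` attribute is taken from recent super-gradients 3.x releases, so please check it against your installed version.

```python
def count_confident(confidences, conf=0.3):
    """Count detections at or above the confidence threshold."""
    return sum(1 for c in confidences if c >= conf)


def scan_video(video_path, checkpoint_path, conf=0.3):
    """Print per-frame detection counts (requires opencv-python, super-gradients, and a GPU)."""
    import cv2
    from super_gradients.training import models

    model = models.get('yolo_nas_s', num_classes=3,
                       checkpoint_path=checkpoint_path).to("cuda")
    cap = cv2.VideoCapture(video_path)
    frame_idx = 0
    while True:
        ok, frame_bgr = cap.read()
        if not ok:
            break
        # OpenCV decodes frames as BGR; predict() expects RGB
        frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
        # Run with a very low threshold so we can see near-miss detections
        result = model.predict(frame_rgb, conf=0.01)
        # Depending on the super-gradients version, predict() may return a
        # list-like wrapper around single-image results -- use result[0] then.
        det = result.prediction
        print(f"frame {frame_idx}: {count_confident(det.confidence, conf)} "
              f"boxes at conf >= {conf} (of {len(det.confidence)} raw)")
        frame_idx += 1
    cap.release()

# e.g. scan_video("testing.mp4", "train/ckpt_best.pth")
```

If only the last frame ever shows boxes above your threshold while many frames show raw detections just below it, lowering the threshold is the right first step; if most frames have no raw detections at all, the dataset is the more likely culprit.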

I hope this helps and makes things clearer!

RaffelRavionaldo commented 1 year ago

Hello @Louis-Dupont

Thanks for the advice. I tried your suggestions and retrained the model from scratch on my video data (extracting frames as images and labelling them), but the output is still the same (only the last frame is predicted), with a lot of noise (because I set conf to 0.01). I have also updated PyTorch, CUDA (to 11.8), and super-gradients on my PC.

But when I ran the same code in Google Colab, the output was good (it predicts every frame). I don't know why this happens.