THU-MIG / yolov10

YOLOv10: Real-Time End-to-End Object Detection [NeurIPS 2024]
https://arxiv.org/abs/2405.14458
GNU Affero General Public License v3.0

Detection confidence randomly goes above 1 for certain frames of video (ultralytics yolov10n) #317

Open Bardia-Zamani-Abnili-UniMelb opened 4 months ago

Bardia-Zamani-Abnili-UniMelb commented 4 months ago

I would say around 5-10% of frames trigger this. I'm still working on detecting and saving those frames to see whether it happens on the same frames every time. I have screenshots of confidences higher than 40 (40 on a scale of 0 to 1, i.e. 4000%). The issue is per frame: either all detections in a frame are like this, or they are all normal.

I'm sure it's not caused by anything I do after the model runs: when this happens, the number of detected objects increases, which means fewer detections are being discarded for falling under the confidence threshold. So the confidence must already be too high inside the model, before the results are passed on to my code.

Relevant parts of my code:

model = ultralytics.YOLO("yolov10n.pt", task="detect")
# and then inside a loop:
    results = model.track(frame, stream=True, persist=True, verbose=False)

I will also try it with verbose=False removed to see if the console output reflects this issue.
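One way to catch the affected frames is to check each confidence against the valid range [0, 1] as soon as the results come back. The following is only a minimal sketch: `find_bad_confidences` is a hypothetical helper, the sample values are made up, and it assumes the confidences have already been copied into a plain Python list (for example with `result.boxes.conf.tolist()` on an ultralytics result).

```python
def find_bad_confidences(confidences, low=0.0, high=1.0):
    """Return (index, value) pairs for confidences outside [low, high]."""
    return [(i, c) for i, c in enumerate(confidences) if not (low <= c <= high)]


# Made-up values for two frames: one normal, one showing the symptom.
normal_frame = [0.91, 0.47, 0.30]
broken_frame = [40.2, 12.5, 17.0]

print(find_bad_confidences(normal_frame))  # []
print(find_bad_confidences(broken_frame))  # [(0, 40.2), (1, 12.5), (2, 17.0)]
```

A non-empty return value could be used as the trigger for saving the frame to disk for later inspection.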

Could there be something wrong with the sigmoid approximation function? Or something about the NMS being removed? I don't have enough of a background in neural networks to guess.

Bardia-Zamani-Abnili-UniMelb commented 4 months ago

Just noticed that in all of the instances of the bug that I have screenshotted, the boxes do not have IDs.

I looked at the part of my code handling missing box IDs and found the source of the bogus confidence values is (as sources of issues often are) in my own code.

I assume missing IDs mean that the inter-frame relationship (a la persist=True) is broken and the model needs to start from scratch?
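For anyone hitting the same thing: in ultralytics results, `boxes.id` is `None` when a frame has no track IDs, so that case has to be handled explicitly before indexing into it. The sketch below is not my actual code, just one possible way to handle it; `pair_ids_with_confidences` is a hypothetical helper and it assumes the IDs and confidences have already been converted to plain Python lists.

```python
def pair_ids_with_confidences(confidences, track_ids=None):
    """Pair each confidence with its track ID, using None when IDs are missing."""
    if track_ids is None:
        # No track IDs for this frame: keep the detections, mark IDs as unknown.
        track_ids = [None] * len(confidences)
    if len(track_ids) != len(confidences):
        raise ValueError("track_ids and confidences must have the same length")
    return list(zip(track_ids, confidences))


# Made-up values: a frame with IDs and a frame without.
print(pair_ids_with_confidences([0.9, 0.4], [7, 12]))  # [(7, 0.9), (12, 0.4)]
print(pair_ids_with_confidences([0.9, 0.4]))           # [(None, 0.9), (None, 0.4)]
```

Keeping the confidence and the ID in separate fields of each pair avoids reading one value where the other was expected when the ID column is absent.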

However, I am not closing this issue, because I would like to know why the model seems to detect a lot more objects just after this happens, even though the input has not changed significantly.

Perhaps the confidence threshold is lowered or ignored for a few frames for some reason?

Is there a way I can continue to detect all of those extra objects consistently?
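If the extra objects are detections that normally fall below the confidence threshold (an assumption on my part, not something confirmed in this thread), then passing a lower `conf` value to `model.track(...)` might keep them on every frame. The sketch below does not call the model; it only illustrates, with made-up scores and a hypothetical `count_kept` helper, how the number of kept detections grows as the threshold drops.

```python
def count_kept(confidences, threshold):
    """Count detections whose confidence is at or above the threshold."""
    return sum(1 for c in confidences if c >= threshold)


# Made-up confidence scores for a single frame.
scores = [0.92, 0.55, 0.31, 0.18, 0.12]

for threshold in (0.5, 0.25, 0.1):
    print(threshold, count_kept(scores, threshold))
```

With these sample scores, the count goes from 2 at a threshold of 0.5 to 5 at 0.1, which matches the kind of jump I see on the frames without IDs.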