THU-MIG / yolov10

YOLOv10: Real-Time End-to-End Object Detection
https://arxiv.org/abs/2405.14458
GNU Affero General Public License v3.0

How to save correct predictions of the val set in txt? #78

Closed LorenzoSun-V closed 3 months ago

LorenzoSun-V commented 3 months ago

When I was running inference on the val set, I saved the prediction results to txt files and found that every image had exactly 300 detection results. I used this command for val-set inference:

yolo val \
    model=${root}/${project}/${name}/weights/best.pt \
    data=${data} \
    imgsz=${imgsz} \
    batch=${batch}  \
    device=${device} \
    project=${project} \
    name=${name}/val \
    save_txt=true \
    save_conf=true \
    exist_ok=true \

What should I do if I want to save the correct predictions of the val set in txt?
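
(As a stopgap, one could filter the saved txt files afterwards. Below is a minimal sketch, assuming the usual save_txt/save_conf line format of "class x_center y_center width height conf" and a hypothetical labels directory; adjust the path and threshold to your own run.)

from pathlib import Path

CONF_THRES = 0.25                              # assumed threshold, adjust as needed
labels_dir = Path("runs/detect/val/labels")    # hypothetical save_txt output directory

for txt in labels_dir.glob("*.txt"):
    lines = txt.read_text().splitlines()
    # With save_conf=true each line ends with the confidence score.
    kept = [ln for ln in lines if float(ln.split()[-1]) > CONF_THRES]
    txt.write_text("\n".join(kept) + ("\n" if kept else ""))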

jameslahm commented 3 months ago

Thanks for your interest! Could you please clarify what you mean by correct predictions?

LorenzoSun-V commented 3 months ago

In YOLOv8, using the save_txt flag only saves the final results of the prediction (the results after NMS). However, the results after v10postprocess still contain 300 predictions.

LorenzoSun-V commented 3 months ago

In addition, your paper says that you adopt the top-one selection:

But the code seems to use all 300 predictions to calculate metrics against the labels.

LorenzoSun-V commented 3 months ago

How should I understand the dual label assignments? It would be great if you could explain it in detail, because I'm a little bit confused. :star_struck:

jameslahm commented 3 months ago

> In YOLOv8, using the save_txt flag only saves the final results of the prediction (the results after NMS). However, the results after v10postprocess still contain 300 predictions.

Since there is no NMS in the post-processing, we can directly output the max_det predictions.

> In addition, your paper says that you adopt the top-one selection:

Yes, during training, we adopt the top-one selection to obtain the one-to-one matching.

> How should I understand the dual label assignments? It would be great if you could explain it in detail, because I'm a little bit confused. 🤩

Thanks for your interest! In the dual label assignments, we use one-to-one matching and one-to-many matching for the predictions at the same time. So, during training, there are two heads, the one-to-many head and the one-to-one head, to optimize. During inference, we can leverage the one-to-one head to output predictions without NMS post-processing.
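
To illustrate the idea, here is a minimal sketch of the dual-head concept described above, with made-up module names; it is not the actual YOLOv10 code, only the training/inference split it describes. The "one2one" key mirrors the one used later in the validator.

import torch.nn as nn

class DualAssignDetect(nn.Module):
    """Sketch: two detection heads trained with different label assignments."""
    def __init__(self, one2many_head: nn.Module, one2one_head: nn.Module):
        super().__init__()
        self.one2many_head = one2many_head  # supervised with one-to-many (top-k) matching
        self.one2one_head = one2one_head    # supervised with one-to-one (top-1) matching

    def forward(self, feats):
        if self.training:
            # Both heads contribute to the loss during training.
            return {"one2many": self.one2many_head(feats),
                    "one2one": self.one2one_head(feats)}
        # Inference uses only the one-to-one head, so no NMS is needed and the
        # top max_det predictions can be output directly.
        return {"one2one": self.one2one_head(feats)}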

LorenzoSun-V commented 3 months ago

> Since there is no NMS in the post-processing, we can directly output the max_det predictions.

max_det is 300, which means there are still 300 predictions after v10postprocess, and these 300 predictions are used to calculate mAP and the other metrics. I would expect only the final predictions to be used in the metrics calculation.
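
For reference, here is a rough, simplified sketch of what selecting the top max_det predictions without NMS can look like. This is my own simplification under assumed tensor shapes, not the repo's exact ops.v10postprocess.

import torch

def topk_postprocess(preds, max_det, nc):
    # preds: (batch, num_anchors, 4 + nc) with xywh boxes followed by class scores
    boxes, scores = preds.split([4, nc], dim=-1)
    max_scores = scores.amax(dim=-1)                    # best class score per anchor
    top_scores, idx = max_scores.topk(max_det, dim=-1)  # keep the max_det highest-scoring anchors
    boxes = boxes.gather(1, idx.unsqueeze(-1).expand(-1, -1, 4))
    labels = scores.gather(1, idx.unsqueeze(-1).expand(-1, -1, nc)).argmax(-1)
    return boxes, top_scores, labels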

LorenzoSun-V commented 3 months ago

I'm just wondering why the same post-processing code used for prediction is not used in validation to calculate the metrics. In other words, since there is no NMS step, why not directly use the results after confidence-threshold screening to calculate the metrics?

LorenzoSun-V commented 3 months ago

In my view, the thresholding code highlighted in the prediction post-processing (the red box in my screenshot) should also be added to the validator.

jameslahm commented 3 months ago

Thanks. Since the model does not need NMS, all output predictions can be directly added to the validator without a threshold.

LorenzoSun-V commented 3 months ago

> Since the model does not need NMS, all output predictions can be directly added to the validator without a threshold.

In my case, the mAPs calculated when discarding the threshold are much lower than those calculated when adopting the threshold, because a lot of useless predictions (predictions with very low scores, such as 1e-5) enter the metrics calculation, and the negative samples in my validation set also produce predictions.

On the COCO val set, the mAPs are almost the same whether the threshold is discarded or adopted.

I just wonder why the same confidence-threshold setting as other models (such as 0.001) is not used when validating.

jameslahm commented 3 months ago

Thanks. Do you calculate mAP in the same way as COCO?

LorenzoSun-V commented 3 months ago

Yes. I used the same command as for COCO to calculate mAP on my dataset.

jameslahm commented 3 months ago

Thanks. Could you please provide more details of how you use the same command as COCO to calculate mAP? Do you save the predictions.json and call eval_json to calculate the mAP? Thank you!
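
For reference, here is a minimal pycocotools sketch of that kind of COCO-style evaluation of a saved predictions.json. The file paths are placeholders, and this is not necessarily identical to what the repo's eval_json does internally.

from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

anno = COCO("annotations/instances_val2017.json")          # ground-truth annotations (placeholder path)
pred = anno.loadRes("runs/detect/val/predictions.json")    # predictions saved via save_json=true (placeholder path)

coco_eval = COCOeval(anno, pred, "bbox")
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()  # prints AP/AR, including mAP50-95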

LorenzoSun-V commented 3 months ago

After enabling eval_json via the save_json flag, the results are almost the same. Thanks.

LorenzoSun-V commented 3 months ago

Sorry, I made a mistake.

The details are as follows.

Case 1. Adopting the threshold

code:

class YOLOv10DetectionValidator(DetectionValidator):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.args.save_json |= self.is_coco

    def postprocess(self, preds):
        if isinstance(preds, dict):
            preds = preds["one2one"]

        if isinstance(preds, (list, tuple)):
            preds = preds[0]

        preds = preds.transpose(-1, -2)
        boxes, scores, labels = ops.v10postprocess(preds, self.args.max_det, self.nc)
        bboxes = ops.xywh2xyxy(boxes)
        preds = torch.cat([bboxes, scores.unsqueeze(-1), labels.unsqueeze(-1)], dim=-1)

        # Keep only predictions above the confidence threshold (and, optionally, the
        # requested classes), mirroring the prediction post-processing.
        mask = preds[..., 4] > self.args.conf
        if self.args.classes is not None:
            mask = mask & (preds[..., 5:6] == torch.tensor(self.args.classes, device=preds.device).unsqueeze(0)).any(2)

        return [p[mask[idx]] for idx, p in enumerate(preds)]

command:

yolo val \
    model=${root}/${project}/${name}/weights/best.pt \
    data=${data} \
    imgsz=${imgsz} \
    batch=${batch}  \
    device=${device} \
    project=${project} \
    name=${name}/val_coco \
    conf=0.25 \
    save_json=true \
    plots=true

The results are:

Class     Images  Instances      Box(P          R      mAP50  mAP50-95)
     all       1161        990      0.915      0.915      0.951      0.712

Case 2. Discarding the threshold

code:

class YOLOv10DetectionValidator(DetectionValidator):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.args.save_json |= self.is_coco

    def postprocess(self, preds):
        if isinstance(preds, dict):
            preds = preds["one2one"]

        if isinstance(preds, (list, tuple)):
            preds = preds[0]

        preds = preds.transpose(-1, -2)
        boxes, scores, labels = ops.v10postprocess(preds, self.args.max_det, self.nc)
        bboxes = ops.xywh2xyxy(boxes)
        # No confidence filtering: all max_det predictions go into the metrics.
        return torch.cat([bboxes, scores.unsqueeze(-1), labels.unsqueeze(-1)], dim=-1)

command:

yolo val \
    model=${root}/${project}/${name}/weights/best.pt \
    data=${data} \
    imgsz=${imgsz} \
    batch=${batch}  \
    device=${device} \
    project=${project} \
    name=${name}/val_coco \
    save_json=true \
    plots=true

The results are:

Class     Images  Instances      Box(P          R      mAP50  mAP50-95)
       all       1161        990      0.915      0.915      0.961      0.679

jameslahm commented 3 months ago

Thanks. Could you please manually set self.is_coco = True in the validator so that mAP is calculated in the same way as COCO?
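
For example, one possible place to put the override is sketched below. This is an assumption on my part: in the Ultralytics DetectionValidator, is_coco is normally determined in init_metrics, so overriding it there rather than in __init__ should keep it from being reset.

class YOLOv10DetectionValidator(DetectionValidator):
    def init_metrics(self, model):
        super().init_metrics(model)
        # Force COCO-style mAP calculation regardless of the dataset name
        # (assumption: init_metrics is where is_coco is normally set).
        self.is_coco = True
        self.args.save_json = True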

LorenzoSun-V commented 3 months ago

I set self.is_coco=True manually and the difference still exists. The results are the same as before.

LorenzoSun-V commented 3 months ago

> > Since the model does not need NMS, all output predictions can be directly added to the validator without a threshold.
>
> In my case, the mAPs calculated when discarding the threshold are much lower than those calculated when adopting the threshold, because a lot of useless predictions (predictions with very low scores, such as 1e-5) enter the metrics calculation, and the negative samples in my validation set also produce predictions.
>
> On the COCO val set, the mAPs are almost the same whether the threshold is discarded or adopted.
>
> I just wonder why the same confidence-threshold setting as other models (such as 0.001) is not used when validating.

In that reply I didn't use the conf flag, so conf was left at its default of 0.001. After using conf=0.25 in COCO validation, the results are quite different.

LorenzoSun-V commented 3 months ago

In my opinion, my task is much easier than COCO, so too many predictions are detrimental to the results. However, COCO has many hard and small samples, so using all output predictions for validation works better there.

jameslahm commented 3 months ago

Thanks for the detailed explanation. Would you mind sharing your validation set and the checkpoint with us? We are trying to investigate this and confirm if there is an issue in the codebase. Thank you!

LorenzoSun-V commented 3 months ago

Of course! Would you mind giving me your e-mail address? The dataset is private.

jameslahm commented 3 months ago

Thank you! Our email is jameslahm17@gmail.com.

leonnil commented 3 months ago

Thanks for your assistance! We have successfully reproduced the issue. We believe the discrepancy can be attributed to the differences in mAP calculation methods between YOLO and COCO.

The COCO evaluation calculates mAP using the precision points where recall changes, while YOLO interpolates 101 points and performs integration by default. When evaluating with COCO, using the conf parameter may filter out true positive predictions, causing the precision to drop to 0, which negatively impacts the mAP on the COCO validation set. Conversely, YOLO uses interpolation to replace the precision of missing true positives, which might improve the mAP in some cases.

Therefore, we prefer not to set conf as a threshold to filter out predictions in order to maintain COCO mAP performance. You can try to use COCO to evaluate your own dataset to further verify this. We hope this explanation is helpful to you!
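
To make the difference concrete, here is a small numpy sketch of the two integration schemes, written as my own illustration following the description above rather than the exact library code.

import numpy as np

def ap_101_point(recall, precision):
    """AP via 101-point interpolation of the PR curve (the scheme attributed to YOLO above)."""
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([1.0], precision, [0.0]))
    mpre = np.flip(np.maximum.accumulate(np.flip(mpre)))  # make precision monotonically decreasing
    x = np.linspace(0, 1, 101)                            # 101 fixed recall points
    return np.trapz(np.interp(x, mrec, mpre), x)          # integrate the interpolated curve

def ap_recall_change_points(recall, precision):
    """AP summed only where recall changes (the scheme attributed to COCO above)."""
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([1.0], precision, [0.0]))
    mpre = np.flip(np.maximum.accumulate(np.flip(mpre)))
    i = np.where(mrec[1:] != mrec[:-1])[0]                # indices where recall changes
    return np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])

This is only meant to illustrate why filtering predictions with conf can shift the integral differently under the two schemes.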

LorenzoSun-V commented 3 months ago

Yes, it does make sense. Thanks for your patient explanation.