WongKinYiu / YOLO

An MIT rewrite of YOLOv9
MIT License

calculate_ap fails when nothing is predicted. Can't train from scratch. #89

Open Abdul-Mukit opened 2 weeks ago

Abdul-Mukit commented 2 weeks ago

Describe the bug

Started training using the command: python yolo/lazy.py task=train model=v9-s dataset=mock task.epoch=1 cpu_num=0 device=cpu weight=False Note weight=False. I was trying to do a sanity check by training the model from scratch on a few images.

Ran into the following error:

Traceback (most recent call last):
  File "/home/abdul/projects/YOLO/yolo/lazy.py", line 39, in main
    solver.solve(dataloader)
  File "/home/abdul/projects/YOLO/yolo/tools/solver.py", line 149, in solve
    mAPs = self.validator.solve(self.validation_dataloader, epoch_idx=epoch_idx)
  File "/home/abdul/projects/YOLO/yolo/tools/solver.py", line 264, in solve
    result = calculate_ap(self.coco_gt, predict_json)
  File "/home/abdul/projects/YOLO/yolo/utils/solver_utils.py", line 12, in calculate_ap
    coco_dt = coco_gt.loadRes(pd_path)
  File "/home/abdul/projects/YOLO/.venv/lib/python3.10/site-packages/pycocotools/coco.py", line 329, in loadRes
    if 'caption' in anns[0]:
IndexError: list index out of range

Reason: Since the model is being trained from scratch, it produced no detections, so the prediction list was empty. As a result, calculate_ap throws an exception.
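For reference, a minimal guard along these lines would avoid the crash. This is only a sketch: I'm assuming calculate_ap receives the ground-truth COCO object and the list of prediction dicts (as the traceback suggests), and the zero-valued return is a placeholder whose shape should match whatever the caller in solver.py actually expects.

```python
def calculate_ap(coco_gt, predict_json):
    """COCO-style AP stats; zeros when the model predicts nothing."""
    if not predict_json:
        # pycocotools' loadRes indexes anns[0], so an empty prediction
        # list raises IndexError; report zero AP for an untrained model
        # instead of crashing.
        return [0.0, 0.0]
    # Import locally so the empty-prediction path stays dependency-free.
    from pycocotools.cocoeval import COCOeval
    coco_dt = coco_gt.loadRes(predict_json)
    coco_eval = COCOeval(coco_gt, coco_dt, "bbox")
    coco_eval.evaluate()
    coco_eval.accumulate()
    coco_eval.summarize()
    return coco_eval.stats
```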

Proposed solution: Please consider adopting MeanAveragePrecision from torchmetrics (https://lightning.ai/docs/torchmetrics/stable/detection/mean_average_precision.html), or implementing something similar. The current implementation of calculate_ap is quite buggy; see MR #79. The bug stems from the WIP state of calculate_ap. If calculate_ap worked like MeanAveragePrecision, the training dataloader would never need to return the image_id/image_name, so MR #79 would not have been necessary either.

Expected behavior

Behavior could be like implementation in PyTorch-Lightning. https://lightning.ai/docs/torchmetrics/stable/detection/mean_average_precision.html

bherbruck commented 2 weeks ago

That scam bot... Can somebody report those posts?

Abdul-Mukit commented 2 weeks ago

@henrytsui000 please take a look at the comments made by the spambot accounts. I think these are harmful; someone might fall for them. I tried reporting one of the accounts. Can others do so too, please?

Abdul-Mukit commented 2 weeks ago

@henrytsui000 what is your opinion on torchmetrics? It has an Apache license, like OpenCV. Do you think a PR replacing calculate_ap with the mAP class from torchmetrics would be helpful?

henrytsui000 commented 2 weeks ago

> @henrytsui000 what is your opinion on torchmetrics? It has an Apache license, like OpenCV. Do you think a PR replacing calculate_ap with the mAP class from torchmetrics would be helpful?

Calling torchmetrics is acceptable because it doesn't involve any copy-pasting or modifications—it's limited to installing and calling the API. If there's a better way to accumulate mAP during inference (per batch) and then obtain the total mAP after inference without rerunning the process, that would be greatly appreciated!

Best regards, Henry Tsui

Abdul-Mukit commented 2 weeks ago

@henrytsui000 I am a bit confused.

> If there's a better way to accumulate mAP during inference (per batch) and then obtain the total mAP after inference without rerunning the process, that would be greatly appreciated!

Do you prefer having a class like torchmetric's MeanAveragePrecision but not have to install the whole library? Or a PR that directly uses torchmetrics?

For now, I'll send a PR that uses torchmetrics to calculate mAP, and I'll test it to see how reliable it is. Let me know if you would prefer a standalone MeanAveragePrecision class instead; I can try writing one and sending a PR for that.
Thank you.