WongKinYiu / YOLO

An MIT rewrite of YOLOv9
MIT License

Validation fails when image names are not int convertible. #67

Open Abdul-Mukit opened 1 month ago

Abdul-Mukit commented 1 month ago

First of all, thank you very much for this much-needed initiative.

Describe the bug

This repo assumes that image file names will always be convertible to int. That is not true. If image names in the validation split are not convertible to int, validation fails. For example, if tests/data/images/val/000000151480.jpg were renamed to tests/data/images/val/000000151480_.jpg and instances_val.json were updated accordingly, validation would fail.

This is the initial error message:

Error executing job with overrides: ['task=train', 'model=v9-s', 'dataset=mock', 'task.epoch=1']
Traceback (most recent call last):
  File "/home/abdul/projects/YOLO/yolo/lazy.py", line 39, in main
    solver.solve(dataloader)
  File "/home/abdul/projects/YOLO/yolo/tools/solver.py", line 148, in solve
    mAPs = self.validator.solve(self.validation_dataloader, epoch_idx=epoch_idx)
  File "/home/abdul/projects/YOLO/yolo/tools/solver.py", line 252, in solve
    predict_json.extend(predicts_to_json(img_paths, predicts, rev_tensor))
  File "/home/abdul/projects/YOLO/yolo/utils/model_utils.py", line 154, in predicts_to_json
    "image_id": int(Path(img_path).stem),
ValueError: invalid literal for int() with base 10: '000000151480_'

I am open to making PRs. Will appreciate input on how to solve this.

To Reproduce

Steps to reproduce the behavior:

  1. Rename tests/data/images/val/000000151480.jpg to tests/data/images/val/000000151480_.jpg.
  2. Edit tests/data/annotations/instances_val.json to change "file_name": "000000151480.jpg", to "file_name": "000000151480_.jpg".
  3. Try training using python yolo/lazy.py task=train model=v9-s dataset=mock task.epoch=1.
  4. See error

Expected behavior

In COCO datasets' */annotations/*.json files, images are supposed to have int ids and string file_names. A string can contain any character, and the logic of this codebase should accommodate that. Instead of using the filename as an id, it should use the actual id of the image as given in the .json file.
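For illustration, here is a minimal, made-up excerpt of what such an annotation file might contain (not the actual test fixture); the numeric "id" and the "file_name" string are independent fields:

import json

# Hypothetical minimal instances_val.json content: "id" is an arbitrary integer
# while "file_name" is an arbitrary string, so the two need not resemble each other.
coco = json.loads("""
{
  "images": [
    {"id": 151480, "file_name": "000000151480_.jpg", "width": 640, "height": 480}
  ],
  "annotations": [
    {"id": 1, "image_id": 151480, "category_id": 1, "bbox": [10, 20, 30, 40], "area": 1200, "iscrowd": 0}
  ],
  "categories": [{"id": 1, "name": "person"}]
}
""")

# Detections submitted for evaluation must reference the numeric image id,
# not the file-name stem.
print(coco["images"][0]["id"], coco["images"][0]["file_name"])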

Screenshots

The problem is that "image_id": int(Path(img_path).stem), assumes the stem will always be convertible to int.

Even if the int() is removed to make it "image_id": Path(img_path).stem, the solution is fundamentally wrong, as the image_id is not the filename. In instances_val.json that particular image already had an int id assigned to it. Both the id (int) and the filename (string) can be completely arbitrary and may not have any similarity.

Instead of taking the correct id, the file name was used as the id. This was done because ModelValidator.solve()'s dataloader does not return the actual ids, only the img_paths.

Thus we run into another assertion error:

Error executing job with overrides: ['task=train', 'model=v9-s', 'dataset=mock', 'task.epoch=1']
Traceback (most recent call last):
  File "/home/abdul/projects/YOLO/yolo/lazy.py", line 39, in main
    solver.solve(dataloader)
  File "/home/abdul/projects/YOLO/yolo/tools/solver.py", line 148, in solve
    mAPs = self.validator.solve(self.validation_dataloader, epoch_idx=epoch_idx)
  File "/home/abdul/projects/YOLO/yolo/tools/solver.py", line 260, in solve
    result = calculate_ap(self.coco_gt, predict_json)
  File "/home/abdul/projects/YOLO/yolo/utils/solver_utils.py", line 12, in calculate_ap
    coco_dt = coco_gt.loadRes(pd_path)
  File "/home/abdul/projects/YOLO/.venv/lib/python3.10/site-packages/pycocotools/coco.py", line 327, in loadRes
    assert set(annsImgIds) == (set(annsImgIds) & set(self.getImgIds())), \
AssertionError: Results do not correspond to current coco set

System Info (please complete the following information):

Additional context

My dataset was produced directly from CVAT with int image ids and random strings for filenames. I have used these datasets with other DNN libraries without hitting this issue. I am confident that I am not breaking any of COCO's dataset conventions. This "filename is an int and the same as the id" assumption is very limiting. In many industry applications images are time stamped in the file name (e.g. a timestamp followed by `_.jpg`). Moreover, while editing datasets with libraries like [datumaro](https://github.com/openvinotoolkit/datumaro), the image ids mentioned in the .json can certainly change at any time.

yjmm10 commented 1 month ago

I have encountered this case too, and I tried to solve it by matching the file name against the img_path. You can try it with this code in solver_utils.py:

from pathlib import Path

from pycocotools.coco import COCO

def calculate_ap(coco_gt: COCO, pd_path):
    # Remap each prediction's file-name-stem "image_id" back to the numeric id
    # stored in the ground-truth json before running the COCO evaluation.
    for idx, pd in enumerate(pd_path):
        for k, v in coco_gt.imgs.items():
            if Path(v["file_name"]).stem == pd["image_id"]:
                pd_path[idx]["image_id"] = v["id"]  # k is the numeric image id from the GT json

Did you train the model successfully? I trained the model but I can't get an AP above 0. Do you know how to solve it? If you have any questions, feel free to talk to me.

Abdul-Mukit commented 1 month ago

@yjmm10 thank you. I appreciate your input. I want to share my thoughts about the code snippet you shared.

The solution will work but will be extremely expensive. It has a time complexity of O(mn). Imagine you have m bounding-box detections in total; this is basically the length of pd_path in your code. And you have n images, which is the length of coco_gt.imgs in your code. For every detection, you are essentially going through every image and doing a string comparison. If you had 1,000 images and each image had on average 10 detections, then you are doing 1,000 * 1,000 * 10 = 10,000,000 string comparisons just to fix a mistake some other function made before the call to calculate_ap.
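For what it's worth, even keeping the post-hoc remap approach, the quadratic scan can be avoided by building the lookup once; a rough sketch (the helper name is mine, not part of the repo):

from pathlib import Path

def remap_image_ids(coco_gt, predictions):
    # Build the file-name-stem -> numeric id lookup once (O(n)) ...
    stem_to_id = {Path(img["file_name"]).stem: img_id for img_id, img in coco_gt.imgs.items()}
    # ... then rewrite every prediction with a constant-time dict lookup (O(m)).
    for pred in predictions:
        pred["image_id"] = stem_to_id[pred["image_id"]]
    return predictions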

I think it would benefit us all if the design of this repo were improved. The dataloader should also return the actual image_id to begin with, and predicts_to_json should be changed to accept a list of ints as its first argument. The first argument should be image_id_list, not img_paths. It may look something like this:

def predicts_to_json(image_id_list:list[int], predicts, rev_tensor):
    """
    TODO: function document
    turn a batch of imagepath and predicts(n x 6 for each image) to a List of diction(Detection output)
    """
    batch_json = []
    for image_id, bboxes, box_reverse in zip(image_id_list, predicts, rev_tensor):
        scale, shift = box_reverse.split([1, 4])
        bboxes[:, 1:5] = (bboxes[:, 1:5] - shift[None]) / scale[None]
        bboxes[:, 1:5] = transform_bbox(bboxes[:, 1:5], "xyxy -> xywh")
        for cls, *pos, conf in bboxes:
            bbox = {
                "image_id": image_id,
                "category_id": IDX_TO_ID[int(cls)],
                "bbox": [float(p) for p in pos],
                "score": float(conf),
            }
            batch_json.append(bbox)
    return batch_json
yjmm10 commented 1 month ago

@Abdul-Mukit You are right. I tried it on my small dataset, but on a big dataset it is slow.

I will try it and open a PR.

Do you have any other questions during training? If you complete it, would you share how you did it?

Abdul-Mukit commented 1 month ago

@yjmm10 no I have not trained using this model yet. I got stuck due to this bug.

Abdul-Mukit commented 1 month ago

I am actually a bit hesitant about the design choices in the repository. For example, the function calculate_ap(coco_gt: COCO, pd_path) requires a COCO object. In contrast, look at this mean average precision function from torchmetrics: https://lightning.ai/docs/torchmetrics/stable/detection/mean_average_precision.html I think torchmetrics' design choices are much better: easier to understand and debug for devs.

yjmm10 commented 1 month ago

> I am actually a bit hesitant about the design choices in the repository. For example, the function calculate_ap(coco_gt: COCO, pd_path) requires a COCO object. In contrast, look at this mean average precision function from torchmetrics: https://lightning.ai/docs/torchmetrics/stable/detection/mean_average_precision.html I think torchmetrics' design choices are much better: easier to understand and debug for devs.

I think the calculate_ap function is not so good:

  1. The repository currently favors COCO and YOLO formats. However, when using YOLO, the AP calculation requires a JSON file, which is not an ideal approach.
  2. The dataset format should be standardized and loaded through a dataloader. Additionally, the AP calculation could use torchmetrics for a more convenient and streamlined process (see the sketch below).
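A minimal sketch of what that could look like with torchmetrics' MeanAveragePrecision (tensor values are illustrative only; this is not the repo's current code):

import torch
from torchmetrics.detection.mean_ap import MeanAveragePrecision

# One dict per image; boxes are xyxy by default, labels are class indices.
preds = [{
    "boxes": torch.tensor([[10.0, 20.0, 40.0, 60.0]]),
    "scores": torch.tensor([0.9]),
    "labels": torch.tensor([0]),
}]
target = [{
    "boxes": torch.tensor([[12.0, 22.0, 38.0, 58.0]]),
    "labels": torch.tensor([0]),
}]

metric = MeanAveragePrecision()
metric.update(preds, target)
print(metric.compute()["map"])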
Abdul-Mukit commented 1 month ago

@yjmm10 I am working on a fix. I saw your MR where you edited load_data (#72). I'll ping you on that MR if needed. After my investigation, I think my fix will come from somewhere around there, so it would be good to stay in sync with you.

yjmm10 commented 1 month ago

> @yjmm10 I am working on a fix. I saw your MR where you edited load_data (#72). I'll ping you on that MR if needed. After my investigation, I think my fix will come from somewhere around there, so it would be good to stay in sync with you.

OK. I have now completed the simple validation for the val dataset. The image_id will be handled in another MR. Feel free to talk to me, or contribute your code to YOLO.

Abdul-Mukit commented 1 month ago

I think I found the problem:

def create_image_metadata(labels_path: str) -> Tuple[Dict[int, List], Dict[str, Dict]]:
    """
    Create a dictionary containing image information and annotations indexed by image ID.

    Args:
        labels_path (str): The path to the annotation json file.

    Returns:
        - annotations_index: A dictionary where keys are image IDs and values are lists of annotations.
        - image_info_dict: A dictionary where keys are image file names without extension and values are image information dictionaries.
    """
    with open(labels_path, "r") as file:
        labels_data = json.load(file)
        id_to_idx = discretize_categories(labels_data.get("categories", [])) if "categories" in labels_data else None
        annotations_index = organize_annotations_by_image(labels_data, id_to_idx)  # check lookup is a good name?
        image_info_dict = {Path(img["file_name"]).stem: img for img in labels_data["images"]}
        return annotations_index, image_info_dict

annotations_index uses the image ids stored in the .json file as keys, while image_info_dict uses keys calculated from the file name. Both should have used the image_id calculated from the file name only. Will be making the MR shortly. Thanks.
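Roughly, the change could look like this (a sketch of my reading of the proposed fix, not the actual MR); both lookups end up keyed by the file-name stem:

from pathlib import Path

def index_by_stem(labels_data, annotations_by_image_id):
    # Key the image info by the file-name stem (as the current code already does) ...
    image_info_dict = {Path(img["file_name"]).stem: img for img in labels_data["images"]}
    # ... and re-key the annotations the same way, using the images list to map
    # the json's numeric image id to its file-name stem.
    annotations_index = {
        Path(img["file_name"]).stem: annotations_by_image_id.get(img["id"], [])
        for img in labels_data["images"]
    }
    return annotations_index, image_info_dict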

yjmm10 commented 1 month ago

The variable image_info_dict has image_id as its key. If we use image_id, then when we open an image it is also troublesome to obtain the original path of the image.

Abdul-Mukit commented 1 month ago

> The variable image_info_dict has image_id as its key. If we use image_id, then when we open an image it is also troublesome to obtain the original path of the image.

We already have the dataset_path and phase_name variables available, and the "images" folder name is common to both COCO and YOLO datasets. That means we can reconstruct the original path of the image easily: image_path = dataset_path / "images" / phase_name / f"{image_id}.jpg".

The only concerning thing is the .jpg. Under the proposal of using image_id as our key, we would be getting rid of the extension, and an image can also be in .png format. So I guess a better key would be the image's file name with extension but without any path info.
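For illustration, a small sketch of that keying (the helper name is mine, not the repo's API): with the bare file name, extension included but with no directory parts, the original path can be rebuilt for .jpg and .png alike:

from pathlib import Path

def image_path_from_key(dataset_path: Path, phase_name: str, file_name: str) -> Path:
    # "images" is the folder name shared by COCO- and YOLO-style layouts.
    return dataset_path / "images" / phase_name / file_name

print(image_path_from_key(Path("tests/data"), "val", "000000151480_.jpg"))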