Abdul-Mukit opened 1 month ago
I have encountered this case too, and I tried to solve it by matching the filename against `img_path`. You can try it with this code in `solver_utils.py`:

```python
from pathlib import Path

def calculate_ap(coco_gt: COCO, pd_path):
    for idx, pd in enumerate(pd_path):
        for k, v in coco_gt.imgs.items():
            if Path(v["file_name"]).stem == pd["image_id"]:
                pd_path[idx]["image_id"] = k  # coco_gt.imgs is keyed by image_id
```
Did you train the model successfully? I trained it, but I can't get an AP above 0. Do you know how to solve that? If you have any questions, feel free to talk to me.
@yjmm10 thank you, I appreciate your input. I want to share my thoughts about the code snippet you shared.
The solution will work but will be extremely expensive: it has a time complexity of O(mn). Say you have m bounding-box detections in total (the length of `pd_path` in your code) and n images (the length of `coco_gt.imgs`). For every detection, you are essentially going through every image and doing a string comparison. If you had 1,000 images with on average 10 detections each, you are doing 1,000 * 10 * 1,000 = 10,000,000 string comparisons just to fix a mistake some other function made before the call to `calculate_ap`.
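If the mapping really does have to happen inside `calculate_ap`, building a stem-to-id dictionary once would bring it down to O(m + n). A minimal sketch (the helper name and the toy data are my own, assuming the same `coco_gt.imgs` / `pd_path` shapes as above):

```python
from pathlib import Path

def fix_image_ids(coco_imgs: dict, pd_path: list) -> None:
    # Build the stem -> image_id lookup once: O(n)
    stem_to_id = {Path(v["file_name"]).stem: img_id for img_id, v in coco_imgs.items()}
    # Then each detection needs only a single dict lookup: O(m)
    for pd in pd_path:
        if pd["image_id"] in stem_to_id:
            pd["image_id"] = stem_to_id[pd["image_id"]]

# Toy data standing in for coco_gt.imgs and the predictions list
coco_imgs = {7: {"file_name": "frame_a.jpg"}, 8: {"file_name": "frame_b.jpg"}}
pd_path = [{"image_id": "frame_a"}, {"image_id": "frame_b"}]
fix_image_ids(coco_imgs, pd_path)
print(pd_path)  # [{'image_id': 7}, {'image_id': 8}]
```

This only patches over the symptom, though; fixing the dataloader as proposed below in the thread is still the better design.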
I think it would benefit us all if the design of this repo were improved. The dataloader should return the actual `image_id` to begin with, and `predicts_to_json` should be changed to accept a list of ints as its first argument. The first argument should be `image_id_list`, not `img_paths`. It may look something like this:
```python
def predicts_to_json(image_id_list: list[int], predicts, rev_tensor):
    """
    Turn a batch of image ids and predicts (n x 6 for each image) into a
    list of detection-output dicts (COCO result format).
    """
    batch_json = []
    for image_id, bboxes, box_reverse in zip(image_id_list, predicts, rev_tensor):
        scale, shift = box_reverse.split([1, 4])
        bboxes[:, 1:5] = (bboxes[:, 1:5] - shift[None]) / scale[None]
        bboxes[:, 1:5] = transform_bbox(bboxes[:, 1:5], "xyxy -> xywh")
        for cls, *pos, conf in bboxes:
            bbox = {
                "image_id": image_id,
                "category_id": IDX_TO_ID[int(cls)],
                "bbox": [float(p) for p in pos],
                "score": float(conf),
            }
            batch_json.append(bbox)
    return batch_json
```
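For context, the `transform_bbox(..., "xyxy -> xywh")` call above is expected to convert corner coordinates into COCO's `[x, y, width, height]` form. A minimal pure-Python sketch of that conversion (my own illustration, not the repo's implementation):

```python
def xyxy_to_xywh(box):
    """Convert [x_min, y_min, x_max, y_max] to COCO-style [x, y, w, h]."""
    x_min, y_min, x_max, y_max = box
    return [x_min, y_min, x_max - x_min, y_max - y_min]

print(xyxy_to_xywh([10.0, 20.0, 30.0, 60.0]))  # [10.0, 20.0, 20.0, 40.0]
```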
@Abdul-Mukit You are right. I tried it on my small dataset, but on a big dataset it is slow.
I will try your approach and open a PR.
Do you have other questions during training? If you finish it, would you share how you did it?
@yjmm10 No, I have not trained with this model yet; I got stuck due to this bug.
I am actually a bit hesitant about the design choices in this repository. For example, the function `calculate_ap(coco_gt: COCO, pd_path)` requires a COCO object. By contrast, look at this mean-average-precision function from torchmetrics: https://lightning.ai/docs/torchmetrics/stable/detection/mean_average_precision.html I think the torchmetrics design is much better: easier for devs to understand and debug.
I agree that the `calculate_ap` function is not so good; maybe we could use `torchmetrics` for a more convenient and streamlined process.

@yjmm10 I am working on a fix. I saw your MR where you edited `load_data` (#72). I'll ping you again on that MR if needed. After my investigations, I think my solution will come from somewhere around there, so it would be good to stay in sync with you.
OK, now I have completed a simple validation for the val dataset. The `image_id` will be handled in another MR; feel free to talk to me, or contribute your code to YOLO.
Found the problem, I think:
```python
def create_image_metadata(labels_path: str) -> Tuple[Dict[int, List], Dict[str, Dict]]:
    """
    Create a dictionary containing image information and annotations indexed by image ID.

    Args:
        labels_path (str): The path to the annotation json file.

    Returns:
        - annotations_index: A dictionary where keys are image IDs and values are lists of annotations.
        - image_info_dict: A dictionary where keys are image file names without extension and values are image information dictionaries.
    """
    with open(labels_path, "r") as file:
        labels_data = json.load(file)
        id_to_idx = discretize_categories(labels_data.get("categories", [])) if "categories" in labels_data else None
        annotations_index = organize_annotations_by_image(labels_data, id_to_idx)  # check lookup is a good name?
        image_info_dict = {Path(img["file_name"]).stem: img for img in labels_data["images"]}
        return annotations_index, image_info_dict
```
`annotations_index` is using the `image_id` stored in the .json file as keys, while `image_info_dict` is using keys calculated from the file names. Both should have used the `image_id` calculated from the file name only. Will be making the MR shortly. Thanks.
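To make the key mismatch concrete, here is a toy reproduction (the annotation data is hypothetical, but the two indexing expressions mirror the function above):

```python
from pathlib import Path

# Hypothetical minimal COCO-style annotation data
labels_data = {
    "images": [{"id": 42, "file_name": "frame_abc.jpg"}],
    "annotations": [{"id": 1, "image_id": 42, "category_id": 0, "bbox": [0, 0, 10, 10]}],
}

# annotations_index ends up keyed by the integer image_id from the JSON...
annotations_index = {}
for ann in labels_data["annotations"]:
    annotations_index.setdefault(ann["image_id"], []).append(ann)

# ...while image_info_dict is keyed by the file-name stem
image_info_dict = {Path(img["file_name"]).stem: img for img in labels_data["images"]}

print(list(annotations_index))  # [42]
print(list(image_info_dict))    # ['frame_abc']
# The key sets only line up when every stem happens to equal str(image_id)
```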
The `image_info_dict` variable has `image_id` as its key. If we use `image_id`, it's also troublesome to obtain the original path of the image when we open it.
We already have the `dataset_path` and `phase_name` variables available to us, and the "images" folder name is common to both COCO and YOLO dataset layouts. That means we can reconstruct the original path of the image easily:

`image_path = dataset_path / "images" / phase_name / f"{image_id}.jpg"`

The only concerning thing is that `.jpg`. Under the proposal of using `image_id` as our key, we would be dropping the extension, but images can also be in `.png` format. So I guess a better key would be the image's file name with extension but without any path info.
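A sketch of what that lookup could look like with the file-name-with-extension key (the function name and layout assumptions are mine, not the repo's API):

```python
from pathlib import Path

def reconstruct_image_path(dataset_path: Path, phase_name: str, file_name: str) -> Path:
    # "images" is the folder name shared by COCO- and YOLO-style layouts;
    # keying by the full file name avoids guessing ".jpg" vs ".png"
    return dataset_path / "images" / phase_name / file_name

path = reconstruct_image_path(Path("tests/data"), "val", "000000151480_.jpg")
print(path.as_posix())  # tests/data/images/val/000000151480_.jpg
```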
First of all, thank you very much for this much-needed initiative.
Describe the bug
It is assumed in this repo that image file names will always be int-convertible. That is not true. If image names in the validation split are not convertible to `int`, validation fails. E.g., if instead of `tests/data/images/val/000000151480.jpg` the name were `tests/data/images/val/000000151480_.jpg`, and `instances_val.json` were updated accordingly, validation would fail. This is the initial error message:
I am open to making PRs. Will appreciate input on how to solve this.
To Reproduce
Steps to reproduce the behavior:
1. Rename `tests/data/images/val/000000151480.jpg` to `tests/data/images/val/000000151480_.jpg`.
2. Edit `tests/data/annotations/instances_val.json` to change `"file_name": "000000151480.jpg",` to `"file_name": "000000151480_.jpg"`.
3. Run `python yolo/lazy.py task=train model=v9-s dataset=mock task.epoch=1`.
Expected behavior
In COCO datasets' `*/annotations/*.json` files, the `images` entries are supposed to have `int` `id`s and `string` `file_name`s. A string can contain any character, and the logic of this codebase should accommodate that. Instead of using the filename as an id, it should use the actual `id` of the image as given in the `.json` file.
Screenshots
The problem is that it was assumed in `"image_id": int(Path(img_path).stem),` that the `stem` will always be convertible to `int`. Even if the `int` is removed to make it `"image_id": Path(img_path).stem`, the solution is fundamentally wrong, as `image_id` is not `filename`. In `instances_val.json` that particular image already had an int id assigned to it. Both the id (`int`) and the filename (`string`) can be completely random and need not have any similarity.
Instead of taking the correct id, the file name was used as the id. This was done because `ModelValidator.solve()`'s dataloader doesn't return the actual ids but only the `img_paths`. Thus we run into another assert error:
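The failing assumption can be reproduced in isolation; a minimal sketch (the helper name is my own, the logic mirrors the repo's expression):

```python
from pathlib import Path

def image_id_from_path(img_path: str) -> int:
    return int(Path(img_path).stem)  # assumes the stem is always int-convertible

print(image_id_from_path("tests/data/images/val/000000151480.jpg"))  # 151480

try:
    image_id_from_path("tests/data/images/val/000000151480_.jpg")
except ValueError:
    print("ValueError: stem '000000151480_' is not int-convertible")
```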
System Info (please complete the following information):
Additional context
My dataset was produced directly from CVAT, with `int` image ids and random strings for the filenames. I have used these datasets before with other DNN libraries without having this issue. I am confident that I am not breaking any of COCO's dataset conventions. This "filename is int and same as id" assumption is very limiting: in many industry applications images are time-stamped, and while editing datasets with libraries like [datumaro](https://github.com/openvinotoolkit/datumaro), the image `id`s mentioned in the .json can certainly change at any time.