Deci-AI / super-gradients

Easily train or fine-tune SOTA computer vision models with one open source training library. The home of Yolo-NAS.
https://www.supergradients.com
Apache License 2.0
4.54k stars 496 forks

Dataset format #1768

Closed mariannaparzych closed 7 months ago

mariannaparzych commented 8 months ago

💡 Your Question

I am using a COCO-like dataset, https://github.com/DS4SD/DocLayNet, and I use COCOFormatDetectionDataset from super_gradients to load the data. According to the documentation and my own observation, COCOFormatDetectionDataset outputs annotations in the format (x, y, x, y, class_id). When I try to train with it, training throws an error:

Exception has occurred: RuntimeError       (note: full exception trace is shown but execution is paused at: _run_module_as_main)
Class values must be smaller than num_classes.

and I can see that pixel bbox values are used as gt_classes values.
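
For reference, this is roughly how I load the data and inspect a sample (a minimal sketch; the paths are placeholders for my local DocLayNet download):

from super_gradients.training.datasets.detection_datasets.coco_format_detection import (
    COCOFormatDetectionDataset,
)

dataset = COCOFormatDetectionDataset(
    data_dir="/data/doclaynet",              # placeholder root directory
    images_dir="PNG",                        # placeholder images subfolder
    json_annotation_file="COCO/train.json",  # placeholder annotation file
)
sample = dataset[0]   # typically (image, targets); exact fields depend on the dataset config
print(sample[1][:3])  # each row comes out as (x1, y1, x2, y2, class_id) in pixels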

The tutorial in notebooks/detection_how_to_connect_custom_dataset.ipynb suggests that the Trainer needs annotations in the format (class_id, x_center, y_center, H, W). Is it actually true that the Trainer and the Datasets are not compatible?

Should the Trainer get [x_center, y_center, H, W] in pixel values, or as fractions of the image size in the range (0, 1)?

Versions

No response

shaydeci commented 8 months ago

@mariannaparzych The Trainer does not "expect" a specific target format. Other components, like the loss and the model, do.

It would be great if you could give additional information on what it is you are trying to achieve.

mariannaparzych commented 8 months ago

@shaydeci Thank you for the answer. You're right, I expressed my problem incorrectly.

I am trying to use transfer learning for object detection on a COCO-like dataset. I started with notebooks/detection_how_to_connect_custom_dataset.ipynb:

from super_gradients.training import Trainer, models, training_hyperparams

trainer = Trainer(
    experiment_name=f"{experiment_name}_{timestmp:%Y-%m-%d_%H:%M}",
    ckpt_root_dir=ckpt_root_dir,
)
model = models.get(
    "yolox_l", pretrained_weights="coco", num_classes=train_dataset.num_classes
)
train_params = training_hyperparams.get("coco2017_yolox")

I am just using the default configuration from the tutorial. I assumed that to load a COCO-like dataset I could use the COCOFormatDetectionDataset class, but it threw an error:

Exception has occurred: RuntimeError       (note: full exception trace is shown but execution is paused at: _run_module_as_main)
Class values must be smaller than num_classes.
  File "/home/marianna.parzych/Unstructured/super-gradients/venv/lib/python3.10/site-packages/super_gradients/training/losses/yolox_loss.py", line 235, in _compute_loss
    gt_matched_classes, fg_mask, pred_ious_this_matching, matched_gt_inds, num_fg_img = self.get_assignments(
  File "/home/marianna.parzych/Unstructured/super-gradients/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/marianna.parzych/Unstructured/super-gradients/venv/lib/python3.10/site-packages/super_gradients/training/losses/yolox_loss.py", line 471, in get_assignments
    gt_cls_per_image = F.one_hot(gt_classes.to(torch.int64), self.num_classes)

It happens because the format of the targets coming from the dataset is wrong. I assumed that the library has some standard formats, so you can choose the dataset class that suits you and try it with different models and training parameters. If not, where can I find information in the documentation about the formats needed by the different model implementations and loss functions?
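
Concretely, here is a minimal standalone reproduction of the error (pure PyTorch, illustrative values only; DocLayNet has 11 classes):

import torch
import torch.nn.functional as F

num_classes = 11  # DocLayNet category count
# One target row as the dataset emits it: (x1, y1, x2, y2, class_id).
row = torch.tensor([37.0, 52.0, 410.0, 97.0, 3.0])
# The YoloX loss reads the class id from the first column, so with the
# dataset's (box, class) order it sees x1 = 37 as a class id:
F.one_hot(row[0].to(torch.int64), num_classes)
# RuntimeError: Class values must be smaller than num_classes.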

lucidBrot commented 8 months ago

With the COCOFormatDetectionDataset I also had issues, and I also originally expected it to just work. I think this is a bug, but maybe it's a design decision(?). It seems to load the data in XYXY format.

As a workaround, converting my data helped. Storing the data in YOLO format and loading it with super_gradients.training.dataloaders.dataloaders.coco_detection_yolo_format_train seems to have worked for me, although this also returns the loaded data as XYXY, and probably even in pixel coordinates. That works fine for training.
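
Roughly what that looked like for me (a sketch, not my exact code; the paths and class list are placeholders for your own YOLO-format export):

from super_gradients.training.dataloaders.dataloaders import coco_detection_yolo_format_train

train_loader = coco_detection_yolo_format_train(
    dataset_params={
        "data_dir": "/data/doclaynet_yolo",             # placeholder root folder
        "images_dir": "images/train",
        "labels_dir": "labels/train",                   # one .txt per image with YOLO rows
        "classes": ["caption", "footnote", "formula"],  # placeholder names, in label-id order
    },
    dataloader_params={"batch_size": 16, "num_workers": 4},
)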


I've also given the direct way a try, loading the COCO json data; this requires a bit more diving into the code. (However, I have not yet managed to train successfully with this: the ground-truth visualizations look correct, but the IoU loss starts at 1 and stays there.)

To do that, you can wrap your dataset to bring it into the right shape. You'll have to determine what exactly you need to do for your data, of course. Something like this:


from super_gradients.training.datasets.detection_datasets.coco_format_detection import (
    COCOFormatDetectionDataset,
)


class CocoDatasetWrapper(COCOFormatDetectionDataset):
    """
    Description from the base class at https://github.com/Deci-AI/super-gradients/blob/2e591fdde09e18be06a41eafcfe7d2e8362346e4/src/super_gradients/training/datasets/detection_datasets/coco_format_detection.py#L20 :
    Base dataset to load ANY dataset that is with a similar structure to the COCO dataset.
    - Annotation file (.json). It has to respect the exact same format as COCO, for both the json schema and the bbox format (xywh).
    - One folder with all the images.

    Output format: (x, y, x, y, class_id)
    """

    def __init__(self, fix_order=True,
                 image_width: int | None = None,
                 image_height: int | None = None,
                 *args, **kwargs):
        self.fix_order = fix_order
        self.image_width = image_width
        self.image_height = image_height
        super().__init__(*args, **kwargs)

    def _load_annotation(self, sample_id: int) -> dict:
        # The initial annotations.json file contains the format [x_min, y_min, width, height].
        annotation = super()._load_annotation(sample_id)
        # This annotation is in [x1, y1, x2, y2, class_id] format.

        # The class label is at the end.
        if convert_coco_boxes_to_yolo := True:
            # # This would assume [x, y, w, h]:
            # annotation["target"] = CocoDatasetWrapper.convert_my_boxes_to_yolox(
            #             (self.image_width, self.image_height),
            #             annotation["target"])
            annotation["target"] = \
                CocoDatasetWrapper.convert_absolute_minmaxcorner_box_to_relative_center_box(
                    size=(self.image_width, self.image_height), box=annotation["target"]
                )

        if self.fix_order:
            # Permute each row so that the last entry is now the first.
            # Necessary because https://github.com/Deci-AI/super-gradients/blob/d5a85fd318f4137806c37d73120f905e0d51f6a7/src/super_gradients/training/datasets/detection_datasets/coco_format_detection.py#L144 is wrong (inconsistent with the rest of their pipeline).
            #   E.g. calling dataset.plot() expects the class label to be at the end (index 4),
            #   but the loss expects it at index 0.
            # But then this will cause postprocessings to fail...
            annotation["target"][..., (0, 1, 2, 3, 4)] = annotation["target"][..., (4, 0, 1, 2, 3)]
            # The class label should now be at the start, not at the end.

        print(f"{annotation=}")
        return annotation

    @staticmethod
    def convert_absolute_minmaxcorner_box_to_relative_center_box(size, box):
        """
        input box:  (xmin_absolute, ymin_absolute, xmax_absolute, ymax_absolute)
        output box: (xcenter_relative, ycenter_relative, width_relative, height_relative)
        """
        # https://stackoverflow.com/a/56121386/2550406
        dw = 1. / size[0]
        dh = 1. / size[1]
        x = (box[..., 0] + box[..., 2]) / 2.0  # center x, absolute
        y = (box[..., 1] + box[..., 3]) / 2.0  # center y, absolute
        w = box[..., 2] - box[..., 0]          # width, absolute
        h = box[..., 3] - box[..., 1]          # height, absolute
        x = x * dw  # normalize everything to [0, 1]
        w = w * dw
        y = y * dh
        h = h * dh

        box[..., 0] = x
        box[..., 1] = y
        box[..., 2] = w
        box[..., 3] = h
        return box

The above code goes from a COCO json dataset with x_min, y_min, width, height in pixels to x_center, y_center, width, height as relative values in [0, 1]. Note that COCOFormatDetectionDataset transforms the boxes to x_min, y_min, x_max, y_max before we get access to them.
Additionally, my code snippet takes the original COCOFormatDetectionDataset's output and changes the order from (box, class) to (class, box), which is what the yolox loss expects.
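
For what it's worth, instantiating the wrapper looks roughly like this (paths are placeholders; image_width/image_height are only used to normalize the boxes, and as far as I know the DocLayNet pages are rendered at 1025x1025 px):

dataset = CocoDatasetWrapper(
    fix_order=True,
    image_width=1025,   # only used to normalize boxes to [0, 1]
    image_height=1025,
    data_dir="/data/doclaynet",              # placeholder paths
    images_dir="PNG",
    json_annotation_file="COCO/train.json",
)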

Finally, if you want to visualize your results using the DetectionVisualizationCallback or the ExtremeBatchDetectionVisualizationCallback, you will run into a crash. The reason is that they both use a function in detection_utils.py which expects [class_id, x1, y1, x2, y2] but receives [image_id_in_batch, class_id, x1, y1, x2, y2] for the targets. So you need to patch that file in your installation to look like this:


            for label_xyxy in target_boxes:
                print(f"detection_utils.py: Target box i/?: {label_xyxy}")
                # The callback might give us [image_id, class, xyxy_box]
                # But we want [class, xyxy_box]
                label_xyxy = label_xyxy[-5:]

                image_with_targets = DetectionVisualization.draw_box_title(

Also in that file (detection_utils.py) there is the assumption that the data needs to be scaled up from relative to absolute values. This is apparently not always true: super_gradients.training.dataloaders.dataloaders.coco_detection_yolo_format_train loads relative cx_cy_w_h data but then returns it as absolute(?) data. Debug prints in visualize_batch on line 625 show that we get pixel values there. So I've added a heuristic to automatically determine whether the data needs to be scaled:

@@ -618,7 +623,12 @@ class DetectionVisualization:
                                         0 for invisible, 1 for fully opaque
         """
         image_np = undo_preprocessing_func(image_tensor.detach())
-        targets = DetectionVisualization._scaled_ccwh_to_xyxy(target_boxes.detach().cpu().numpy().copy(), *image_np.shape[1:3], image_scale)
+        if (target_boxes < 1.).all():
+            targets = DetectionVisualization._scaled_ccwh_to_xyxy(target_boxes.detach().cpu().numpy().copy(), *image_np.shape[1:3], image_scale)
+            print(f"detection_utils.py after rescaling and transforming ccwh to xyxy: {targets[0]=}")
+        else:
+            targets = DetectionVisualization._scaled_ccwh_to_xyxy(target_boxes.detach().cpu().numpy().copy(), 1., 1., image_scale)
+            print(f"detection_utils.py after not rescaling but transforming ccwh to xyxy: {targets[0]=}")
         if pred_boxes is None:
             pred_boxes = [None for _ in range(image_np.shape[0])]

lucidBrot commented 7 months ago

I believe the main reason why the super_gradients.training.dataloaders.dataloaders.coco_detection_yolo_format_train behaved better for me is that I had specified a transform for it. There are a lot of different formats in use in this codebase.
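
Concretely, the transform was along these lines (the format constants live in super_gradients.training.datasets.data_formats.default_formats; exact constructor arguments may differ between versions):

from super_gradients.training.transforms.transforms import DetectionTargetsFormatTransform
from super_gradients.training.datasets.data_formats.default_formats import XYXY_LABEL, LABEL_CXCYWH

# Appended as the last dataset transform: converts the (x1, y1, x2, y2, class_id)
# rows the dataset emits into the (class_id, cx, cy, w, h) rows the YoloX loss expects.
format_transform = DetectionTargetsFormatTransform(
    input_dim=(640, 640),      # network input size
    input_format=XYXY_LABEL,
    output_format=LABEL_CXCYWH,
)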

Finally, I concluded from this that if I transform the initial dataset to LABEL_NORMALIZED_CXCYWH, the model would learn to predict relative coordinates for the bounding boxes. That, however, causes new issues that I don't want to investigate.

I am still using the patches I mentioned in my previous comment. Hope this helps someone else in the future.

mariannaparzych commented 7 months ago

@lucidBrot thanks for your answers.

shaydeci commented 7 months ago

@mariannaparzych Indeed, @lucidBrot is right. The YoloX loss expects the LABEL_CXCYWH format. If you had anything else in the first index, that would explain the error you got when trying to compute the loss. We will update the docs of the YoloX loss to make this clearer. I am closing this issue for now; if this or any other problem persists, please feel free to re-open this issue or open a new one.
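
For future readers: LABEL_CXCYWH means each target row is (class_id, x_center, y_center, width, height) in absolute pixels. An illustrative example (values made up):

import torch

# Targets for one image in LABEL_CXCYWH; LABEL_NORMALIZED_CXCYWH would be
# the same layout with coordinates scaled to [0, 1].
targets = torch.tensor([
    [3.0, 223.5, 74.5, 373.0, 45.0],   # class 3, center (223.5, 74.5), 373x45 px box
    [0.0, 512.0, 300.0, 120.0, 80.0],  # class 0
])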