Deci-AI / super-gradients

Easily train or fine-tune SOTA computer vision models with one open source training library. The home of Yolo-NAS.
https://www.supergradients.com
Apache License 2.0
4.54k stars 496 forks

Dataset format #1768

Closed mariannaparzych closed 7 months ago

mariannaparzych commented 8 months ago

💡 Your Question

I am using a COCO-like dataset, https://github.com/DS4SD/DocLayNet, and I use COCOFormatDetectionDataset from super_gradients to load the data. According to the documentation and my own observation, COCOFormatDetectionDataset outputs annotations in the format (x, y, x, y, class_id). When I try to train with it, training throws an error:

Exception has occurred: RuntimeError       (note: full exception trace is shown but execution is paused at: _run_module_as_main)
Class values must be smaller than num_classes.

and I can see that pixel bbox values are used as gt_classes values.
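
For reference, this is roughly how I load the data and inspect a sample (a minimal sketch; the paths are placeholders for my local DocLayNet download):

from super_gradients.training.datasets.detection_datasets.coco_format_detection import (
    COCOFormatDetectionDataset,
)

dataset = COCOFormatDetectionDataset(
    data_dir="/data/doclaynet",              # placeholder root directory
    images_dir="PNG",                        # placeholder images subfolder
    json_annotation_file="COCO/train.json",  # placeholder annotation file
)
sample = dataset[0]   # typically (image, targets); exact fields depend on the dataset config
print(sample[1][:3])  # each row comes out as (x1, y1, x2, y2, class_id) in pixels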

The tutorial in notebooks/detection_how_to_connect_custom_dataset.ipynb suggests that the Trainer needs annotations in the format (class_id, x_center, y_center, H, W). Is it actually true that the Trainer and the Datasets are not compatible?

Should the Trainer get [x_center, y_center, H, W] in pixel values, or as fractions of the image size in the range (0, 1)?

Versions

No response

shaydeci commented 8 months ago

@mariannaparzych The Trainer does not "expect" a specific target format. Other components, like the loss and the model, do.

It would be great if you could give additional information on what it is you are trying to achieve.

mariannaparzych commented 8 months ago

@shaydeci Thank you for the answer. You're right, I expressed my problem incorrectly.

I am trying to use transfer learning for object detection on a COCO-like dataset. I started with notebooks/detection_how_to_connect_custom_dataset.ipynb:

from super_gradients.training import Trainer, models, training_hyperparams

trainer = Trainer(
    experiment_name=f"{experiment_name}_{timestmp:%Y-%m-%d_%H:%M}",
    ckpt_root_dir=ckpt_root_dir,
)
model = models.get(
    "yolox_l", pretrained_weights="coco", num_classes=train_dataset.num_classes
)
train_params = training_hyperparams.get("coco2017_yolox")

I am just using the default configuration from the tutorial. I assumed that to load a COCO-like dataset I could use the COCOFormatDetectionDataset class, but it threw an error:

Exception has occurred: RuntimeError       (note: full exception trace is shown but execution is paused at: _run_module_as_main)
Class values must be smaller than num_classes.
  File "/home/marianna.parzych/Unstructured/super-gradients/venv/lib/python3.10/site-packages/super_gradients/training/losses/yolox_loss.py", line 235, in _compute_loss
    gt_matched_classes, fg_mask, pred_ious_this_matching, matched_gt_inds, num_fg_img = self.get_assignments(
  File "/home/marianna.parzych/Unstructured/super-gradients/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/marianna.parzych/Unstructured/super-gradients/venv/lib/python3.10/site-packages/super_gradients/training/losses/yolox_loss.py", line 471, in get_assignments
    gt_cls_per_image = F.one_hot(gt_classes.to(torch.int64), self.num_classes)

It happens because the format of the targets coming from the dataset is wrong. I assumed that the library has some standard formats, so you can choose the dataset class that suits you and try it with different models and training parameters. If not, where can I find information in the documentation about the formats needed by the different model implementations and loss functions?
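
Concretely, here is a minimal standalone reproduction of the error (pure PyTorch, illustrative values only; DocLayNet has 11 classes):

import torch
import torch.nn.functional as F

num_classes = 11  # DocLayNet category count
# One target row as the dataset emits it: (x1, y1, x2, y2, class_id).
row = torch.tensor([37.0, 52.0, 410.0, 97.0, 3.0])
# The YoloX loss reads the class id from the first column, so with the
# dataset's (box, class) order it sees x1 = 37 as a class id:
F.one_hot(row[0].to(torch.int64), num_classes)
# RuntimeError: Class values must be smaller than num_classes.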

lucidBrot commented 8 months ago

With the COCOFormatDetectionDataset I also had issues, and I also originally expected it to just work. I think this is a bug, but maybe it's a design decision(?). It seems to load the data in XYXY format.

As a workaround, converting my data helped. Storing the data in YOLO format and loading it with super_gradients.training.dataloaders.dataloaders.coco_detection_yolo_format_train seems to have worked for me, although this also returns the loaded data as XYXY, and probably even in pixel coordinates. That works fine for training.
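
Roughly what that looked like for me (a sketch, not my exact code; the paths and class list are placeholders for your own YOLO-format export):

from super_gradients.training.dataloaders.dataloaders import coco_detection_yolo_format_train

train_loader = coco_detection_yolo_format_train(
    dataset_params={
        "data_dir": "/data/doclaynet_yolo",             # placeholder root folder
        "images_dir": "images/train",
        "labels_dir": "labels/train",                   # one .txt per image with YOLO rows
        "classes": ["caption", "footnote", "formula"],  # placeholder names, in label-id order
    },
    dataloader_params={"batch_size": 16, "num_workers": 4},
)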


I've also given the direct way a try, loading the COCO json data; this requires a bit more diving into the code. (However, I have not yet managed to train successfully with this: the ground-truth visualizations look correct, but the IoU loss starts at 1 and stays there.)

To do that, you can wrap your dataset to bring it into the right shape. You'll have to determine what exactly you need to do for your data, of course. Something like this:


from super_gradients.training.datasets.detection_datasets.coco_format_detection import (
    COCOFormatDetectionDataset,
)


class CocoDatasetWrapper(COCOFormatDetectionDataset):
    """
    Description from the base class at https://github.com/Deci-AI/super-gradients/blob/2e591fdde09e18be06a41eafcfe7d2e8362346e4/src/super_gradients/training/datasets/detection_datasets/coco_format_detection.py#L20 :
    Base dataset to load ANY dataset that is with a similar structure to the COCO dataset.
    - Annotation file (.json). It has to respect the exact same format as COCO, for both the json schema and the bbox format (xywh).
    - One folder with all the images.

    Output format: (x, y, x, y, class_id)
    """

    def __init__(self, fix_order=True,
                 image_width: int | None = None,
                 image_height: int | None = None,
                 *args, **kwargs):
        self.fix_order = fix_order
        self.image_width = image_width
        self.image_height = image_height
        super().__init__(*args, **kwargs)

    def _load_annotation(self, sample_id: int) -> dict:
        # The initial annotations.json file contains the format [x_min, y_min, width, height].
        annotation = super()._load_annotation(sample_id)
        # This annotation is in [x1, y1, x2, y2, class_id] format.

        # The class label is at the end.
        if convert_coco_boxes_to_yolo := True:
            # # This would assume [x, y, w, h]:
            # annotation["target"] = CocoDatasetWrapper.convert_my_boxes_to_yolox(
            #             (self.image_width, self.image_height),
            #             annotation["target"])
            annotation["target"] = \
                CocoDatasetWrapper.convert_absolute_minmaxcorner_box_to_relative_center_box(
                    size=(self.image_width, self.image_height), box=annotation["target"]
                )

        if self.fix_order:
            # Permute each row so that the last entry is now the first.
            # Necessary because https://github.com/Deci-AI/super-gradients/blob/d5a85fd318f4137806c37d73120f905e0d51f6a7/src/super_gradients/training/datasets/detection_datasets/coco_format_detection.py#L144 is wrong (inconsistent with the rest of their pipeline).
            #   E.g. calling dataset.plot() expects the class label to be at the end (index 4),
            #   but the loss expects it at index 0.
            # But then this will cause postprocessings to fail...
            annotation["target"][..., (0, 1, 2, 3, 4)] = annotation["target"][..., (4, 0, 1, 2, 3)]
            # The class label should now be at the start, not at the end.

        print(f"{annotation=}")
        return annotation

    @staticmethod
    def convert_absolute_minmaxcorner_box_to_relative_center_box(size, box):
        """
        input box:  (xmin_absolute, ymin_absolute, xmax_absolute, ymax_absolute)
        output box: (xcenter_relative, ycenter_relative, width_relative, height_relative)
        """
        # https://stackoverflow.com/a/56121386/2550406
        dw = 1. / size[0]
        dh = 1. / size[1]
        x = (box[..., 0] + box[..., 2]) / 2.0  # center x, absolute
        y = (box[..., 1] + box[..., 3]) / 2.0  # center y, absolute
        w = box[..., 2] - box[..., 0]          # width, absolute
        h = box[..., 3] - box[..., 1]          # height, absolute
        x = x * dw  # normalize everything to [0, 1]
        w = w * dw
        y = y * dh
        h = h * dh

        box[..., 0] = x
        box[..., 1] = y
        box[..., 2] = w
        box[..., 3] = h
        return box

The above code goes from a COCO json dataset with x_min, y_min, width, height in pixels to x_center, y_center, width, height as relative values in [0, 1]. Note that COCOFormatDetectionDataset transforms the boxes to x_min, y_min, x_max, y_max before we get access to them.
Additionally, my code snippet takes the original COCOFormatDetectionDataset's output and changes the order from (box, class) to (class, box), which is what the yolox loss expects.
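
For what it's worth, instantiating the wrapper looks roughly like this (paths are placeholders; image_width/image_height are only used to normalize the boxes, and as far as I know the DocLayNet pages are rendered at 1025x1025 px):

dataset = CocoDatasetWrapper(
    fix_order=True,
    image_width=1025,   # only used to normalize boxes to [0, 1]
    image_height=1025,
    data_dir="/data/doclaynet",              # placeholder paths
    images_dir="PNG",
    json_annotation_file="COCO/train.json",
)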

Finally, if you want to visualize your results using the DetectionVisualizationCallback or the ExtremeBatchDetectionVisualizationCallback, you will run into a crash. The reason is that they both use a function in detection_utils.py which expects [class_id, x1, y1, x2, y2] but receives [image_id_in_batch, class_id, x1, y1, x2, y2] for the targets. So you need to patch that file in your installation to look like this:


            for label_xyxy in target_boxes:
                print(f"detection_utils.py: Target box i/?: {label_xyxy}")
                # The callback might give us [image_id, class, xyxy_box]
                # But we want [class, xyxy_box]
                label_xyxy = label_xyxy[-5:]

                image_with_targets = DetectionVisualization.draw_box_title(

Also in that file (detection_utils.py) there is the assumption that the data needs to be scaled up from relative to absolute values. This is apparently not always true: super_gradients.training.dataloaders.dataloaders.coco_detection_yolo_format_train loads relative cx_cy_w_h data but then returns it as absolute(?) data. Debug prints in visualize_batch on line 625 show that we get pixel values there. So I've added a heuristic to automatically determine whether the data needs to be scaled:

@@ -618,7 +623,12 @@ class DetectionVisualization:
                                         0 for invisible, 1 for fully opaque
         """
         image_np = undo_preprocessing_func(image_tensor.detach())
-        targets = DetectionVisualization._scaled_ccwh_to_xyxy(target_boxes.detach().cpu().numpy().copy(), *image_np.shape[1:3], image_scale)
+        if (target_boxes < 1.).all():
+            targets = DetectionVisualization._scaled_ccwh_to_xyxy(target_boxes.detach().cpu().numpy().copy(), *image_np.shape[1:3], image_scale)
+            print(f"detection_utils.py after rescaling and transforming ccwh to xyxy: {targets[0]=}")
+        else:
+            targets = DetectionVisualization._scaled_ccwh_to_xyxy(target_boxes.detach().cpu().numpy().copy(), 1., 1., image_scale)
+            print(f"detection_utils.py after not rescaling but transforming ccwh to xyxy: {targets[0]=}")
         if pred_boxes is None:
             pred_boxes = [None for _ in range(image_np.shape[0])]

lucidBrot commented 7 months ago

I believe the main reason why the super_gradients.training.dataloaders.dataloaders.coco_detection_yolo_format_train behaved better for me is that I had specified a transform for it. There are a lot of different formats in use in this codebase.
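
Concretely, the transform was along these lines (the format constants live in super_gradients.training.datasets.data_formats.default_formats; exact constructor arguments may differ between versions):

from super_gradients.training.transforms.transforms import DetectionTargetsFormatTransform
from super_gradients.training.datasets.data_formats.default_formats import XYXY_LABEL, LABEL_CXCYWH

# Appended as the last dataset transform: converts the (x1, y1, x2, y2, class_id)
# rows the dataset emits into the (class_id, cx, cy, w, h) rows the YoloX loss expects.
format_transform = DetectionTargetsFormatTransform(
    input_dim=(640, 640),      # network input size
    input_format=XYXY_LABEL,
    output_format=LABEL_CXCYWH,
)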

Finally, I concluded from this that if I transform the initial dataset to LABEL_NORMALIZED_CXCYWH, the model would learn to predict relative coordinates for the bounding boxes. That, however, causes new issues that I don't want to investigate.

I am still using the patches I mentioned in my previous comment. Hope this helps someone else in the future.

mariannaparzych commented 7 months ago

@lucidBrot thanks for your answers.

shaydeci commented 7 months ago

@mariannaparzych Indeed, @lucidBrot is right. The YoloX loss expects the LABEL_CXCYWH format. If you had anything else in the first index, that would explain the error you got when trying to compute the loss. We will update the docs of the YoloX loss to make this clearer. I am closing this issue for now; if this or any other problem persists, please feel free to re-open this issue or open a new one.
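
For future readers: LABEL_CXCYWH means each target row is (class_id, x_center, y_center, width, height) in absolute pixels. An illustrative example (values made up):

import torch

# Targets for one image in LABEL_CXCYWH; LABEL_NORMALIZED_CXCYWH would be
# the same layout with coordinates scaled to [0, 1].
targets = torch.tensor([
    [3.0, 223.5, 74.5, 373.0, 45.0],   # class 3, center (223.5, 74.5), 373x45 px box
    [0.0, 512.0, 300.0, 120.0, 80.0],  # class 0
])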