Closed mariannaparzych closed 7 months ago
@mariannaparzych Trainer does not "expect" a specific target format. Other components, like the loss and the model, do.
It would be great if you could give additional information on what it is you are trying to achieve.
@shaydeci Thank you for the answer. You're right, I expressed my problem incorrectly.
I am trying to use transfer learning for object detection on a COCO-like dataset. I started with notebooks/detection_how_to_connect_custom_dataset.ipynb:
from super_gradients.training import Trainer, models, training_hyperparams
trainer = Trainer(
    experiment_name=f"{experiment_name}_{timestmp:%Y-%m-%d_%H:%M}",
    ckpt_root_dir=ckpt_root_dir,
)
model = models.get(
    "yolox_l", pretrained_weights="coco", num_classes=train_dataset.num_classes
)
train_params = training_hyperparams.get("coco2017_yolox")
I am just using the default configuration from the tutorial. I assumed that I could load a COCO-like dataset with the COCOFormatDetectionDataset class, but it threw an error:
Exception has occurred: RuntimeError (note: full exception trace is shown but execution is paused at: _run_module_as_main)
Class values must be smaller than num_classes.
File "/home/marianna.parzych/Unstructured/super-gradients/venv/lib/python3.10/site-packages/super_gradients/training/losses/yolox_loss.py", line 235, in _compute_loss
gt_matched_classes, fg_mask, pred_ious_this_matching, matched_gt_inds, num_fg_img = self.get_assignments(
File "/home/marianna.parzych/Unstructured/super-gradients/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/marianna.parzych/Unstructured/super-gradients/venv/lib/python3.10/site-packages/super_gradients/training/losses/yolox_loss.py", line 471, in get_assignments
gt_cls_per_image = F.one_hot(gt_classes.to(torch.int64), self.num_classes)
It happens because the format of the target coming from the dataset is wrong. I assumed that the library has some standard formats, so that you can choose a dataset class that suits you and try it with different models and training parameters. If not, where can I find information in the documentation about the formats needed by the different model implementations and loss functions?
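For readers hitting the same error: the traceback is consistent with the loss reading the first column of each target row as the class id while the dataset puts the class id last. Below is a minimal, dependency-free sketch of a sanity check one could run on a few target rows before training. The helper name `find_bad_class_rows` and its `class_index` parameter are hypothetical and not part of super_gradients, and the rows are assumed to be flat sequences of five numbers.

```python
from typing import List, Sequence


def find_bad_class_rows(
    targets: Sequence[Sequence[float]],
    num_classes: int,
    class_index: int = 0,
) -> List[int]:
    """Return indices of rows whose class column is not a valid class id.

    Hypothetical helper for illustration only. `class_index` is the column
    that the loss is assumed to read as the class id (0 for label-first).
    """
    bad = []
    for i, row in enumerate(targets):
        value = row[class_index]
        is_whole_number = float(value).is_integer()
        if not is_whole_number or not (0 <= value < num_classes):
            bad.append(i)
    return bad


# Rows in (x1, y1, x2, y2, class_id) order, i.e. label last, in pixels.
label_last = [[120.0, 40.0, 300.0, 220.0, 3.0], [15.0, 60.0, 90.0, 180.0, 7.0]]

# Reading column 0 as the class id picks up pixel coordinates instead.
print(find_bad_class_rows(label_last, num_classes=11, class_index=0))  # [0, 1]
# Reading column 4 finds valid class ids.
print(find_bad_class_rows(label_last, num_classes=11, class_index=4))  # []
```

If the first call reports offending rows while the second does not, the targets are most likely label-last and need to be reordered before they reach a loss that expects the label first.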
I also had issues with the COCOFormatDetectionDataset, and I also originally expected it to just work. I think this is a bug, but maybe it's a design decision(?). It seems to load the data in XYXY format.
As a workaround, converting my data helped.
Storing the data in YOLO format and loading it with super_gradients.training.dataloaders.dataloaders.coco_detection_yolo_format_train seems to have worked for me, although this also returns the loaded data as XYXY and probably even in pixel coordinates. This works fine for training.
I've also given the direct way a try, loading the COCO json data. This requires a bit more diving into the code:
(However, I have not yet managed to successfully train with this. The Ground Truth visualizations look correct but the iou loss starts at 1 and stays there.)
Alternatively, you can wrap your dataset to bring it into the right shape. You'll have to determine what exactly you'll need to do for your data, of course. Something like this:
from super_gradients.training.datasets.detection_datasets.coco_format_detection import COCOFormatDetectionDataset


class CocoDatasetWrapper(COCOFormatDetectionDataset):
    """
    Description from Base class at https://github.com/Deci-AI/super-gradients/blob/2e591fdde09e18be06a41eafcfe7d2e8362346e4/src/super_gradients/training/datasets/detection_datasets/coco_format_detection.py#L20 :
    Base dataset to load ANY dataset that is with a similar structure to the COCO dataset.
    - Annotation file (.json). It has to respect the exact same format as COCO, for both the json schema and the bbox format (xywh).
    - One folder with all the images.
    Output format: (x, y, x, y, class_id)
    """

    def __init__(self, fix_order=True,
                 image_width: int|None = None,
                 image_height: int|None = None,
                 *args, **kwargs):
        self.fix_order = fix_order
        self.image_width = image_width
        self.image_height = image_height
        super().__init__(*args, **kwargs)

    def _load_annotation(self, sample_id: int) -> dict:
        # The initial annotations.json file contains the format [x_min, y_min, width, height]
        annotation = super()._load_annotation(sample_id)
        # This annotation is in [x1, y1, x2, y2, class_id] format.
        # class label is at the end
        if convert_coco_boxes_to_yolo := True:
            # # This would assume [x, y, w, h]
            # annotation["target"] = CocoDatasetWrapper.convert_my_boxes_to_yolox(
            #     (self.image_width, self.image_height),
            #     annotation["target"])
            annotation["target"] = \
                CocoDatasetWrapper.convert_absolute_minmaxcorner_box_to_relative_center_box(
                    size=(self.image_width, self.image_height), box=annotation["target"]
                )
        if self.fix_order:
            # permute each row so that the last entry is now the first.
            # Necessary because https://github.com/Deci-AI/super-gradients/blob/d5a85fd318f4137806c37d73120f905e0d51f6a7/src/super_gradients/training/datasets/detection_datasets/coco_format_detection.py#L144 is wrong (inconsistent with the rest of their pipeline).
            # E.g. calling dataset.plot() expects the class label to be at the end (index 4),
            # but the loss expects it at index 0.
            # But then this will cause postprocessings to fail....
            annotation["target"][...,(0,1,2,3,4)] = annotation["target"][...,(4,0,1,2,3)]
            # class label should now be at the start, not at the end
        print(f"{annotation=}")
        return annotation

    @staticmethod
    def convert_absolute_minmaxcorner_box_to_relative_center_box(size, box):
        """
        input box: (xmin_absolute, ymin_absolute, xmax_absolute, ymax_absolute)
        output box: (xcenter_relative, ycenter_relative, width_relative, height_relative)
        """
        # https://stackoverflow.com/a/56121386/2550406
        dw = 1./size[0]
        dh = 1./size[1]
        x = (box[...,0] + box[...,2])/2.0
        y = (box[...,1] + box[...,3])/2.0
        w = box[...,2] - box[...,0]
        h = box[...,3] - box[...,1]
        x = x*dw
        w = w*dw
        y = y*dh
        h = h*dh
        box[...,0] = x
        box[...,1] = y
        box[...,2] = w
        box[...,3] = h
        return box
The code above goes from a COCO json dataset with x_min, y_min, width, height in pixels to x_center, y_center, width, height in relative values in [0,1]. Note that the COCOFormatDetectionDataset has already transformed the boxes to x_min, y_min, x_max, y_max before we get access to them.
Additionally, my code snippet takes the original COCOFormatDetectionDataset's output and changes the order from (box, class) to (class, box), which is what the yolox loss expects.
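To make the two coordinate conversions described above easier to check by hand, here is a small pure-Python sketch that follows a single box through both steps. The function names are hypothetical and only illustrate the arithmetic. They do not use or reproduce any super_gradients code, and the 640x480 image size is made up.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]


def coco_xywh_to_xyxy(box: Box) -> Box:
    """(x_min, y_min, width, height) -> (x_min, y_min, x_max, y_max)."""
    x, y, w, h = box
    return (x, y, x + w, y + h)


def xyxy_abs_to_cxcywh_rel(box: Box, image_width: float, image_height: float) -> Box:
    """Absolute corner box -> relative (x_center, y_center, width, height)."""
    x1, y1, x2, y2 = box
    return (
        (x1 + x2) / 2.0 / image_width,
        (y1 + y2) / 2.0 / image_height,
        (x2 - x1) / image_width,
        (y2 - y1) / image_height,
    )


# One COCO-style annotation in pixels on a 640x480 image.
coco_box = (160.0, 120.0, 320.0, 240.0)
xyxy = coco_xywh_to_xyxy(coco_box)                # corner format in pixels
rel = xyxy_abs_to_cxcywh_rel(xyxy, 640.0, 480.0)  # relative center format
print(xyxy)  # (160.0, 120.0, 480.0, 360.0)
print(rel)   # (0.5, 0.5, 0.5, 0.5)
```

Running a known box through such a helper is a cheap way to confirm which of the two formats a dataset is really emitting before wiring it into training.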
Finally, if you want to visualize your results using the DetectionVisualizationCallback or the ExtremeBatchDetectionVisualizationCallback, you will run into a crash. The reason is that they both use this function, which expects [class_id, x1, y1, x2, y2] but receives [image_id_in_batch, class_id, x1, y1, x2, y2] for the targets. So you need to patch that file in your installation to look like this:
for label_xyxy in target_boxes:
    print(f"detection_utils.py: Target box i/?: {label_xyxy}")
    # The callback might give us [image_id, class, xyxy_box]
    # But we want [class, xyxy_box]
    label_xyxy = label_xyxy[-5:]
    image_with_targets = DetectionVisualization.draw_box_title(
Also in that file (detection_utils.py) there is the assumption that the data needs to be scaled up from relative to absolute values. This is apparently not true, as the super_gradients.training.dataloaders.dataloaders.coco_detection_yolo_format_train loads relative cx_cy_w_h data but then returns it as absolute(?) data. Debug prints in visualize_batch on line 625 show that we get pixel values there. So I've added a heuristic to automatically determine whether the data needs to be scaled:
@@ -618,7 +623,12 @@ class DetectionVisualization:
             0 for invisible, 1 for fully opaque
         """
         image_np = undo_preprocessing_func(image_tensor.detach())
-        targets = DetectionVisualization._scaled_ccwh_to_xyxy(target_boxes.detach().cpu().numpy().copy(), *image_np.shape[1:3], image_scale)
+        if (target_boxes < 1.).all():
+            targets = DetectionVisualization._scaled_ccwh_to_xyxy(target_boxes.detach().cpu().numpy().copy(), *image_np.shape[1:3], image_scale)
+            print(f"detection_utils.py after rescaling and transforming ccwh to xyxy: {targets[0]=}")
+        else:
+            targets = DetectionVisualization._scaled_ccwh_to_xyxy(target_boxes.detach().cpu().numpy().copy(), 1., 1., image_scale)
+            print(f"detection_utils.py after not rescaling but transforming ccwh to xyxy: {targets[0]=}")
         if pred_boxes is None:
             pred_boxes = [None for _ in range(image_np.shape[0])]
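One caveat about the check in this patch: it looks at every entry of target_boxes. If that tensor also carries the leading image-index and class-id columns, as the [image_id_in_batch, class_id, x1, y1, x2, y2] layout mentioned earlier suggests, any label of 1 or more would make normalized boxes look like pixel values. A hedged sketch of a variant that inspects only the last four entries of each row follows. `boxes_look_normalized` is a hypothetical helper written in plain Python, not part of detection_utils.py, and it assumes the four box coordinates are always the trailing columns.

```python
from typing import Sequence


def boxes_look_normalized(rows: Sequence[Sequence[float]]) -> bool:
    """Guess whether box coordinates are relative values in [0, 1].

    Only the last four entries of each row are inspected, so image indices
    and class ids in the leading columns cannot influence the result.
    An empty input is treated as "not normalized" (an arbitrary choice).
    """
    if not rows:
        return False
    return all(0.0 <= value <= 1.0 for row in rows for value in row[-4:])


# [image_id_in_batch, class_id, cx, cy, w, h] with relative coordinates.
relative_rows = [[0, 3, 0.5, 0.5, 0.2, 0.1], [1, 7, 0.25, 0.4, 0.1, 0.3]]
# Same layout with pixel coordinates.
pixel_rows = [[0, 3, 320.0, 240.0, 128.0, 48.0]]

print(boxes_look_normalized(relative_rows))  # True
print(boxes_look_normalized(pixel_rows))     # False
```

This remains a heuristic: a batch whose pixel-space boxes all happen to fit inside the top-left pixel would be misclassified, so an explicit flag is the safer option where the format is known.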
I believe the main reason why the super_gradients.training.dataloaders.dataloaders.coco_detection_yolo_format_train behaved better for me is that I had specified a transform for it. There are a lot of different formats in use in this codebase.
The coco_detection_yolo_format_train is a wrapper around the YoloDarknetFormatDetectionDataset, which has a conversion from LABEL_CXCYWH to XYXY_LABEL.
We can browse all the possible formats at https://github.com/Deci-AI/super-gradients/blob/56de963ef40cd0e4e1c437b90536735ce7f71ba3/src/super_gradients/training/datasets/data_formats/default_formats.py#L12 (and their references in the github sidebar). Among them is LABEL_NORMALIZED_CXCYWH. This implies that the dataset actually expects absolute pixel coordinates, although YOLO-format would normally have them normalized. But because the XYXY_LABEL output is also not normalized, this just means that a normalized yolo-formatted dataset on disk will be loaded in normalized [x1 y1 x2 y2 class] format now.
Transforms can be set through the dataset.transforms = [] property of the dataloader. These allow you to avoid creating a dataset wrapper like I did. For example:
from super_gradients.training.datasets.data_formats.default_formats import XYXY_LABEL, LABEL_XYXY, LABEL_CXCYWH
from super_gradients.training.transforms.transforms import DetectionTargetsFormatTransform
val_data = coco_detection_yolo_format_val( ... )
val_data.dataset.transforms = [ DetectionTargetsFormatTransform(input_size, input_format = XYXY_LABEL, output_format = LABEL_CXCYWH) ]
The distinction between LABEL_XYXY and XYXY_LABEL is whether the class comes first or last. The YOLOX model expects the class label before the box coordinates, hence the use of LABEL_CXCYWH.
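The label-first versus label-last distinction can be illustrated without the library. The sketch below describes each layout as a tuple of column names and reorders one row between them. The `*_COLUMNS` tuples and `reorder_columns` are hypothetical stand-ins for illustration, not the format objects defined in default_formats.py.

```python
from typing import Sequence, Tuple

# Hypothetical column layouts mirroring the naming of the two formats.
XYXY_LABEL_COLUMNS = ("x1", "y1", "x2", "y2", "label")  # label last
LABEL_XYXY_COLUMNS = ("label", "x1", "y1", "x2", "y2")  # label first


def reorder_columns(
    row: Sequence[float],
    source: Sequence[str],
    target: Sequence[str],
) -> Tuple[float, ...]:
    """Rearrange one row from the `source` column layout to the `target` layout."""
    if sorted(source) != sorted(target):
        raise ValueError("source and target must name the same columns")
    return tuple(row[source.index(name)] for name in target)


label_last_row = (10.0, 20.0, 110.0, 220.0, 4.0)
label_first_row = reorder_columns(label_last_row, XYXY_LABEL_COLUMNS, LABEL_XYXY_COLUMNS)
print(label_first_row)  # (4.0, 10.0, 20.0, 110.0, 220.0)
```

Only the position of the label changes here. Converting the coordinates themselves (for example XYXY to CXCYWH) is a separate step.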
The model outputs some more complex raw format. This is later converted inside yolox_loss.py#prepare_predictions
to some format that can be compared with the targets. I haven't understood this part yet.
A similar (hopefully identical) conversion happens in the Visualization Callbacks, which have a post_prediction_callback that you can set to an instance of YoloXPostPredictionCallback to do this.
DetectionVisualizationCallback (and ExtremeBatchDetectionVisualizationCallback) seem to expect predictions to be in absolute XYXY format, and targets in relative CXCYWH format... Finally, I concluded from this that if I transform the initial dataset to LABEL_NORMALIZED_CXCYWH, the model would learn to predict relative coordinates for the bounding boxes. That, however, causes new issues that I don't want to investigate.
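When predictions are in absolute XYXY and targets are in relative CXCYWH, the two can only be compared after bringing them into one convention. A minimal sketch of the target-side conversion, in plain Python with a hypothetical function name and a made-up 640x480 image size:

```python
from typing import Tuple

Box = Tuple[float, float, float, float]


def cxcywh_rel_to_xyxy_abs(box: Box, image_width: float, image_height: float) -> Box:
    """Relative (x_center, y_center, width, height) -> absolute (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return (
        (cx - w / 2.0) * image_width,
        (cy - h / 2.0) * image_height,
        (cx + w / 2.0) * image_width,
        (cy + h / 2.0) * image_height,
    )


# A relative target covering the central half of a 640x480 image.
target_rel = (0.5, 0.5, 0.5, 0.5)
target_abs = cxcywh_rel_to_xyxy_abs(target_rel, 640.0, 480.0)
print(target_abs)  # (160.0, 120.0, 480.0, 360.0)
```

If a visualization draws targets far outside the image or collapsed into one corner, applying a conversion like this twice, or not at all, is a likely cause.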
I am still using the patches I mentioned in my previous comment. Hope this helps someone else in the future.
@lucidBrot thanks for your answers.
@mariannaparzych indeed, @lucidBrot is right. The YoloX loss expects the LABEL_CXCYWH format. If you had anything else in the first index this can explain the error that you had when trying to compute it. We will update the docs of the YoloX loss so it would be more clear. I am closing this issue for now, if this or any other problem persists please feel free to re-open this issue or open a new one.
💡 Your Question
I am using a COCO-like dataset, https://github.com/DS4SD/DocLayNet. I use COCOFormatDetectionDataset from super_gradients to load the data. According to the documentation and my observation, COCOFormatDetectionDataset outputs annotations in the format (x, y, x, y, class_id). I tried to use COCOFormatDetectionDataset, but training throws an error, and I can see that pixel bbox values are used as gt_classes values.
The tutorial in notebooks/detection_how_to_connect_custom_dataset.ipynb suggests that Trainer needs annotations in the format (class_id, x_center, y_center, H, W). Is it actually true that Trainer and Datasets are not compatible? Should Trainer get [x_center, y_center, H, W] as pixel values or as fractions of the image size in the range (0, 1)?