Deci-AI / super-gradients

Easily train or fine-tune SOTA computer vision models with one open source training library. The home of Yolo-NAS.

DetectionRandomAffine target-size is in wrong format #2011

Closed: cansik closed this issue 2 months ago

cansik commented 3 months ago

🐛 Describe the bug

While training a detection model with an input_dim that is not a 1:1 ratio (640x480 = 4:3), I noticed strange behaviour in the DetectionRandomAffine transform. Even when all parameters are set so that the transform should leave the image unchanged, the image appears translated after the transform:

dataset_params:
  train_dataset_params:
    input_dim: [ 480, 640 ]

train_transforms:
  - DetectionRandomAffine:
      # rotation degrees, randomly sampled from [-degrees, degrees]
      degrees: [ 0, 0 ]
      # image translation fraction
      translate: [ 0.5, 0.5 ]
      # random rescale range (keeps size by padding/cropping) after mosaic transform.
      scales: [ 1.0, 1.0 ]
      # shear degrees, randomly sampled from [-degrees, degrees]
      shear: [ 0.0, 0.0 ]
      target_size: ${dataset_params.train_dataset_params.input_dim}
      # whether to filter out transformed bboxes by edge size, area ratio, and aspect ratio.
      filter_box_candidates: false
  - DetectionTargetsFormatTransform:
      input_dim: ${dataset_params.train_dataset_params.input_dim}
      output_format: LABEL_CXCYWH

Figure 1: First 4 images of COCO with the wrong aspect ratio
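The symptom can be reproduced outside the data pipeline with plain OpenCV. A minimal standalone sketch (my own illustration using only OpenCV and NumPy, not super-gradients code): warping a 640x480 image with an identity affine matrix, but passing the (height, width)-ordered target_size straight through as the output size, yields a 480x640 canvas in which the right edge is cropped and the bottom is padded.

import cv2
import numpy as np

# a synthetic 640 wide x 480 high test image with a centred marker
image = np.full((480, 640, 3), 255, dtype=np.uint8)
cv2.circle(image, (320, 240), 100, (0, 0, 255), -1)

# identity affine matrix: no rotation, scale, shear or translation
identity = np.eye(2, 3, dtype=np.float32)

# target_size as stored in input_dim: (height, width)
target_size = (480, 640)

# cv2.warpAffine interprets its third argument (dsize) as (width, height),
# so passing (480, 640) yields a 480-wide, 640-high canvas:
# the right edge is cropped and the bottom is padded with black.
wrong = cv2.warpAffine(image, identity, target_size)
print(wrong.shape)  # (640, 480, 3)

# reversing to (width, height) leaves the image untouched
correct = cv2.warpAffine(image, identity, target_size[::-1])
print(correct.shape)  # (480, 640, 3)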

Since the aspect ratio of the images looks wrong (Figure 1), I changed the target_size of DetectionRandomAffine to [640, 480]:

train_transforms:
  - DetectionRandomAffine:
      target_size: [640, 480]

Figure 2: First 4 images of COCO with the correct aspect ratio but wrong translation

Solution

After looking into the code, it seems that target_size is expected in row/column (height/width) format, while cv2.warpAffine expects the output size in the reversed order, (cols, rows), i.e. (width, height). After reversing target_size before it is passed to cv2.warpAffine, the transform works as expected, and the original configuration (target_size taken from input_dim) produces the correct output:

train_transforms:
  - DetectionRandomAffine:
      target_size: ${dataset_params.train_dataset_params.input_dim}

Figure 3: First 4 images of COCO with the correct aspect ratio and no translation (first image is 640x480)

It looks like someone already tried to fix this by reversing target_size when it is not explicitly passed as an input parameter; however, that does not produce the correct behaviour.
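If the reversal is instead applied at the warp call itself, the user-facing target_size can stay in (height, width) format and keep matching input_dim. A minimal sketch of that idea (hypothetical helper name, not the actual super-gradients implementation):

import cv2
import numpy as np

def warp_affine_hw(image: np.ndarray, matrix: np.ndarray, target_size_hw: tuple) -> np.ndarray:
    """Warp `image` with a 2x3 affine `matrix`.

    `target_size_hw` is given as (height, width), like input_dim; it is
    flipped here because cv2.warpAffine expects dsize as (width, height).
    """
    height, width = target_size_hw
    return cv2.warpAffine(image, matrix, (width, height))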

Versions

Windows and macOS; super-gradients 3.7.1 from PyPI and super-gradients master from GitHub.

isatyamks commented 2 months ago

@cansik, I would like to work on issue #2011. Could you please assign this issue to me?

cansik commented 2 months ago

@isatyamks I am not a maintainer, and the bug has already been fixed in #2012.