NVIDIA / DIGITS

Deep Learning GPU Training System
https://developer.nvidia.com/digits
BSD 3-Clause "New" or "Revised" License

"detectnet_augmentation_param" layer bounding box transformations #1770

Open bfreskura opened 7 years ago

bfreskura commented 7 years ago

Let's say that I have the augmentation layer written as below:

detectnet_augmentation_param {
    crop_prob: 1.0
    shift_x: 32
    shift_y: 32
    scale_prob: 0.5
    scale_min: 0.5
    scale_max: 2.0
    flip_prob: 0.5
    rotation_prob: 0.8
    max_rotate_degree: 40.0
    hue_rotation_prob: 0.8
    hue_rotation: 30.0
    desaturation_prob: 0.8
    desaturation_max: 0.6
}

I'm doing object detection, and I'm not sure whether the corresponding transformations are also applied to the bounding boxes. I'm referring to scale, rotation, and flip in particular, because they change the object's location in the image. I ran some experiments, and the bounding boxes do not appear to be affected by these transformations, which leads to poor training results. When I disabled the above-mentioned transformations, the results improved significantly.
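To illustrate the concern: geometric augmentations must be applied to the labels as well as the pixels, or the boxes end up pointing at the wrong place. Here is a minimal sketch (not DIGITS code, just a hypothetical helper) of a horizontal flip that keeps image and box consistent:

```python
import numpy as np

def flip_horizontal(image, boxes):
    """Flip an image left-right and mirror each box to match.

    boxes: array of [x_min, y_min, x_max, y_max] in pixel coords.
    """
    w = image.shape[1]
    flipped = image[:, ::-1]
    out = boxes.astype(float).copy()
    # Mirror the x-coordinates; note that x_min and x_max swap roles.
    out[:, 0] = w - boxes[:, 2]
    out[:, 2] = w - boxes[:, 0]
    return flipped, out

img = np.zeros((100, 200, 3))
boxes = np.array([[10, 20, 50, 60]])
_, new_boxes = flip_horizontal(img, boxes)
# In a 200 px wide image, x-range [10, 50] maps to [150, 190].
```

If only the image is flipped and the label stays at `[10, 20, 50, 60]`, the network is trained on a box that no longer covers the object, which would explain the degraded results described above.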

Can someone explain this in more detail?

jveitchmichaelis commented 7 years ago

@barty777 Did you verify the same result for the crop parameter (since presumably that also induces an offset in the bounding box)? I would guess that in the case of crops the shift is small enough (32 px by default) that the bounding box would mostly still contain the object even if it were left unchanged.

However, this config file is provided with the example KITTI data, and that doesn't seem to be adversely affected: it still trains well. Unless I'm missing something?

This would certainly corroborate reports from people on here who have said that manually augmenting their data produced better results. Otherwise, if you could rely on the DetectNet augmentations then there should be no need to manually generate datasets with flips etc.

jveitchmichaelis commented 7 years ago

Following up on this: the augmentation is defined here, with separate functions for the various transformations. Label augmentation is always performed on the CPU, as per here.

Flips, scaling, rotations and crops are covered. So is there a bug somewhere else?
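For reference, transforming labels alongside the image typically means applying the same affine matrix to the box corners and then taking the axis-aligned extent of the result. A small sketch of that idea (the function name and interface are hypothetical, not DIGITS's actual API):

```python
import numpy as np

def transform_box(box, M):
    """Apply a 2x3 affine matrix M to a box and return the
    axis-aligned bounding box of its four transformed corners.

    box: [x_min, y_min, x_max, y_max]
    M:   2x3 affine matrix, as used for image warping.
    """
    x0, y0, x1, y1 = box
    # Homogeneous coordinates of the four corners.
    corners = np.array([[x0, y0, 1], [x1, y0, 1],
                        [x0, y1, 1], [x1, y1, 1]], dtype=float)
    pts = corners @ M.T  # shape (4, 2)
    return [pts[:, 0].min(), pts[:, 1].min(),
            pts[:, 0].max(), pts[:, 1].max()]

# A pure 32 px shift in x moves the box by the same amount:
shift = np.array([[1, 0, 32], [0, 1, 0]], dtype=float)
print(transform_box([10, 20, 50, 60], shift))  # [42.0, 20.0, 82.0, 60.0]
```

Note that for rotations this axis-aligned hull is slightly larger than the true rotated object, which is the usual compromise when labels are axis-aligned boxes. If the DIGITS code paths linked above do something equivalent for every geometric transform, then the bug, if there is one, would have to be elsewhere.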