dmlc / gluon-cv

Gluon CV Toolkit
http://gluon-cv.mxnet.io
Apache License 2.0

Image augmentation: missing rotation #800

Open ivanquirino opened 5 years ago

ivanquirino commented 5 years ago

It's nice to have these image augmentations in both the GluonCV and Gluon APIs, but I'm missing rotation utilities to randomly rotate images together with their bounding boxes. That would be a nice feature to have.

Another thing: Gluon has the Compose API for chaining various transforms together in a clean way. It would be great if the GluonCV transforms could be mixed into Compose as well.
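Here is roughly the Compose pattern I mean, chaining image-only transforms (a minimal example using mxnet.gluon.data.vision.transforms):

from mxnet.gluon.data.vision import transforms

# Image-only transforms chain cleanly because each one takes a single array.
img_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.RandomFlipLeftRight(),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])

The GluonCV detection transforms operate on an (image, label) pair per call, so as far as I can tell they can't be dropped into Compose as-is.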

zhreshold commented 5 years ago
  1. The existing networks and losses don't support rotated bounding boxes, so the best bet is to adjust the axis-aligned bounding boxes to enclose the rotated objects. That is doable, but expensive. Still, as you said, it would be a nice feature to have (a rough sketch of the idea follows after this list).

  2. For image classification we do use Compose in GluonCV, but for object detection it gets more complicated; I will rethink it.
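One way the box adjustment could look (just a NumPy sketch of the idea, not an existing GluonCV utility; it assumes the rotated image keeps its original size and center):

import numpy as np

def rotate_bbox_axis_aligned(bboxes, angle_deg, img_w, img_h):
    # bboxes: (N, 4) array of xmin, ymin, xmax, ymax in the original image
    theta = np.deg2rad(angle_deg)
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    cx, cy = img_w / 2.0, img_h / 2.0
    rot = np.array([[cos_t, -sin_t], [sin_t, cos_t]])
    out = np.empty_like(bboxes, dtype=np.float64)
    for i, (xmin, ymin, xmax, ymax) in enumerate(bboxes):
        # rotate the four corners of the box around the image center
        corners = np.array([[xmin, ymin], [xmax, ymin],
                            [xmax, ymax], [xmin, ymax]], dtype=np.float64)
        rotated = (corners - [cx, cy]) @ rot.T + [cx, cy]
        # re-fit an axis-aligned box around the rotated corners (this grows the box)
        out[i] = [rotated[:, 0].min(), rotated[:, 1].min(),
                  rotated[:, 0].max(), rotated[:, 1].max()]
    return out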

ivanquirino commented 5 years ago

@zhreshold thanks for answering.

  1. Adjusting the bounding boxes is exactly what I was thinking.

  2. I've built a custom transform class for my own training script, which is based on the VOC dataset example training script. It uses Compose internally, because I needed a YOLOv3 transform without random flipping or cropping:

import copy

import numpy as np
from mxnet import autograd, nd
from mxnet.gluon.data.vision import transforms
from gluoncv.data.transforms import bbox
from gluoncv.model_zoo.yolo.yolo_target import YOLOV3PrefetchTargetGenerator

SIZE = 416  # assumed input resolution; defined elsewhere in the original script


class CustomTransform(object):
    def __init__(self, net=None, size=(SIZE, SIZE), mean=(0.485, 0.456, 0.406),
                 std=(0.229, 0.224, 0.225), **kwargs):
        self._size = size
        self._mean = mean
        self._std = std

        # image-only augmentations: no random flipping or cropping
        self._img_transform = transforms.Compose([
            transforms.Resize(size, keep_ratio=True),
            transforms.RandomColorJitter(0.1, 0.1, 0.1, 0.1),
            transforms.RandomLighting(0.1),
            transforms.ToTensor(),
            transforms.Normalize(mean=mean, std=std)
        ])

        self._target_generator = None

        if net is None:
            return

        # run one fake forward pass to extract anchors, offsets and feature maps;
        # work on a CPU copy in case the network has been reset_ctx'd to GPU
        self._fake_x = nd.zeros((1, 3, size[0], size[0]))  # assumes a square input
        net = copy.deepcopy(net)
        net.collect_params().reset_ctx(None)

        with autograd.train_mode():
            _, self._anchors, self._offsets, self._feat_maps, _, _, _, _ = net(self._fake_x)

        self._target_generator = YOLOV3PrefetchTargetGenerator(
            num_class=len(net.classes), **kwargs)

    def __call__(self, src, label):
        h, w, _ = src.shape

        # resize the image and its bounding boxes to the fixed input size
        img = self._img_transform(src)
        label = bbox.resize(label, (w, h), (self._size[0], self._size[1]))

        # validation mode: no prefetched training targets
        if self._target_generator is None:
            return img, label.astype(img.dtype)

        # generate YOLOv3 training targets from the ground-truth boxes and class ids
        gt_bboxes = nd.array(label[np.newaxis, :, :4])
        gt_ids = nd.array(label[np.newaxis, :, 4:5])
        gt_mixratio = None

        objectness, center_targets, scale_targets, weights, class_targets = self._target_generator(
            self._fake_x, self._feat_maps, self._anchors, self._offsets,
            gt_bboxes, gt_ids, gt_mixratio)

        return (img, objectness[0], center_targets[0], scale_targets[0], weights[0],
                class_targets[0], gt_bboxes[0])
  3. It would be nice if GluonCV provided official training scripts for custom datasets across the computer vision models: classification, detection, and segmentation. There's a tutorial for preparing custom data for object detection, but we end up having to find a script on the web or adapt one of the scripts suited for the standard datasets.

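For reference, this is roughly how I wire the transform above into a DataLoader, following the batchify pattern from the official YOLOv3 training script (the dataset, model, and batch size below are placeholders for my custom setup):

from mxnet.gluon.data import DataLoader
from gluoncv import data as gdata, model_zoo
from gluoncv.data.batchify import Tuple, Stack, Pad

net = model_zoo.get_model('yolo3_darknet53_voc', pretrained_base=True)
train_dataset = gdata.VOCDetection(splits=[(2007, 'trainval')])  # placeholder for the custom dataset

# CustomTransform returns 7 outputs in training mode: stack the first six,
# pad the variable-length ground-truth boxes.
batchify_fn = Tuple(*([Stack() for _ in range(6)] + [Pad(axis=0, pad_val=-1)]))
train_loader = DataLoader(
    train_dataset.transform(CustomTransform(net, size=(416, 416))),
    batch_size=8, shuffle=True, batchify_fn=batchify_fn, last_batch='rollover')
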
ivanquirino commented 5 years ago

I have a problem with my training script at the validation stage; I get something like the following output for both VOCApMetric and VOC07ApMetric:

CLASS_1=NaN CLASS_2=NaN CLASS_3=NaN CLASS_4=99% CLASS_5=98% CLASS_6=96% mAP=97%

I think this problem may be because:

  1. my dataset is really small;
  2. CLASS_3 and CLASS_6 are too similar and could be merged into a single class;
  3. the VOCApMetric classes are not suited for custom data and I should write my own metrics.

I would appreciate some input on this since I'm kinda new to ML in general, but I'm looking into this for an object detection application.

zhreshold commented 5 years ago

The NaNs look suspicious to me; maybe you don't have those classes in your validation set?
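A quick way to check (a sketch, assuming the validation dataset yields (image, label) pairs with the class id in column 4 of the label array, as the GluonCV detection datasets do):

from collections import Counter

counts = Counter()
for _, label in val_dataset:  # val_dataset: your custom validation dataset
    # column 4 of each label row holds the class id
    counts.update(label[:, 4].astype(int).tolist())
print(counts)  # a class id missing here has no ground truth, so its AP shows up as NaN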

ivanquirino commented 5 years ago

Here's my validation LST file; it contains all classes: pads.val.txt

What's strange is that, after training, the network correctly detects every image in the validation set.

ivanquirino commented 5 years ago

Correction: there is one image in the validation set that it classifies wrongly. I think the main problem is in my labeling: classes 3 and 6 refer to an object that has no left/right distinction, yet those objects are being labeled as left and right. The other four classes have left and right labeled correctly. The correct way to label is to merge classes 3 and 6 into a single class, so I have 5 classes instead of 6.
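To fix this without re-annotating everything, I can probably just remap the class ids when loading the labels, something like this (illustrative only; it assumes 0-based class ids in column 4, with id 5 folding into id 2):

import numpy as np

# illustrative 0-based remap: fold class id 5 into class id 2, keep the rest
REMAP = {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 2}

def merge_classes(label):
    label = label.copy()
    label[:, 4] = np.array([REMAP[int(c)] for c in label[:, 4]])
    return label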

Thanks for your input, @zhreshold, you helped me see the error more clearly.