Is it possible to modify the labels according to the random rotation of images?

WYBupup commented 7 months ago

Describe the question.

I am working on a training framework for an image rotation recognition task (0, 90, 180 and 270 degrees). My dataset is so large that I cannot store every image rotated to all four angles, because there is not enough disk space on the machine. Instead, in the preprocessing step I randomly rotate each image to one of the four angles and change its label accordingly. However, this makes preprocessing the dominant part of the total training time. I want to use DALI to accelerate preprocessing, and I wonder whether I could randomly rotate the image and change the label accordingly inside the pipeline.

Check for duplicates

[X] I have searched the open bugs/issues and have found no duplicates for this bug report

JanuszL commented 7 months ago

Hi @WYBupup,

Thank you for reaching out. Yes, you can do that. Use the rotate operator and feed its angle with the output of the random uniform operator that selects values from the set values=[0, 90, 180, 270]. To adjust the labels, use the output of the same random operator. You can express a simple adjustment with mathematical operators, or write a Python operator if the adjustment is more elaborate.
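The key point is that the same random draw drives both the image rotation and the label. Below is a plain NumPy sketch of that logic, not DALI code. The label scheme (class k means "rotated by k * 90 degrees") and the function name `random_rotate_with_label` are hypothetical.

In a DALI pipeline, the equivalent should look roughly like this:
- draw the angle with `angle = fn.random.uniform(values=[0.0, 90.0, 180.0, 270.0])`;
- rotate with `fn.rotate(img, angle=angle)`;
- compute the label with something like `fn.cast(angle / 90, dtype=types.INT32)`.

Check that the rotation direction matches the convention your labels assume.

```python
import numpy as np

# Hypothetical label scheme: class k means "rotated by k * 90 degrees".
ANGLES = np.array([0, 90, 180, 270])


def random_rotate_with_label(img, rng):
    """Rotate an H x W x C image by a random multiple of 90 degrees and
    return the rotated image together with the matching class label."""
    # Draw the angle once; both the image and the label are derived from it.
    angle = rng.choice(ANGLES)
    k = int(angle) // 90  # label: 0, 1, 2 or 3
    # np.rot90 rotates counter-clockwise in the (row, col) plane.
    rotated = np.rot90(img, k=k, axes=(0, 1))
    return rotated, k


rng = np.random.default_rng(0)
img = np.arange(2 * 3 * 1).reshape(2, 3, 1)
rotated, label = random_rotate_with_label(img, rng)
print(label, rotated.shape)
```

Undoing the rotation with `np.rot90(rotated, k=-label, axes=(0, 1))` gives back the original image, which is a quick way to sanity-check the label mapping.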

WYBupup commented 7 months ago

Thanks for your reply. I have almost completed the pipeline following your guidance, but I have run into another problem. In my old Python-based preprocessing pipeline, I resize the image to a fixed size while maintaining the aspect ratio. Then I use cv2.copyMakeBorder to place the picture in the center and pad the area around it. I tried to emulate this operation with DALI, but it seems that the padding operator only supports padding in one direction. My objective is to emulate cv2.copyMakeBorder and pad on all sides of the original image. Is there an operator that achieves this?
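For reference, the letterbox operation described above boils down to three steps:
1. compute a scale factor;
2. compute the resized size;
3. split the leftover space into top/bottom/left/right borders (the arguments cv2.copyMakeBorder takes).

A minimal NumPy sketch follows. It assumes an H x W x C image and a square target. It uses a nearest-neighbour index lookup as a stand-in for cv2.resize, and the helper names are hypothetical.

```python
import numpy as np


def letterbox_params(h, w, target):
    """Scale, resized (h, w) and (top, bottom, left, right) padding that
    center an h x w image on a target x target canvas, keeping aspect ratio."""
    scale = min(target / w, target / h)
    new_w, new_h = int(round(w * scale)), int(round(h * scale))
    pad_x, pad_y = target - new_w, target - new_h
    top, left = pad_y // 2, pad_x // 2
    # Any odd leftover pixel goes to the bottom/right border.
    return scale, (new_h, new_w), (top, pad_y - top, left, pad_x - left)


def letterbox(img, target, value=0):
    h, w = img.shape[:2]
    scale, (new_h, new_w), (top, bottom, left, right) = letterbox_params(h, w, target)
    # Nearest-neighbour resize via index lookup (stand-in for cv2.resize).
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    resized = img[rows[:, None], cols[None, :]]
    # Constant border on all four sides, like cv2.BORDER_CONSTANT.
    return np.pad(resized, ((top, bottom), (left, right), (0, 0)),
                  constant_values=value)


print(letterbox_params(480, 640, 256))
```

For a 640 x 480 image and a 256 x 256 target, the image is scaled by 0.4 to 256 x 192, and 32 rows of padding go above and below it.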

mzient commented 7 months ago

If you're already rotating the images, you can pass the output size explicitly to fn.rotate. It can then either fill the borders with a constant value (a single value shared by all channels) or replicate the border pixels. If either of those methods suits you, it will be cheaper to use one operator instead of two.

import nvidia.dali as dali
import nvidia.dali.fn as fn
import PIL.Image
import numpy as np

@dali.pipeline_def(batch_size=1, num_threads=4, device_id=0)
def mypipe():
    enc, _ = fn.readers.file(file_root=".", files=["alley.png"])
    img = fn.decoders.image(enc, device="mixed")
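    img = fn.resize(img, mode="not_larger", size=256)  # fit within 256x256, keep aspect ratio
    # no fill_value: the border pixels are replicated
    rep = fn.rotate(img, angle=90, size=(256, 256))
    # fill_value=0: the area outside the rotated image is filled with zeros
    pad = fn.rotate(img, angle=90, size=(256, 256), fill_value=0)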
    return rep, pad

pipe = mypipe()
pipe.build()
rep, pad = pipe.run()

The results (replicated border first, then zero padding):

PIL.Image.fromarray(np.array(rep.as_cpu()[0]))

PIL.Image.fromarray(np.array(pad.as_cpu()[0]))
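The difference between the two border modes can be reproduced on a tiny array. The following NumPy sketch rotates by a multiple of 90 degrees and centers the result on a fixed canvas:
- `np.pad(mode="edge")` stands in for the replicated border (no fill_value);
- `mode="constant"` stands in for fill_value.

This is only an approximation of what fn.rotate does. The function name is hypothetical, and DALI's exact placement when the padding amount is odd may differ by a pixel.

```python
import numpy as np


def rotate90_on_canvas(img, k, size, fill_value=None):
    """Rotate an H x W x C image by k * 90 degrees (counter-clockwise, as
    np.rot90 does) and center it on a size x size canvas.
    fill_value=None replicates the border pixels; a number pads with it."""
    rotated = np.rot90(img, k=k, axes=(0, 1))
    h, w = rotated.shape[:2]
    assert h <= size and w <= size, "the rotated image must fit the canvas"
    top, left = (size - h) // 2, (size - w) // 2
    pad = ((top, size - h - top), (left, size - w - left), (0, 0))
    if fill_value is None:
        return np.pad(rotated, pad, mode="edge")
    return np.pad(rotated, pad, mode="constant", constant_values=fill_value)


img = (np.arange(8).reshape(2, 4, 1) + 1).astype(np.uint8)
rep = rotate90_on_canvas(img, k=1, size=4)
pad = rotate90_on_canvas(img, k=1, size=4, fill_value=0)
print(rep[..., 0])
print(pad[..., 0])
```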

If you need colored padding (a separate fill value per channel), you can, somewhat counterintuitively, use fn.crop:

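    rot = fn.rotate(img, angle=90)  # no size: the canvas fits the whole rotated image
    # centered 256x256 window; areas outside the image are padded with fill_values (R, G, B)
    crop = fn.crop(rot, crop_pos_x=0.5, crop_pos_y=0.5, crop=(256, 256), out_of_bounds_policy="pad", fill_values=[0x76, 0xb9, 0x00])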

The result is:
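Why does a crop produce padding? fn.crop positions its window at anchor = crop_pos * (input_extent - crop_extent). When the window is larger than the image, that anchor becomes negative, and with out_of_bounds_policy="pad" the uncovered area takes the fill color.

The NumPy sketch below mirrors that arithmetic. It assumes an H x W x C image, and the rounding of the anchor is an assumption that may not match DALI bit for bit. The helper name is hypothetical.

```python
import numpy as np


def crop_with_pad(img, crop_h, crop_w, pos_y=0.5, pos_x=0.5, fill=(0, 0, 0)):
    """Cut a crop_h x crop_w window out of an H x W x C image. The anchor is
    pos * (in_extent - crop_extent), so it is negative when the window is
    larger than the image; uncovered pixels get the per-channel `fill`."""
    h, w, c = img.shape
    y0 = int(round(pos_y * (h - crop_h)))
    x0 = int(round(pos_x * (w - crop_w)))
    out = np.empty((crop_h, crop_w, c), dtype=img.dtype)
    out[...] = np.asarray(fill, dtype=img.dtype)  # start from the fill color
    # Intersection of the window with the image, in image coordinates.
    sy0, sy1 = max(y0, 0), min(y0 + crop_h, h)
    sx0, sx1 = max(x0, 0), min(x0 + crop_w, w)
    if sy0 < sy1 and sx0 < sx1:
        out[sy0 - y0:sy1 - y0, sx0 - x0:sx1 - x0] = img[sy0:sy1, sx0:sx1]
    return out


img = np.full((2, 4, 3), 255, dtype=np.uint8)
out = crop_with_pad(img, 4, 4, fill=(0x76, 0xB9, 0x00))
print(out[..., 1])
```

Here the 2-row image lands in the middle two rows of the 4 x 4 window. The top and bottom rows carry the green fill color.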

WYBupup commented 7 months ago

Thanks a lot! This is really helpful!

mzient commented 7 months ago

Also, if you're fine with bilinear resizing without antialiasing, then you can do all those transforms in one go with fn.warp_affine:

import nvidia.dali as dali
import nvidia.dali.fn as fn
import PIL.Image
import numpy as np

@dali.pipeline_def(batch_size=1, num_threads=4, device_id=0)
def mypipe():
    enc, _ = fn.readers.file(file_root=".", files=["alley.png"])
    img = fn.decoders.image(enc, device="mixed")
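    shape = fn.peek_image_shape(enc)  # (H, W, C), read from the encoded header
    h = shape[0]
    w = shape[1]
    size = fn.stack(w, h)  # (x, y) order, as used by the transforms below
    scale = dali.math.min(256/w, 256/h)  # fit within 256x256, keep aspect ratio
    out_size = fn.cast(scale * size, dtype=dali.types.INT32)  # resized (w, h)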

    # use negative angle, since here we use source-to-destination matrix
    mtx = fn.transforms.rotation(angle=-90, center=size/2)
    mtx = fn.transforms.scale(mtx, scale=fn.stack(scale, scale))
    # shift so that the resized, rotated image is centered on the 256x256 canvas
    mtx = fn.transforms.translation(mtx, offset=(256.0 - out_size) // 2)

    warped = fn.warp_affine(img, size=(256, 256), matrix=mtx, fill_value=0, inverse_map=False)
    return warped

pipe = mypipe()
pipe.build()
warped, = pipe.run()

The result is:

The aliasing artifacts are quite obvious when you compare this image to the previous ones, but if that is acceptable for you, this method will certainly be the most performant one. An added benefit is that you end up with a complete transformation matrix, so if your labels are actually points, you can use the same matrix to transform them. See this tutorial to learn how to use a transformation matrix to transform keypoints alongside images.
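To check the matrix arithmetic and see how points would follow the image, here is a NumPy sketch that composes rotation about the center, scaling, and centering translation as 3x3 homogeneous matrices. It then applies the resulting 2x3 matrix to (x, y) points.

It rests on two assumptions:
- each chained transform is applied after the previous one (new @ old), as the pipeline's chaining suggests;
- rotation uses the standard [[cos, -sin], [sin, cos]] form. Whether that sign convention matches fn.transforms.rotation is worth verifying, but the center mapping does not depend on it.

All helper names are hypothetical.

```python
import numpy as np


def rotation(angle_deg, center):
    """3x3 homogeneous rotation about `center` (x, y)."""
    a = np.deg2rad(angle_deg)
    c, s = np.cos(a), np.sin(a)
    cx, cy = center
    to_origin = np.array([[1, 0, -cx], [0, 1, -cy], [0, 0, 1]], dtype=float)
    rot = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]], dtype=float)
    back = np.array([[1, 0, cx], [0, 1, cy], [0, 0, 1]], dtype=float)
    return back @ rot @ to_origin


def scaling(sx, sy):
    return np.diag([sx, sy, 1.0])


def translation(tx, ty):
    return np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]], dtype=float)


def letterbox_rotation_matrix(w, h, target=256, angle_deg=-90):
    """Rotate about the image center, scale to fit, then center on a
    target x target canvas; returns a 2x3 source-to-destination matrix."""
    scale = min(target / w, target / h)
    out_w, out_h = int(scale * w), int(scale * h)
    m = rotation(angle_deg, (w / 2, h / 2))
    m = scaling(scale, scale) @ m  # applied after the rotation
    m = translation((target - out_w) // 2, (target - out_h) // 2) @ m
    return m[:2]


def transform_points(mtx, points):
    """Apply a 2x3 affine matrix to an N x 2 array of (x, y) points."""
    points = np.asarray(points, dtype=float)
    return points @ mtx[:, :2].T + mtx[:, 2]


mtx = letterbox_rotation_matrix(640, 480)
print(transform_points(mtx, [[320, 240], [0, 0]]))
```

For a 640 x 480 image, the image center lands on the canvas center (128, 128), and all four corners stay within the 256 x 256 canvas.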

The methods, from most to least efficient:

warp_affine
resize + rotate with border handling
resize + rotate + crop

NVIDIA / DALI, issue #5215