NVIDIA / DALI

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html
Apache License 2.0

Is it possible to modify the labels according to the random rotation of images? #5215

Open WYBupup opened 7 months ago

WYBupup commented 7 months ago

Describe the question.

I am working on a training framework for an image rotation (0, 90, 180, 270 degrees) recognition task. My dataset is so large that I cannot store every image rotated to all four angles, because there is not enough space on the machine. My approach is therefore to randomly rotate each image to one of the four angles in the preprocessing step and change the label accordingly. However, this makes preprocessing the dominant part of the total time cost. I want to use DALI to accelerate preprocessing, and I wonder whether I can randomly rotate the image and change the label accordingly in the pipeline?


JanuszL commented 7 months ago

Hi @WYBupup,

Thank you for reaching out. Yes, you can do that using the rotate operator, feeding it with the output of the random uniform operator, which picks a value from the set values=[0, 90, 180, 270]. To adjust the labels, you can use the output of the same random operator: write a Python operator if the adjustment is elaborate, or just express it with mathematical operators.
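As a rough illustration of the "mathematical operators" route, here is a plain-Python sketch of the label arithmetic only, not DALI code. It assumes the labels are class indices 0..3 standing for 0, 90, 180 and 270 degrees; the names `ANGLES`, `rotate_label` and `sample_rotation` are made up for this example. In a pipeline, the same arithmetic would be applied to the output of the random operator.

```python
import random

# Assumption: class k means the image is rotated by k * 90 degrees.
ANGLES = (0, 90, 180, 270)


def rotate_label(label, angle):
    """Return the rotation class after rotating by `angle` more degrees."""
    if angle not in ANGLES:
        raise ValueError(f"unsupported angle: {angle}")
    # Each 90-degree step advances the class by one, wrapping around at 4.
    return (label + angle // 90) % 4


def sample_rotation(label, rng=random):
    """Pick one of the four angles at random and return (angle, new_label)."""
    angle = rng.choice(ANGLES)
    return angle, rotate_label(label, angle)
```

If all source images are upright (label 0), this reduces to `angle // 90`.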

WYBupup commented 7 months ago

Thanks for your reply. I have almost completed the pipeline following your guidance, but I have encountered another problem. In my old Python-based preprocessing pipeline, I resize the image to a fixed size while maintaining the aspect ratio, and then use cv2.copyMakeBorder to place the picture in the center and pad around it. I tried to emulate this operation using DALI, but it seems that the padding operator only supports padding in a single direction. My objective is to emulate cv2.copyMakeBorder and pad all around the original image. Is there an operator that achieves this?
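For reference, the geometry of the operation described above can be sketched in plain Python. This is only an illustration of the resize-then-center-pad arithmetic; the function name `letterbox_geometry`, the 256 default, and the rounding policy (round the resized size, put the odd pixel on the bottom/right) are assumptions, not something taken from the original pipeline.

```python
def letterbox_geometry(width, height, target=256):
    """Compute the resized size and the four border widths needed to
    center an image on a target x target canvas, keeping aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Scale so that the longer side fits the target exactly.
    scale = min(target / width, target / height)
    new_w = max(1, min(target, round(width * scale)))
    new_h = max(1, min(target, round(height * scale)))
    # Split the leftover space evenly; the extra pixel, if any, goes
    # to the bottom/right border (an arbitrary choice).
    pad_x = target - new_w
    pad_y = target - new_h
    left = pad_x // 2
    top = pad_y // 2
    return {
        "size": (new_w, new_h),
        "top": top,
        "bottom": pad_y - top,
        "left": left,
        "right": pad_x - left,
    }
```

The `top`, `bottom`, `left` and `right` values correspond to the border arguments that cv2.copyMakeBorder takes.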

mzient commented 7 months ago

If you're already rotating the images, you can pass the size explicitly to fn.rotate - you can make it fill the borders with a constant value (monochrome!) or replicate the border. If either of those methods suits you, it will be cheaper to have one operator instead of two.

import nvidia.dali as dali
import nvidia.dali.fn as fn
import PIL.Image
import numpy as np

@dali.pipeline_def(batch_size=1, num_threads=4, device_id=0)
def mypipe():
    enc, _ = fn.readers.file(file_root=".", files=["alley.png"])
    img = fn.decoders.image(enc, device="mixed")
    img = fn.resize(img, mode="not_larger", size=256)
    rep = fn.rotate(img, angle=90, size=(256, 256))
    pad = fn.rotate(img, angle=90, size=(256, 256), fill_value=0)
    return rep, pad

pipe = mypipe()
pipe.build()
rep, pad = pipe.run()

The results:

PIL.Image.fromarray(np.array(rep.as_cpu()[0]))

[image: output with the border replicated]

PIL.Image.fromarray(np.array(pad.as_cpu()[0]))

[image: output padded with a constant value]

If you need a color padding, you can, somewhat counterintuitively, use fn.crop:

    rot = fn.rotate(img, angle=90)
    crop = fn.crop(rot, crop_pos_x=0.5, crop_pos_y=0.5, crop=(256, 256), out_of_bounds_policy="pad", fill_values=[0x76, 0xb9, 0x00])

The result is: [image: output with color padding]

WYBupup commented 7 months ago

thanks a lot! This is really helpful!

mzient commented 7 months ago

Also, if you're fine with bilinear resizing without antialiasing, then you can do all those transforms in one go with fn.warp_affine:

import nvidia.dali as dali
import nvidia.dali.fn as fn
import PIL.Image
import numpy as np

@dali.pipeline_def(batch_size=1, num_threads=4, device_id=0)
def mypipe():
    enc, _ = fn.readers.file(file_root=".", files=["alley.png"])
    img = fn.decoders.image(enc, device="mixed")
    shape = fn.peek_image_shape(enc)
    h = shape[0]
    w = shape[1]
    size = fn.stack(w, h)
    scale = dali.math.min(256/w, 256/h)
    out_size = fn.cast(scale * size, dtype=dali.types.INT32)

    # use negative angle, since here we use source-to-destination matrix
    mtx = fn.transforms.rotation(angle=-90, center=size/2)
    mtx = fn.transforms.scale(mtx, scale=fn.stack(scale, scale))
    mtx = fn.transforms.translation(mtx, offset=(256.0 - out_size) // 2)

    warped = fn.warp_affine(img, size=(256, 256), matrix=mtx, fill_value=0, inverse_map=False)
    return warped

pipe = mypipe()
pipe.build()
warped, = pipe.run()

The result is: [image: output of warp_affine]

The aliasing artifacts are quite obvious when you compare this image to the previous ones, but if it's OK for you, then this method will certainly be the most performant one. The added benefit is that you end up with a complete transformation matrix, so if your labels are in fact some points, you can use this matrix to transform them. See this tutorial to learn how to use a transformation matrix to transform keypoints alongside images.
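To make the point about transforming labels concrete, here is a minimal plain-Python sketch of applying a 2x3 affine matrix to a list of (x, y) points. It is not DALI code: the helpers `rotation_about` and `apply_affine` are hypothetical, and the sign convention of the angle (counter-clockwise in mathematical axes) is an assumption that may not match the matrix produced by fn.transforms.rotation, so the direction should be checked against the actual matrix.

```python
import math


def rotation_about(angle_deg, center):
    """Build a 2x3 affine matrix that rotates points by angle_deg
    around center (x, y): p' = R (p - c) + c."""
    cos_a = math.cos(math.radians(angle_deg))
    sin_a = math.sin(math.radians(angle_deg))
    cx, cy = center
    return [
        [cos_a, -sin_a, cx - cos_a * cx + sin_a * cy],
        [sin_a, cos_a, cy - sin_a * cx - cos_a * cy],
    ]


def apply_affine(matrix, points):
    """Apply a 2x3 affine matrix to an iterable of (x, y) points."""
    (a, b, tx), (c, d, ty) = matrix
    return [(a * x + b * y + tx, c * x + d * y + ty) for x, y in points]
```

The same matrix that warps the image can then be reused for the keypoints, which keeps image and labels consistent by construction.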

The methods, sorted from most to least efficient:

  1. warp_affine
  2. resize + rotate with border handling
  3. resize + rotate + crop