Open WYBupup opened 7 months ago
Hi @WYBupup,
Thank you for reaching out.
Yes, you can do that using the rotate operator, feeding it the output of the random uniform operator configured to pick from the set values=[0, 90, 180, 270]. To adjust the labels, you can use the output of the same random operator: write a Python operator if the adjustment is elaborate, or just express it with mathematical operators.
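A minimal sketch of that suggestion, assuming the label is simply the index of the chosen angle (0, 1, 2, 3). The names `angle_to_label` and `make_rotation_pipeline` are hypothetical, and the reader settings are placeholders; the DALI import is kept inside the factory so the plain-Python helper can be used on its own.

```python
def angle_to_label(angle):
    """Map a rotation angle in degrees (0, 90, 180, 270) to a class index 0..3."""
    return int(round(angle / 90.0)) % 4


def make_rotation_pipeline(file_root, batch_size=8):
    """Build a DALI pipeline returning (rotated image, rotation label)."""
    # Imported lazily so angle_to_label works without DALI installed.
    import nvidia.dali as dali
    import nvidia.dali.fn as fn
    import nvidia.dali.types as types

    @dali.pipeline_def(batch_size=batch_size, num_threads=4, device_id=0)
    def pipe():
        enc, _ = fn.readers.file(file_root=file_root, random_shuffle=True)
        img = fn.decoders.image(enc, device="mixed")
        # One angle per sample, drawn from the discrete set of values.
        angle = fn.random.uniform(values=[0.0, 90.0, 180.0, 270.0])
        rotated = fn.rotate(img, angle=angle)
        # Same mapping as angle_to_label, expressed with DALI arithmetic.
        label = fn.cast(angle / 90.0, dtype=types.INT32)
        return rotated, label

    return pipe()
```

Because the label is derived from the same `angle` node that drives `fn.rotate`, each sample's image and label stay consistent.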
Thanks for your reply. I have almost completed the pipeline following your guidance, but I have run into another problem. In my old Python-based preprocessing pipeline, I resize the image to a fixed size while maintaining the aspect ratio, and then use cv2.copyMakeBorder to place the picture in the center and pad around it. I tried to emulate this operation in DALI, but it seems that the padding operator only supports padding in a single direction. My objective is to emulate cv2.copyMakeBorder and pad on all sides of the original image. Is there an operator that can achieve this?
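For reference, the arithmetic behind the "resize keeping the aspect ratio, then center with cv2.copyMakeBorder" step can be sketched in plain Python. This is only an illustration: `letterbox_geometry` is a hypothetical helper, the 256-pixel target is an assumption, and the rounding used here may differ by a pixel from what a particular library does.

```python
def letterbox_geometry(width, height, target=256):
    """Return the resized (w, h) and the (top, bottom, left, right) padding
    needed to center the resized image on a target x target canvas."""
    # Scale so that the longer side fits the target exactly.
    scale = min(target / width, target / height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    # Split the leftover space between the two sides; any odd pixel
    # goes to the bottom/right.
    pad_x = target - new_w
    pad_y = target - new_h
    left = pad_x // 2
    top = pad_y // 2
    return (new_w, new_h), (top, pad_y - top, left, pad_x - left)


size, (top, bottom, left, right) = letterbox_geometry(512, 256)
print(size, top, bottom, left, right)
```

The four padding values are in the order that cv2.copyMakeBorder takes them (top, bottom, left, right).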
If you're already rotating the images, you can pass the size explicitly to fn.rotate. You can make it fill the borders with a constant value (monochrome!) or replicate the border. If either of those methods suits you, it will be cheaper to have one operator instead of two.
import nvidia.dali as dali
import nvidia.dali.fn as fn
import PIL.Image
import numpy as np

@dali.pipeline_def(batch_size=1, num_threads=4, device_id=0)
def mypipe():
    enc, _ = fn.readers.file(file_root=".", files=["alley.png"])
    img = fn.decoders.image(enc, device="mixed")
    img = fn.resize(img, mode="not_larger", size=256)
    # no fill_value: the border pixels are replicated
    rep = fn.rotate(img, angle=90, size=(256, 256))
    # fill_value=0: the border is filled with a constant value
    pad = fn.rotate(img, angle=90, size=(256, 256), fill_value=0)
    return rep, pad

pipe = mypipe()
pipe.build()
rep, pad = pipe.run()
The results:
PIL.Image.fromarray(np.array(rep.as_cpu()[0]))
PIL.Image.fromarray(np.array(pad.as_cpu()[0]))
If you need a color padding, you can, somewhat counterintuitively, use fn.crop:
rot = fn.rotate(img, angle=90)
crop = fn.crop(rot, crop_pos_x=0.5, crop_pos_y=0.5, crop=(256, 256), out_of_bounds_policy="pad", fill_values=[0x76, 0xb9, 0x00])
The result is:
Thanks a lot! This is really helpful!
Also, if you're fine with bilinear resizing without antialiasing, then you can do all those transforms in one go with fn.warp_affine:
import nvidia.dali as dali
import nvidia.dali.fn as fn
import PIL.Image
import numpy as np

@dali.pipeline_def(batch_size=1, num_threads=4, device_id=0)
def mypipe():
    enc, _ = fn.readers.file(file_root=".", files=["alley.png"])
    img = fn.decoders.image(enc, device="mixed")
    shape = fn.peek_image_shape(enc)
    h = shape[0]
    w = shape[1]
    size = fn.stack(w, h)
    scale = dali.math.min(256/w, 256/h)
    out_size = fn.cast(scale * size, dtype=dali.types.INT32)
    # use negative angle, since here we use source-to-destination matrix
    mtx = fn.transforms.rotation(angle=-90, center=size/2)
    mtx = fn.transforms.scale(mtx, scale=fn.stack(scale, scale))
    mtx = fn.transforms.translation(mtx, offset=(256.0 - out_size) // 2)
    warped = fn.warp_affine(img, size=(256, 256), matrix=mtx, fill_value=0, inverse_map=False)
    return warped

pipe = mypipe()
pipe.build()
warped, = pipe.run()
The result is:
The aliasing artifacts are quite obvious when you compare this image to the previous ones, but if it's OK for you, then this method will certainly be the most performant one. The added benefit is that you end up with a complete transformation matrix, so if your labels are in fact some points, you can use this matrix to transform them. See this tutorial to learn how to use a transformation matrix to transform keypoints alongside images.
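To illustrate what "using the matrix to transform points" means, here is a small plain-Python sketch that applies a 2x3 affine matrix to a list of (x, y) keypoints. The helpers `rotation_about` and `apply_affine` are hypothetical, and the rotation uses the usual mathematical sign convention, which is an assumption and may not match the direction DALI uses for a given angle.

```python
import math


def rotation_about(center, angle_deg):
    """2x3 affine matrix that rotates points by angle_deg around center."""
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    cx, cy = center
    # p' = R (p - center) + center, written as [R | t]
    return [[c, -s, cx - c * cx + s * cy],
            [s,  c, cy - s * cx - c * cy]]


def apply_affine(mtx, points):
    """Apply a 2x3 affine matrix to an iterable of (x, y) points."""
    return [(mtx[0][0] * x + mtx[0][1] * y + mtx[0][2],
             mtx[1][0] * x + mtx[1][1] * y + mtx[1][2])
            for x, y in points]


mtx = rotation_about((128.0, 128.0), 90)
keypoints = [(128.0, 128.0), (138.0, 128.0)]
print(apply_affine(mtx, keypoints))
```

Inside a DALI pipeline the same step would normally be done on the matrix produced by the fn.transforms operators, for example with fn.coord_transform, rather than in Python.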
The methods sorted in efficiency order:
Describe the question.
I am working on a training framework for an image rotation (0, 90, 180, 270 degrees) recognition task. My dataset is so large that it is not feasible to store every image rotated to each of the four angles, because there is not enough disk space on the machine. My approach is therefore to randomly rotate each image to one of the four angles in the preprocessing step and change the label accordingly. However, this makes preprocessing the main part of the total time cost. I want to use DALI to accelerate preprocessing, and I wonder whether I can randomly rotate the image and change the label accordingly in the pipeline?