NVIDIA / DALI

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html
Apache License 2.0

Is there any way to resize bounding boxes inside a pipeline? #1904

Closed cai-linjin closed 4 years ago

cai-linjin commented 4 years ago

Thank you for your great work! I have been exploring DALI in my personal ML projects recently, and DALI is really handy and amazing!

DALI provides a Resize operator, which can resize images. But what about the bounding boxes? Do I have to implement a custom operator to resize them? I am considering using the ops.PythonFunction operator, but then I have to set the pipeline's exec_async and exec_pipelined parameters to False, which reduces efficiency.

Are there any better ways?

Thank you for helping me!

cai-linjin commented 4 years ago

I have an idea for calculating the resized bounding boxes. Assuming bboxes are in (x, y, w, h) format, the horizontal resize scale is s1, and the vertical resize scale is s2, the resized box should be (x*s1, y*s2, w*s1, h*s2).
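The arithmetic above can be checked outside DALI with a small NumPy sketch. The helper name `resize_boxes_xywh` and the example image sizes are illustrative assumptions, not part of DALI; the sketch only shows that x and w scale with the horizontal factor while y and h scale with the vertical one.

```python
import numpy as np


def resize_boxes_xywh(boxes, src_hw, dst_hw):
    """Scale (x, y, w, h) boxes from a source (H, W) image to a target (H, W) image.

    Hypothetical helper for illustration only; this is not a DALI operator.
    """
    boxes = np.asarray(boxes, dtype=np.float32)
    s1 = dst_hw[1] / src_hw[1]  # horizontal scale (width ratio)
    s2 = dst_hw[0] / src_hw[0]  # vertical scale (height ratio)
    # x and w are multiplied by s1; y and h are multiplied by s2.
    return boxes * np.array([s1, s2, s1, s2], dtype=np.float32)


# Assumed example: a 480x640 image resized to 600x800 (both scales are 1.25).
boxes = np.array([[64, 48, 128, 96]], dtype=np.float32)
resized = resize_boxes_xywh(boxes, src_hw=(480, 640), dst_hw=(600, 800))
print(resized)
```

The function takes (H, W) pairs because that is the order in which image shapes are reported in the thread; swapping the two indices silently exchanges s1 and s2, which only shows up when the aspect ratio changes.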

Now I can calculate s1 and s2; both are in a 1x3 tensor whose last element is always 1. However, I don't know how to extract the first 2 elements of the tensor and continue calculating. The code is as follows:

import nvidia.dali.ops as ops
import nvidia.dali.types as types
from nvidia.dali.pipeline import Pipeline


class FaceDatasetPipeline(Pipeline):

    def __init__(self, batch_size, num_threads, device_id, dataset_iter):
        super(FaceDatasetPipeline, self).__init__(batch_size,
                                                  num_threads,
                                                  device_id,
                                                  seed=12,
                                                  exec_async=False,
                                                  exec_pipelined=False)

        self.dataset = dataset_iter
        self.iterator = iter(dataset_iter)
        self.iterator.batch_size = batch_size

        self.input = ops.ExternalSource()
        self.input_bbox = ops.ExternalSource()
        self.input_label = ops.ExternalSource()
        self.decode = ops.ImageDecoder(device='mixed', output_type=types.RGB)
        self.cmnp = ops.CropMirrorNormalize(mean=[0.485, 0.456, 0.406],
                                            std=[0.229, 0.224, 0.225],
                                            device='gpu',
                                            output_layout='CHW',
                                            output_dtype=types.FLOAT)
        self.res = ops.Resize(device='gpu',
                              max_size=1024,
                              resize_shorter=600)
        self.coin1 = ops.CoinFlip(probability=0.5)
        self.coin2 = ops.CoinFlip(probability=0.5)

        self.flip = ops.Flip(device="gpu", horizontal=0)
        self.bbflip = ops.BbFlip(device="cpu", ltrb=False)
        self.shape = ops.Shapes(device="gpu")

    def define_graph(self):
        self.jpegs = self.input()
        self.bboxes = self.input_bbox()
        self.labels = self.input_label()

        images = self.decode(self.jpegs)
        shape_raw = self.shape(images)   # [H, W, 3], a 1x3 tensor
        images = self.res(images)
        shape_resized = self.shape(images)

        rng1 = self.coin1()
        rng2 = self.coin2()
        images = self.cmnp(images, mirror=rng1)
        images = self.flip(images, vertical=rng2)
        bboxes = self.bbflip(self.bboxes, horizontal=rng1, vertical=rng2)

        scale = shape_resized / shape_raw   # [H'/H, W'/W, 1]
        # TODO
        # pseudo DALI code (not valid DALI; it only shows the intended result)
        bboxes[:, [0, 2]] *= scale[1]   # x and w scale with the width ratio
        bboxes[:, [1, 3]] *= scale[0]   # y and h scale with the height ratio

        return (images, bboxes.gpu(), self.labels.gpu())

    def iter_setup(self):
        try:
            (images, bboxes, labels) = next(self.iterator)
            self.feed_input(self.jpegs, images, layout='HWC')
            self.feed_input(self.bboxes, bboxes)
            self.feed_input(self.labels, labels)
        except StopIteration:
            # Restart the dataset iterator so the next epoch can begin.
            self.iterator = iter(self.dataset)
            raise StopIteration

JanuszL commented 4 years ago

Hi. Currently, it is not possible to extract part of a tensor. What you can do is write a custom operator in native code that resizes your bboxes. A basic guide is available here. Another remark about your code: calling a shape operator on GPU data produces its result on the GPU as well, so shape_resized is a GPU tensor.

awolant commented 4 years ago

In our use cases, we didn't need to resize the boxes, because COCOReader can return them in coordinates relative to the image rather than absolute. Look for the ratio parameter in the COCOReader docs. Is there any reason why you cannot do the same in your solution?

cai-linjin commented 4 years ago

@awolant @JanuszL Hi! Thank you for your help. I am currently using the WIDER Face dataset instead of COCO. I reimplemented my dataset class and changed the coordinates from integer pixel values to relative floats (i.e. 0.0-1.0). It works well, and resizing bboxes is no longer necessary!
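A minimal sketch of that normalization, assuming boxes are stored as (x, y, w, h) in pixels. `to_relative` is a hypothetical helper, not part of DALI or of any WIDER Face tooling. Because every coordinate is divided by the matching image dimension, the same relative box describes the object before and after a resize.

```python
import numpy as np


def to_relative(boxes, image_hw):
    """Convert pixel (x, y, w, h) boxes to coordinates relative to the image size.

    Hypothetical helper for illustration only.
    """
    h, w = image_hw
    boxes = np.asarray(boxes, dtype=np.float32)
    # x and w are divided by the image width; y and h by the image height.
    return boxes / np.array([w, h, w, h], dtype=np.float32)


# Assumed example: one box in a 480x640 image.
rel = to_relative([[64, 48, 128, 96]], image_hw=(480, 640))
print(rel)

# Pixel coordinates in a resized 600x800 copy follow from the same relative box.
abs_resized = rel * np.array([800, 600, 800, 600], dtype=np.float32)
print(abs_resized)
```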

mzient commented 4 years ago

@cai-linjin Look at issue #1163 - I've posted a piece of code there which extracts single coordinates from a tensor. It's ugly, but it should work for you. There's also a trick with combining tensors/matrices which might also be useful for you. Assuming you already have 1-element tensors with your s1 and s2:

scale = (types.Constant(np.array([1, 0, 1, 0], dtype=np.float32)) * s1
         + types.Constant(np.array([0, 1, 0, 1], dtype=np.float32)) * s2)

cai-linjin commented 4 years ago

@mzient Thank you for your advice, mzient! I haven't figured out a way to calculate s1 and s2 as 1-element tensors. The scale in my code snippet is a 3-element tensor. I tried masking scale to get a one-element tensor. The masked tensor can take part in calculations; however, when I tried to return the result, an error (Assert on "*out_shape == *shapes[i]" failed) occurred. It seems that DALI performs all the calculations even when masks are applied and only filters out the invalid results according to the masks.

mzient commented 4 years ago

@cai-linjin I hit a wall, as DALI can only broadcast scalars and cannot (yet) concatenate tensors. I was able to roll an absolutely hideous hack that does your job. Here it is, with the caveat that it requires some bug fixes that were merged into master today (Apr 28th) and are not in a nightly build yet. If you build DALI from source, you can use it; otherwise you have to wait for a nightly build.

import nvidia.dali.fn as fn
import nvidia.dali as dali
import nvidia.dali.types as types
import numpy as np

def resize_boxes(boxes, source_shape, target_shape):
    source_shape = fn.cast(source_shape, dtype=types.FLOAT)
    one = types.Constant([1.0])
    zero = types.Constant([0.0])
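    def size_slice(in_tensor, anchor, size):
        # Take `size` elements starting at `anchor` (absolute indices) along axis 0.
        return fn.slice(in_tensor, anchor, size, axes=[0],
                        normalized_anchor=False, normalized_shape=False)
    # Masks selecting the x-like (columns 0, 2) and y-like (columns 1, 3) coordinates.
    xmtx = types.Constant(np.array([[1, 0, 1, 0]], dtype=np.float32))
    ymtx = types.Constant(np.array([[0, 1, 0, 1]], dtype=np.float32))
    # Shapes are laid out as [H, W, C]: index 1 is the width, index 0 is the height.
    widths = size_slice(source_shape, one, one)
    heights = size_slice(source_shape, zero, one)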
    if isinstance(target_shape, dali.pipeline.DataNode):
        target_shape = fn.cast(target_shape, dtype=types.FLOAT)
        target_widths = size_slice(target_shape, one, one)
        target_heights = size_slice(target_shape, zero, one)
    else:
        target_widths = target_shape[1]
        target_heights = target_shape[0]
    xscale = target_widths / widths
    yscale = target_heights / heights
    mat = xscale * xmtx + yscale * ymtx  # this is a matrix for one box

    mat = fn.reshape(mat, shape=[-1, 4, 1], layout = "HWC")

    # this is an ugly hack, because DALI can't broadcast yet...
    mat = fn.warp_affine(mat, interp_type=types.INTERP_NN, matrix = [1,0,0,0,1,0], size = fn.shapes(boxes))

    mat = fn.reshape(mat, shape=[-1, 4])

    return boxes * mat

def get_boxes():
    return [
        np.array([ # image 1, 2 boxes
            [0, 0, 320, 240],
            [320, 240, 640, 480]
        ], dtype=np.float32),
        np.array([ # image 2, 3 boxes
            [0,0,200,200],
            [320,180,960,540],
            [0, 0, 160, 90]
        ], dtype=np.float32)
    ]

def get_images():
    return [
        np.ndarray(shape=[480,640,3], dtype=np.uint8),
        np.ndarray(shape=[720,1280,1], dtype=np.uint8),
    ]

class ExamplePipeline(dali.pipeline.Pipeline):
    def define_graph(self):
        boxes = fn.external_source(get_boxes)
        images = fn.external_source(get_images)
        out_shapes = types.Constant([300,300])
        in_shapes = fn.shapes(images)
        resized = resize_boxes(boxes, in_shapes, out_shapes)
        return boxes, resized

pipe = ExamplePipeline(batch_size=2, device_id=0, num_threads=2)
pipe.build()
o = pipe.run()
print("image 1")
print("in\n",o[0].at(0))
print("out\n", o[1].at(0))
print("image 2")
print("in\n", o[0].at(1))
print("out\n", o[1].at(1))

As I've said, this is an ugly hack; a proper solution would call for proper operand broadcasting/tiling in Arithmetic Operators. When we have them, it will be considerably simpler.
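For comparison, this is roughly what the pipeline above computes for a single image, written in NumPy, where broadcasting is available. It is only a sketch, assuming the source shape is laid out as [H, W, C] and the target size is given as [H, W]. `resize_boxes_np` is a hypothetical name, and none of this runs inside a DALI pipeline.

```python
import numpy as np


def resize_boxes_np(boxes, source_shape, target_shape):
    """NumPy counterpart of the per-image scaling done by the DALI hack.

    Hypothetical helper for illustration only.
    """
    src_h, src_w = float(source_shape[0]), float(source_shape[1])
    dst_h, dst_w = float(target_shape[0]), float(target_shape[1])
    # Same masks as in the DALI version: columns 0, 2 are x-like, 1, 3 are y-like.
    xmtx = np.array([[1, 0, 1, 0]], dtype=np.float32)
    ymtx = np.array([[0, 1, 0, 1]], dtype=np.float32)
    mat = (dst_w / src_w) * xmtx + (dst_h / src_h) * ymtx  # shape (1, 4)
    # Broadcasting repeats the single row over all N boxes; this is the step
    # that the warp_affine trick emulates in the DALI code.
    return boxes * mat


# First image from the example above: 480x640 source, 300x300 target.
boxes = np.array([[0, 0, 320, 240],
                  [320, 240, 640, 480]], dtype=np.float32)
out = resize_boxes_np(boxes, source_shape=[480, 640, 3], target_shape=[300, 300])
print(out)
```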

cai-linjin commented 4 years ago

@JanuszL Very amazing and inspiring! I'll try it! Thank you very much!