Closed. cai-linjin closed this issue 4 years ago.
I have an idea for calculating resized bounding boxes. Assuming the bboxes are in `(x, y, w, h)` format, the horizontal resize scale is s1, and the vertical resize scale is s2, the resized box should be `(x*s1, y*s2, w*s1, h*s2)`.
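A minimal NumPy sketch of this formula, run on the host rather than inside a DALI pipeline. It assumes the boxes of one image are stored as an N x 4 float array in `(x, y, w, h)` order; `scale_xywh` is a hypothetical helper name:

```python
import numpy as np


def scale_xywh(boxes, s1, s2):
    """Scale (x, y, w, h) boxes: x and w by the horizontal factor s1,
    y and h by the vertical factor s2."""
    boxes = np.asarray(boxes, dtype=np.float32)
    # The 4-element factor row broadcasts over all N boxes (rows)
    return boxes * np.array([s1, s2, s1, s2], dtype=np.float32)


# Example: a 640x480 (W x H) image resized to 300x300
boxes = np.array([[64, 48, 128, 96]], dtype=np.float32)
print(scale_xywh(boxes, 300 / 640, 300 / 480))  # [[30. 30. 60. 60.]]
```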
Now I can calculate s1 and s2: both are in a 1x3 tensor whose last element is always 1. However, I don't know how to extract the first two elements of the tensor and continue the calculation. The code is as follows:
```python
from nvidia.dali.pipeline import Pipeline
import nvidia.dali.ops as ops
import nvidia.dali.types as types


class FaceDatasetPipeline(Pipeline):
    def __init__(self, batch_size, num_threads, device_id, dataset_iter):
        super(FaceDatasetPipeline, self).__init__(batch_size,
                                                  num_threads,
                                                  device_id,
                                                  seed=12,
                                                  exec_async=False,
                                                  exec_pipelined=False)
        self.dataset = dataset_iter
        self.iterator = iter(dataset_iter)
        self.iterator.batch_size = batch_size
        self.input = ops.ExternalSource()
        self.input_bbox = ops.ExternalSource()
        self.input_label = ops.ExternalSource()
        self.decode = ops.ImageDecoder(device='mixed', output_type=types.RGB)
        self.cmnp = ops.CropMirrorNormalize(mean=[0.485, 0.456, 0.406],
                                            std=[0.229, 0.224, 0.225],
                                            device='gpu',
                                            output_layout='CHW',
                                            output_dtype=types.FLOAT)
        self.res = ops.Resize(device='gpu',
                              max_size=1024,
                              resize_shorter=600)
        self.coin1 = ops.CoinFlip(probability=0.5)
        self.coin2 = ops.CoinFlip(probability=0.5)
        self.flip = ops.Flip(device="gpu", horizontal=0)
        self.bbflip = ops.BbFlip(device="cpu", ltrb=False)
        self.shape = ops.Shapes(device="gpu")

    def define_graph(self):
        self.jpegs = self.input()
        self.bboxes = self.input_bbox()
        self.labels = self.input_label()
        images = self.decode(self.jpegs)
        shape_raw = self.shape(images)  # [H, W, 3], a 1x3 tensor
        images = self.res(images)
        shape_resized = self.shape(images)
        rng1 = self.coin1()
        rng2 = self.coin2()
        images = self.cmnp(images, mirror=rng1)
        images = self.flip(images, vertical=rng2)
        bboxes = self.bbflip(self.bboxes, horizontal=rng1, vertical=rng2)
        scale = shape_resized / shape_raw  # [H'/H, W'/W, 1]
        # TODO: pseudo-DALI code (indexing like this is not supported by DALI).
        # x and w should use the width ratio, y and h the height ratio:
        # bboxes[:, [0, 2]] *= scale[1]
        # bboxes[:, [1, 3]] *= scale[0]
        return (images, bboxes.gpu(), self.labels.gpu())

    def iter_setup(self):
        try:
            (images, bboxes, labels) = next(self.iterator)
            self.feed_input(self.jpegs, images, layout='HWC')
            self.feed_input(self.bboxes, bboxes)
            self.feed_input(self.labels, labels)
        except StopIteration:
            self.iterator = iter(self.dataset)
            raise StopIteration
```
Hi,
Currently, it is not possible to extract part of a tensor. What you can do is write a custom operator in native code that resizes your bboxes. A basic guide is available here.
Another remark about your code: calling the shape operator on GPU data produces its results on the GPU as well, so `shape_resized` is a GPU tensor.
In our use cases, we didn't need to resize the boxes, because `COCOReader` can return them in coordinates relative to the image rather than in absolute ones. Look for the `ratio` parameter in the `COCOReader` docs.
Is there any reason why you cannot do the same in your solution?
@awolant @JanuszL Hi! Thank you for your help. I am currently using the WIDER Face dataset instead of COCO. I reimplemented my dataset class and changed the coordinates from integer pixels to relative floats (i.e. 0.0 to 1.0). It works well, and resizing the bboxes is no longer necessary!
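A minimal sketch of that conversion, done once on the host (for example in the dataset iterator, before `feed_input`). It assumes `(x, y, w, h)` boxes in pixels and the original image width and height; `to_relative_xywh` and `to_pixel_xywh` are hypothetical helpers, not DALI APIs. Because relative coordinates are fractions of the image size, a later resize of the image leaves them unchanged:

```python
import numpy as np


def to_relative_xywh(boxes, img_w, img_h):
    """Convert pixel (x, y, w, h) boxes to fractions of the image size."""
    boxes = np.asarray(boxes, dtype=np.float32)
    return boxes / np.array([img_w, img_h, img_w, img_h], dtype=np.float32)


def to_pixel_xywh(rel_boxes, img_w, img_h):
    """Inverse: map relative boxes onto an image of any size."""
    rel_boxes = np.asarray(rel_boxes, dtype=np.float32)
    return rel_boxes * np.array([img_w, img_h, img_w, img_h], dtype=np.float32)


rel = to_relative_xywh([[64, 48, 128, 96]], img_w=640, img_h=480)
print(rel)  # [[0.1 0.1 0.2 0.2]]
# The same relative box placed on a 300x300 resized image
print(to_pixel_xywh(rel, 300, 300))  # [[30. 30. 60. 60.]]
```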
@cai-linjin Look at issue #1163: I've posted a piece of code there which extracts single coordinates from a tensor. It's ugly, but it should work for you. There's also a trick with combining tensors/matrices which might also be useful for you. Assuming you already have 1-element tensors with your s1 and s2:

```python
# s1, s2: 1-element tensors with the horizontal and vertical scales
scale = (types.Constant(np.array([1, 0, 1, 0], dtype=np.float32)) * s1
         + types.Constant(np.array([0, 1, 0, 1], dtype=np.float32)) * s2)
```
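The idea behind this trick, illustrated with NumPy on the host instead of DALI (an illustration only, since NumPy has no broadcasting restrictions): multiplying each scalar by a 0/1 mask and adding the results builds the per-coordinate scale vector `[s1, s2, s1, s2]` without any indexing or concatenation. The values of `s1` and `s2` are made up for the example:

```python
import numpy as np

s1 = np.float32(0.5)  # horizontal scale (example value)
s2 = np.float32(2.0)  # vertical scale (example value)

x_mask = np.array([1, 0, 1, 0], dtype=np.float32)  # selects x and w
y_mask = np.array([0, 1, 0, 1], dtype=np.float32)  # selects y and h

# Masked sum: positions 0 and 2 get s1, positions 1 and 3 get s2
scale = x_mask * s1 + y_mask * s2
print(scale)  # values: s1, s2, s1, s2

box = np.array([100, 40, 60, 20], dtype=np.float32)  # one (x, y, w, h) box
print(box * scale)  # [50. 80. 30. 40.]
```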
@mzient Thank you for your advice, mzient! I haven't figured out a way to calculate s1 and s2 as 1-element tensors: the `scale` in my code snippet is a 3-element tensor. I tried masking `scale` to get a one-element tensor. The masked tensor can take part in calculations; however, when I tried to return the result, an error (`Assert on "*out_shape == *shapes[i]" failed`) occurred. It seems that DALI performs the whole calculation even when masks are applied, and only filters out invalid results according to the masks.
@cai-linjin I hit a wall, as DALI can only broadcast scalars and cannot (yet) concatenate tensors. I was able to roll an absolutely hideous hack that does the job. Here it is, with the caveat that it requires some bug fixes that are in the latest master; they were merged today (Apr 28th) and are not in a nightly build yet. If you build DALI from source, you can use it; otherwise you have to wait for a nightly build.
```python
import nvidia.dali.fn as fn
import nvidia.dali as dali
import nvidia.dali.types as types
import numpy as np


def resize_boxes(boxes, source_shape, target_shape):
    # Shapes come from fn.shapes on HWC images: [H, W, C]
    source_shape = fn.cast(source_shape, dtype=types.FLOAT)
    one = types.Constant([1.0])
    zero = types.Constant([0.0])

    def size_slice(in_tensor, anchor, size):
        # Take in_tensor[anchor:anchor + size] along axis 0 (absolute coordinates)
        return fn.slice(in_tensor, anchor, size, axes=[0],
                        normalized_anchor=False, normalized_shape=False)

    # Masks selecting the x coordinates (x0, x1) and y coordinates (y0, y1) of a box
    xmtx = types.Constant(np.array([[1, 0, 1, 0]], dtype=np.float32))
    ymtx = types.Constant(np.array([[0, 1, 0, 1]], dtype=np.float32))

    widths = size_slice(source_shape, one, one)    # element 1: W
    heights = size_slice(source_shape, zero, one)  # element 0: H
    if isinstance(target_shape, dali.pipeline.DataNode):
        target_shape = fn.cast(target_shape, dtype=types.FLOAT)
        target_widths = size_slice(target_shape, one, one)
        target_heights = size_slice(target_shape, zero, one)
    else:
        target_widths = target_shape[1]
        target_heights = target_shape[0]

    xscale = target_widths / widths
    yscale = target_heights / heights
    mat = xscale * xmtx + yscale * ymtx  # this is a matrix for one box
    mat = fn.reshape(mat, shape=[-1, 4, 1], layout="HWC")
    # this is an ugly hack, because DALI can't broadcast yet...
    # (an identity warp with output size = shape of boxes stretches the
    #  single row of mat to one row per box)
    mat = fn.warp_affine(mat, interp_type=types.INTERP_NN,
                         matrix=[1, 0, 0, 0, 1, 0], size=fn.shapes(boxes))
    mat = fn.reshape(mat, shape=[-1, 4])
    return boxes * mat


def get_boxes():
    return [
        np.array([  # image 1, 2 boxes
            [0, 0, 320, 240],
            [320, 240, 640, 480]
        ], dtype=np.float32),
        np.array([  # image 2, 3 boxes
            [0, 0, 200, 200],
            [320, 180, 960, 540],
            [0, 0, 160, 90]
        ], dtype=np.float32)
    ]


def get_images():
    # Only the shapes matter here; the pixel contents are never read
    return [
        np.ndarray(shape=[480, 640, 3], dtype=np.uint8),
        np.ndarray(shape=[720, 1280, 1], dtype=np.uint8),
    ]


class ExamplePipeline(dali.pipeline.Pipeline):
    def define_graph(self):
        boxes = fn.external_source(get_boxes)
        images = fn.external_source(get_images)
        out_shapes = types.Constant([300, 300])
        in_shapes = fn.shapes(images)
        resized = resize_boxes(boxes, in_shapes, out_shapes)
        return boxes, resized


pipe = ExamplePipeline(batch_size=2, device_id=0, num_threads=2)
pipe.build()
o = pipe.run()
print("image 1")
print("in\n", o[0].at(0))
print("out\n", o[1].at(0))
print("image 2")
print("in\n", o[0].at(1))
print("out\n", o[1].at(1))
```
As I've said, this is an ugly hack. A proper solution would call for operand broadcasting/tiling in arithmetic operators; when we have them, it will be considerably simpler.
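For comparison, this is what the hack computes, written with NumPy where broadcasting is available. It is a host-side reference, not DALI code, that could be used to check the pipeline's `out` values. As in the example above, it assumes shapes in `(H, W, ...)` order and boxes in `(x0, y0, x1, y1)` pixel coordinates; `resize_boxes_ref` is a hypothetical name:

```python
import numpy as np


def resize_boxes_ref(boxes, source_shape, target_shape):
    """Scale (x0, y0, x1, y1) boxes from source_shape to target_shape.

    Shapes are (H, W, ...) sequences, matching fn.shapes on HWC images.
    """
    boxes = np.asarray(boxes, dtype=np.float32)
    xscale = target_shape[1] / source_shape[1]  # width ratio
    yscale = target_shape[0] / source_shape[0]  # height ratio
    # A 4-element row broadcasts over all N boxes; this is the
    # operation the warp_affine trick emulates in DALI.
    return boxes * np.array([xscale, yscale, xscale, yscale], dtype=np.float32)


boxes_1 = np.array([[0, 0, 320, 240], [320, 240, 640, 480]], dtype=np.float32)
print(resize_boxes_ref(boxes_1, (480, 640, 3), (300, 300)))
# [[  0.   0. 150. 150.]
#  [150. 150. 300. 300.]]
```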
@JanuszL Amazing and inspiring! I'll try it. Thank you very much!
Thank you for your great work! I have recently been exploring DALI in my personal ML projects, and DALI is really handy and amazing!
DALI provides a `Resize` operator, which can resize images. But what about the bounding boxes? Do I have to implement a custom operator to resize bounding boxes? I am considering using `ops.PythonFunction`, but then I have to set the pipeline's `exec_async` and `exec_pipelined` parameters to `False`, which decreases efficiency. Are there any better ways?
Thank you for helping me!