NVIDIA / DALI

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html
Apache License 2.0

How to use Resize after CropMirrorNormalize? #1163

Closed ptirupat closed 4 years ago

ptirupat commented 5 years ago

I am trying to write a simple pipeline as shown below. I want to resize my crop and normalize it as well.

```python
from nvidia.dali.pipeline import Pipeline
import nvidia.dali.ops as ops
import nvidia.dali.types as types

class SimplePipeline(Pipeline):

    def __init__(self, frames_dir, frames_file, batch_size, crop_x, crop_y, crop_size, crop_resize, num_threads, device_id):
        super(SimplePipeline, self).__init__(batch_size, num_threads, device_id, seed=12)
        self.input = ops.FileReader(file_root=frames_dir, file_list=frames_file)
        self.decode = ops.ImageDecoder(device='mixed', output_type=types.RGB)
        self.crop_normalize = ops.CropMirrorNormalize(device='gpu', crop=crop_size, crop_pos_x=crop_x, crop_pos_y=crop_y, mean=0, std=255)
        self.resize = ops.Resize(device='gpu', resize_x=crop_resize, resize_y=crop_resize)

    def define_graph(self):
        jpegs, labels = self.input()
        images = self.decode(jpegs)
        crops = self.crop_normalize(images)
        resized_crops = self.resize(crops)
        return (resized_crops, labels)
```

With this approach, I get the following error:

```
RuntimeError: Critical error in pipeline: [/opt/dali/dali/pipeline/operators/resize/resize.cu:43] Assert on "IsType(input.type())" failed: Expected input data as uint8.
```

Is there some way I can achieve crop, normalize and rescale operation using DALI?

JanuszL commented 5 years ago

Hi, by default CropMirrorNormalize outputs float, while Resize expects uint8 input. You can either resize first and then normalize, or set output_dtype in CropMirrorNormalize to uint8.
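A quick NumPy sanity check (not DALI; the 2x average-pool below is a stand-in for a real bilinear resize, purely for illustration) of why the resize-first ordering is safe for this particular normalization: mean=0, std=255 is a linear map, and linear interpolation commutes with linear maps, so the two orderings agree up to floating-point rounding:

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(4, 4), dtype=np.uint8)

def downscale_2x(a):
    # Naive 2x downscale by averaging each 2x2 block; a linear operation,
    # standing in for a real bilinear resize (illustration only).
    a = np.asarray(a, dtype=np.float64)
    return (a[0::2, 0::2] + a[1::2, 0::2] + a[0::2, 1::2] + a[1::2, 1::2]) / 4

a = downscale_2x(img) / 255.0   # resize first, then normalize
b = downscale_2x(img / 255.0)   # normalize first, then resize

assert np.allclose(a, b)        # linear ops commute up to rounding
```

Note that in DALI itself, Resize on uint8 rounds its output back to integers, so in practice the two orderings only agree up to that quantization.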

makslevental commented 5 years ago

@JanuszL can you say a little bit about why Resize (and ColorSpaceConversion) requires uint8? And correct me if I'm wrong, but aren't we therefore losing precision when doing those operations?

JanuszL commented 5 years ago

@mzient ?

mzient commented 5 years ago

Currently, resize only works on uint8 images, but the underlying code can accept and produce any type - with the caveat that in its current state it will not rescale the data to fit the new dynamic range, so resizing uint8 to produce uint16 will still yield intensities in 0-255 range. There's an ongoing effort to extend Resize to 3D data, mostly with medical imaging in mind - at the same time, we'll add support for other data types, since data from medical scans is often in higher bit depths.
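The caveat about dynamic range can be illustrated with plain NumPy (not DALI): widening the dtype without rescaling leaves intensities in the original 0-255 range, and filling the full uint16 range would take an explicit multiply (255 * 257 == 65535):

```python
import numpy as np

# Converting uint8 pixel data to a wider type without rescaling
# leaves values in the original 0-255 range.
img_u8 = np.array([[0, 128, 255]], dtype=np.uint8)
img_u16 = img_u8.astype(np.uint16)   # no rescaling applied
assert img_u16.max() == 255          # not 65535

# Rescaling to the new dynamic range requires an explicit multiply.
img_u16_full = img_u8.astype(np.uint16) * 257   # 255 * 257 == 65535
assert img_u16_full.max() == 65535
```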

ryanstout commented 4 years ago

@mzient any ETA on higher bit depth support in Dali? I've got data that needs to be at least 16-bit uint or float32 the whole way through. Thanks!

mzient commented 4 years ago

@ryanstout Proper Resize support for those types is not scheduled right now. However, if you don't need fancy anti-aliasing (i.e. you're fine with bilinear), you can use WarpAffine - it handles int16 and float.

```python
import numpy as np
import nvidia.dali as dali
import nvidia.dali.fn as fn
import nvidia.dali.types as types

def warp_resize(images, target_size):
    shapes = fn.cast(fn.shapes(images), dtype=types.FLOAT)
    one = types.Constant([1.0])
    zero = types.Constant([0.0])

    def size_slice(in_tensor, anchor, size):
        return fn.slice(in_tensor, anchor, size, axes=[0],
                        normalized_anchor=False, normalized_shape=False)

    # Partial affine matrices - scaled and summed below to build the full
    # destination-to-source mapping passed to warp_affine.
    xmtx = types.Constant(np.array([[1, 0, 0], [0, 0, 0]], dtype=np.float32))
    ymtx = types.Constant(np.array([[0, 0, 0], [0, 1, 0]], dtype=np.float32))
    widths = size_slice(shapes, one, one)     # HWC layout: width is dim 1
    heights = size_slice(shapes, zero, one)   # height is dim 0

    if isinstance(target_size, dali.pipeline.DataNode):
        target_widths = size_slice(target_size, one, one)
        target_heights = size_slice(target_size, zero, one)
    else:
        target_widths = target_size[1]
        target_heights = target_size[0]
    xscale = widths / target_widths
    yscale = heights / target_heights
    mat = xscale * xmtx + yscale * ymtx
    warped = fn.warp_affine(images, mat, size=target_size)
    return warped
```

This piece of code can be used to achieve resize-like behavior with the WarpAffine operator. It doesn't work correctly when target_size is a DALI data node - but that's due to a bug and will be fixed later this week.
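As a plain-NumPy illustration of the matrix arithmetic in warp_resize (the sizes below are made-up example values): warp_affine's matrix maps output coordinates back to input coordinates, so the scale factors are source_size / target_size:

```python
import numpy as np

src_h, src_w = 480.0, 640.0   # example source image size (assumption)
tgt_h, tgt_w = 240.0, 320.0   # example target size (assumption)

# Same partial matrices as in warp_resize above.
xmtx = np.array([[1, 0, 0], [0, 0, 0]], dtype=np.float32)
ymtx = np.array([[0, 0, 0], [0, 1, 0]], dtype=np.float32)

xscale = src_w / tgt_w   # 2.0
yscale = src_h / tgt_h   # 2.0
mat = xscale * xmtx + yscale * ymtx
# mat == [[2, 0, 0],
#         [0, 2, 0]]

# Output pixel (x, y) samples input pixel (2x, 2y): a 2x downscale.
out_xy = np.array([100.0, 50.0, 1.0])   # homogeneous output coordinate
in_xy = mat @ out_xy
# in_xy == [200, 100]
```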

Usage (in your Pipeline):

```python
    def define_graph(self):
        jpegs, labels = fn.caffe_reader(path=db_folder, random_shuffle=True, seed=12)
        images = fn.image_decoder(jpegs, device="mixed")
        shape = (480, 640)  # height, width
        resized = warp_resize(images, shape)
        return labels, images, resized
```