NVIDIA / DALI

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html
Apache License 2.0

Specify ROI coordinates for each sample individually #5693

Open huberb opened 2 days ago

huberb commented 2 days ago

Describe the question.

I'm trying to get crops from my image samples based on a mapping of filename to crop coordinates. Currently I'm using fn.readers.numpy:

        data = fn.readers.numpy(
            device="cpu", file_root=dataset_path, file_filter="*.npy",
            roi_start=(0, 0, 0, 0), roi_end=(10, 28, 28, 28),
            prefetch_queue_depth=32
        )

Is there any way to specify the roi_start and roi_end values for each sample individually? I could not find any example or hint in the documentation on how to do this.


jantonguirao commented 2 days ago

Hi @huberb.

Yes, you can assign per-sample ROI. Here's an example that uses ROI coordinates that are also read from numpy files:

a = np.array([1,2,3,4,5,6])
np.save('a.npy', a)
b = np.array([1,2,3,4,5,6,7,8,9])
np.save('b.npy', b)
a_start = np.array([2], dtype=np.int32)
np.save('a_start.npy', a_start)
a_end = np.array([3], dtype=np.int32)
np.save('a_end.npy', a_end)
b_start = np.array([1], dtype=np.int32)
np.save('b_start.npy', b_start)
b_end = np.array([3], dtype=np.int32)
np.save('b_end.npy', b_end)

@pipeline_def(batch_size=2, device_id=0, num_threads=3)
def pipe():
    start = fn.readers.numpy(
        device='cpu',
        file_root='./',
        files=['a_start.npy', 'b_start.npy']
    )
    end = fn.readers.numpy(
        device='cpu',
        file_root='./',
        files=['a_end.npy', 'b_end.npy']
    )
    data1 = fn.readers.numpy(
        device='cpu',
        file_root='./',
        files=['a.npy', 'b.npy'],
        roi_start=start,
        roi_end=end,
    )
    return start, end, data1

p = pipe()
p.build()
out = p.run()

print('start', np.array(out[0][0]), np.array(out[0][1]))
print('end', np.array(out[1][0]), np.array(out[1][1]))
print('data', np.array(out[2][0]), np.array(out[2][1]))

You can use other data nodes from your pipeline as the ROI arguments instead.
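
For reference, here is a plain NumPy sketch of what the reader returns for the two samples above: the per-sample ROI simply selects the half-open range `roi_start:roi_end` from each array.

```python
import numpy as np

# Same sample data as in the example above
a = np.array([1, 2, 3, 4, 5, 6])
b = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

# Per-sample ROI is equivalent to slicing each array individually
print(a[2:3])  # roi_start=2, roi_end=3 -> [3]
print(b[1:3])  # roi_start=1, roi_end=3 -> [2 3]
```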

huberb commented 2 days ago

Thank you for your response! Sorry for being a bit slow here, but I have to admit I'm still a bit confused about the pipeline API. Assuming I have thousands of files like this, how do I make sure that the correct cropping coordinates are loaded together with the corresponding image? Do I just need to make sure that the order of files given to the files parameter matches across all readers?

Would it also be possible to load all my coordinates beforehand as a python dictionary and maybe use the external source api? Let's say I have a dict like this:

coordinates = {
  'file1.npy': [10, 20, 10, 20],
  'file2.npy': [0, 10, 20, 30]
}

This way I could probably avoid opening a lot of extra file handles.

Sadly, my use case is also a bit more complicated than that. I actually have to crop at a different position for every color channel of a single image.

So my coordinates actually look like this:

# different coordinate for every color channel
coords = [
  [10, 20, 10, 20],
  [20, 30, 30, 40],
  [15, 25, 15, 15]
]
image = ... # load image
print(image.shape)  # [3, 100, 100]
crops = []
for channel_idx, channel_coords in enumerate(coords):
  crop = image[
    channel_idx,
    channel_coords[0]:channel_coords[1],
    channel_coords[2]:channel_coords[3]
  ]
  crops.append(crop)

print(np.stack(crops).shape)  # intended result: (3, 10, 10)

mzient commented 1 day ago

Hello @huberb

Do I just need to make sure that the order of files that are given to the files parameter match or all readers?

Yes. If multiple readers have their files specified in the same order, they will be synchronized. Even if you use random shuffling, specifying the same seed will make them shuffle in exactly the same way.
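
The synchronization idea can be illustrated outside DALI: shuffling two parallel lists with the same seed produces the same permutation, so the pairs stay aligned. (The file names below are placeholders.)

```python
import numpy as np

# Two parallel file lists, e.g. images and their ROI files,
# passed to two readers in the same order.
data_files = ['a.npy', 'b.npy', 'c.npy']
roi_files = ['a_roi.npy', 'b_roi.npy', 'c_roi.npy']

# The same seed yields the same permutation, mirroring what happens
# when each reader gets random_shuffle=True with an identical seed.
perm1 = np.random.default_rng(42).permutation(len(data_files))
perm2 = np.random.default_rng(42).permutation(len(roi_files))
assert (perm1 == perm2).all()
print([data_files[i] for i in perm1])
print([roi_files[i] for i in perm2])
```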

Would it also be possible to load all my coordinates beforehand as a python dictionary and maybe use the external source api? Let's say I have a dict like this:

You can use external source, but not with a dictionary - just use a list with the same order as the one in which the files are specified. If you want to have different shuffling on each run, you can either just feed everything through external_source (this may come with some performance penalty, especially if not configured properly) or use python_function.
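
A minimal sketch of the dict-to-list conversion, assuming the list of files matches the order given to the reader (an external_source callback could then look up an entry by the sample index, e.g. via SampleInfo):

```python
import numpy as np

# Hypothetical: the same file list as passed to fn.readers.numpy(files=...)
files = ['file1.npy', 'file2.npy']
coordinates = {
    'file1.npy': [10, 20, 10, 20],
    'file2.npy': [0, 10, 20, 30],
}

# Flatten the dict into a list in reader order; a callback can then
# return coords[i] for sample i, keeping coordinates and images paired.
coords = [np.array(coordinates[f], dtype=np.int32) for f in files]
print(coords[0])  # [10 20 10 20]
```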

BTW, I think the coordinates in your example are incorrect: the rectangles for channels 0 and 2 are empty.

NOTE: The code below requires the latest DALI (1.43). The files contain 2D images with 3 channels in channels-first (CHW) format.

import nvidia.dali as dali
import nvidia.dali.fn as fn
import numpy as np

files_rois = {
    "in1.npy": [
        [10, 20, 20, 30],
        [15, 25, 25, 35],
        [5, 15, 15, 25],
    ],
    "in2.npy": [
        [30, 20, 40, 40],
        [35, 25, 45, 45],
        [25, 15, 35, 35],
    ],
}

def get_roi(filename_tensor):
    filename = filename_tensor.tobytes().decode('utf-8')
    return np.array(files_rois[filename])

@dali.pipeline_def(batch_size=4, device_id=0, num_threads=4)
def my_pipe():
    img = fn.readers.numpy(files=list(files_rois.keys()), file_root=".", random_shuffle=True, seed=1)
    roi = fn.python_function(img.source_info(), function=get_roi)

    # fn.crop / fn.slice don't support per-channel ROIs, so we have to split the image into channels and then stack them back
    r = img[0, roi[0, 0]:roi[0, 2], roi[0, 1]:roi[0, 3]]
    g = img[1, roi[1, 0]:roi[1, 2], roi[1, 1]:roi[1, 3]]
    b = img[2, roi[2, 0]:roi[2, 2], roi[2, 1]:roi[2, 3]]

    cropped = fn.stack(r, g, b, axis=0)

    return cropped, roi

pipe = my_pipe()
pipe.build()
img, roi = pipe.run()
for i in range(len(img)):
    print(img[i].source_info())
    print(img[i].shape())
    print(roi[i])
    print()

Output:

in1.npy
[3, 10, 10]
TensorCPU(
    [[10 20 20 30]
     [15 25 25 35]
     [ 5 15 15 25]],
    dtype=DALIDataType.INT64,
    shape=[3, 4])

in2.npy
[3, 10, 20]
TensorCPU(
    [[30 20 40 40]
     [35 25 45 45]
     [25 15 35 35]],
    dtype=DALIDataType.INT64,
    shape=[3, 4])

in2.npy
[3, 10, 20]
TensorCPU(
    [[30 20 40 40]
     [35 25 45 45]
     [25 15 35 35]],
    dtype=DALIDataType.INT64,
    shape=[3, 4])

in1.npy
[3, 10, 10]
TensorCPU(
    [[10 20 20 30]
     [15 25 25 35]
     [ 5 15 15 25]],
    dtype=DALIDataType.INT64,
    shape=[3, 4])