NVIDIA / DALI

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html
Apache License 2.0

Specify ROI coordinates for each sample individually #5693

Open huberb opened 2 days ago

huberb commented 2 days ago

Describe the question.

I'm trying to get crops from my image samples based on a mapping of filename to crop coordinates. Currently I'm using fn.readers.numpy:

        data = fn.readers.numpy(
            device="cpu", file_root=dataset_path, file_filter="*.npy",
            roi_start=(0, 0, 0, 0), roi_end=(10, 28, 28, 28),
            prefetch_queue_depth=32
        )

Is there any way to specify the roi_start and roi_end values for each sample individually? I could not find any example or hint in the documentation on how to do this.


jantonguirao commented 2 days ago

Hi @huberb.

Yes, you can assign per-sample ROI. Here's an example that uses ROI coordinates that are also read from numpy files:

a = np.array([1,2,3,4,5,6])
np.save('a.npy', a)
b = np.array([1,2,3,4,5,6,7,8,9])
np.save('b.npy', b)
a_start = np.array([2], dtype=np.int32)
np.save('a_start.npy', a_start)
a_end = np.array([3], dtype=np.int32)
np.save('a_end.npy', a_end)
b_start = np.array([1], dtype=np.int32)
np.save('b_start.npy', b_start)
b_end = np.array([3], dtype=np.int32)
np.save('b_end.npy', b_end)

@pipeline_def(batch_size=2, device_id=0, num_threads=3)
def pipe():
    start = fn.readers.numpy(
        device='cpu',
        file_root='./',
        files=['a_start.npy', 'b_start.npy']
    )
    end = fn.readers.numpy(
        device='cpu',
        file_root='./',
        files=['a_end.npy', 'b_end.npy']
    )
    data1 = fn.readers.numpy(
        device='cpu',
        file_root='./',
        files=['a.npy', 'b.npy'],
        roi_start=start,
        roi_end=end,
    )
    return start, end, data1

p = pipe()
p.build()
out = p.run()

print('start', np.array(out[0][0]), np.array(out[0][1]))
print('end', np.array(out[1][0]), np.array(out[1][1]))
print('data', np.array(out[2][0]), np.array(out[2][1]))

You can use other data nodes from your pipeline as the ROI arguments instead.
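
For reference, here is a plain NumPy sketch of what the reader returns for the two samples above: the per-sample ROI simply selects the half-open range `roi_start:roi_end` from each array.

```python
import numpy as np

# Same sample data as in the example above
a = np.array([1, 2, 3, 4, 5, 6])
b = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

# Per-sample ROI is equivalent to slicing each array individually
print(a[2:3])  # roi_start=2, roi_end=3 -> [3]
print(b[1:3])  # roi_start=1, roi_end=3 -> [2 3]
```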

huberb commented 2 days ago

Thank you for your response! Sorry for being a bit slow here, but I have to admit I'm still a bit confused about the pipeline API. Assuming I have thousands of files like this, how do I make sure that the correct cropping coordinates are loaded together with the corresponding image? Do I just need to make sure that the order of files given to the files parameter matches across all readers?

Would it also be possible to load all my coordinates beforehand as a python dictionary and maybe use the external source api? Let's say I have a dict like this:

coordinates = {
  'file1.npy': [10, 20, 10, 20],
  'file2.npy': [0, 10, 20, 30]
}

This way I could probably avoid opening a lot of extra file handles.

Sadly, my use case is also a bit more complicated than that. I actually have to crop at a different position for every color channel of a single image.

So my coordinates actually look like this:

# different coordinate for every color channel
coords = [
  [10, 20, 10, 20],
  [20, 30, 30, 40],
  [15, 25, 15, 15]
]
image = ... # load image
print(image.shape)  # [3, 100, 100]
crops = []
for channel_idx, channel_coords in enumerate(coords):
  crop = image[
    channel_idx,
    channel_coords[0]:channel_coords[1],
    channel_coords[2]:channel_coords[3]
  ]
  crops.append(crop)

print(np.stack(crops).shape)  # intended result: (3, 10, 10)

mzient commented 1 day ago

Hello @huberb

Do I just need to make sure that the order of files that are given to the files parameter match or all readers?

Yes. If multiple readers have their files specified in the same order, they will be synchronized. Even if you use random shuffling, specifying the same seed will make them shuffle in exactly the same way.
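
The synchronization idea can be illustrated outside DALI: shuffling two parallel lists with the same seed produces the same permutation, so the pairs stay aligned. (The file names below are placeholders.)

```python
import numpy as np

# Two parallel file lists, e.g. images and their ROI files,
# passed to two readers in the same order.
data_files = ['a.npy', 'b.npy', 'c.npy']
roi_files = ['a_roi.npy', 'b_roi.npy', 'c_roi.npy']

# The same seed yields the same permutation, mirroring what happens
# when each reader gets random_shuffle=True with an identical seed.
perm1 = np.random.default_rng(42).permutation(len(data_files))
perm2 = np.random.default_rng(42).permutation(len(roi_files))
assert (perm1 == perm2).all()
print([data_files[i] for i in perm1])
print([roi_files[i] for i in perm2])
```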

Would it also be possible to load all my coordinates beforehand as a python dictionary and maybe use the external source api? Let's say I have a dict like this:

You can use external source, but not with a dictionary - just use a list with the same order as the one in which the files are specified. If you want to have different shuffling on each run, you can either just feed everything through external_source (this may come with some performance penalty, especially if not configured properly) or use python_function.
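
A minimal sketch of the dict-to-list conversion, assuming the list of files matches the order given to the reader (an external_source callback could then look up an entry by the sample index, e.g. via SampleInfo):

```python
import numpy as np

# Hypothetical: the same file list as passed to fn.readers.numpy(files=...)
files = ['file1.npy', 'file2.npy']
coordinates = {
    'file1.npy': [10, 20, 10, 20],
    'file2.npy': [0, 10, 20, 30],
}

# Flatten the dict into a list in reader order; a callback can then
# return coords[i] for sample i, keeping coordinates and images paired.
coords = [np.array(coordinates[f], dtype=np.int32) for f in files]
print(coords[0])  # [10 20 10 20]
```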

BTW, I think the coordinates in your example are incorrect: the rectangles for channels 0 and 2 are empty.

NOTE: The code below requires the latest DALI (1.43). The files contain 2D images with 3 channels in channels-first (CHW) format.

import nvidia.dali as dali
import nvidia.dali.fn as fn
import numpy as np

files_rois = {
    "in1.npy": [
        [10, 20, 20, 30],
        [15, 25, 25, 35],
        [5, 15, 15, 25],
    ],
    "in2.npy": [
        [30, 20, 40, 40],
        [35, 25, 45, 45],
        [25, 15, 35, 35],
    ],
}

def get_roi(filename_tensor):
    filename = filename_tensor.tobytes().decode('utf-8')
    return np.array(files_rois[filename])

@dali.pipeline_def(batch_size=4, device_id=0, num_threads=4)
def my_pipe():
    img = fn.readers.numpy(files=list(files_rois.keys()), file_root=".", random_shuffle=True, seed=1)
    roi = fn.python_function(img.source_info(), function=get_roi)

    # fn.crop / fn.slice don't support per-channel ROIs, so we have to split the image into channels and then stack them back
    r = img[0, roi[0, 0]:roi[0, 2], roi[0, 1]:roi[0, 3]]
    g = img[1, roi[1, 0]:roi[1, 2], roi[1, 1]:roi[1, 3]]
    b = img[2, roi[2, 0]:roi[2, 2], roi[2, 1]:roi[2, 3]]

    cropped = fn.stack(r, g, b, axis=0)

    return cropped, roi

pipe = my_pipe()
pipe.build()
img, roi = pipe.run()
for i in range(len(img)):
    print(img[i].source_info())
    print(img[i].shape())
    print(roi[i])
    print()

Output:

in1.npy
[3, 10, 10]
TensorCPU(
    [[10 20 20 30]
     [15 25 25 35]
     [ 5 15 15 25]],
    dtype=DALIDataType.INT64,
    shape=[3, 4])

in2.npy
[3, 10, 20]
TensorCPU(
    [[30 20 40 40]
     [35 25 45 45]
     [25 15 35 35]],
    dtype=DALIDataType.INT64,
    shape=[3, 4])

in2.npy
[3, 10, 20]
TensorCPU(
    [[30 20 40 40]
     [35 25 45 45]
     [25 15 35 35]],
    dtype=DALIDataType.INT64,
    shape=[3, 4])

in1.npy
[3, 10, 10]
TensorCPU(
    [[10 20 20 30]
     [15 25 25 35]
     [ 5 15 15 25]],
    dtype=DALIDataType.INT64,
    shape=[3, 4])