NVIDIA / DALI

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html
Apache License 2.0

Image Segmentation mask aliasing #5507

Open YJonmo opened 3 weeks ago

YJonmo commented 3 weeks ago

Describe the question.

Thanks for this work.

I have a pipeline for training image segmentation models. I am using the albumentations library for data augmentation, and now I would like to try DALI for a speed boost.

I need to load the images and their corresponding masks and train a model on the augmented images and masks. However, when I load the masks using the following lines, aliasing occurs and new pixel values appear in the mask. In other words, the mask should contain only certain values, such as 0, 25, 100, and 220, but after operations such as fn.resize or fn.random_resized_crop, values in between them appear.

    image_files, _ = fn.readers.file(file_root=str(images_dir), file_filters='*.png', seed=1234, name="main_reader")  # , num_shards=world_size, shard_id=global_rank)
    mask_files, _ = fn.readers.file(file_root=str(mask_dir), seed=1234)

    images = fn.random_resized_crop(images, size=(512, 512), random_area=[0.08, 1.0],
                                    random_aspect_ratio=[0.75, 1.333333], antialias=True)

Before resize: image

After resize: image

So how can I avoid this?

JanuszL commented 3 weeks ago

Hi @YJonmo,

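Thank you for reaching out. The default interpolation type of the random_resized_crop operator is linear, which blends neighbouring pixel values and therefore produces values that are not present in the original mask. Please set the interp_type parameter to types.INTERP_NN (nearest neighbor) when processing the masks.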
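To illustrate why the interpolation type matters for label masks, here is a minimal pure-Python sketch that downsamples a single mask row both ways. It does not use DALI. The helper names `resize_row_nearest` and `resize_row_linear` are hypothetical, and DALI's actual resampling may differ in detail (for example, in its antialiasing filter).

```python
def resize_row_nearest(row, new_len):
    """Resample a 1-D row by copying the source pixel under each output pixel centre."""
    n = len(row)
    # Integer arithmetic: floor((i + 0.5) * n / new_len), clamped to the last index.
    return [row[min(((2 * i + 1) * n) // (2 * new_len), n - 1)] for i in range(new_len)]


def resize_row_linear(row, new_len):
    """Resample a 1-D row by linearly blending the two nearest source pixels."""
    n = len(row)
    scale = n / new_len
    out = []
    for i in range(new_len):
        # Map the output pixel centre back to source coordinates and clamp.
        src = min(max((i + 0.5) * scale - 0.5, 0.0), n - 1)
        lo = int(src)
        hi = min(lo + 1, n - 1)
        frac = src - lo
        out.append(round((1 - frac) * row[lo] + frac * row[hi]))
    return out


mask_row = [0, 0, 25, 25, 100, 100, 220, 220]  # only the labels 0, 25, 100, 220
nearest = resize_row_nearest(mask_row, 5)
linear = resize_row_linear(mask_row, 5)

print("nearest:", nearest)
print("linear: ", linear)
print("new values introduced by linear:", sorted(set(linear) - set(mask_row)))
```

Nearest-neighbor sampling only ever copies existing pixels, so the output labels are always a subset of the input labels. Linear sampling averages across label boundaries, which is where the in-between values come from.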

YJonmo commented 3 weeks ago

Amazing, thank you.

One question unrelated to this topic: how can I do the following in DALI? Pixels with the value 25 should become 2, and pixels with the value 200 should become 1. The NumPy equivalent for a greyscale image is:

    images[images == 25] = 2
    images[images == 200] = 1

Do I need to create a custom function and call it in the pipeline as suggested by GPT?

JanuszL commented 3 weeks ago

Hi @YJonmo,

I think you can try out the lookup_table operator:

    images = fn.lookup_table(
        images,
        keys=[0, 25, 100, 220],
        values=[0, 2, 100, 1],
    )
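As a rough illustration of what a key-to-value lookup does per pixel, here is a plain-Python sketch (not DALI code). The helper `remap_labels` and its `default` argument are hypothetical. DALI's lookup_table has a comparable default_value argument for keys that are not in the table, so listing every label explicitly, as in the snippet above, avoids unlisted labels being replaced by the default.

```python
def remap_labels(mask, keys, values, default=0):
    """Map each pixel through a key -> value table; unknown pixels get `default`."""
    table = dict(zip(keys, values))
    return [[table.get(pixel, default) for pixel in row] for row in mask]


# A tiny 2x3 greyscale mask containing only the labels 0, 25, 100, 220.
mask = [
    [0, 25, 25],
    [100, 220, 220],
]

remapped = remap_labels(mask, keys=[0, 25, 100, 220], values=[0, 2, 100, 1])
print(remapped)
```

Labels 0 and 100 map to themselves here, so only 25 and 220 actually change.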