NVIDIA / DALI

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html
Apache License 2.0

Applying decoders and augmentation to images with float pixel values #4014

Closed shrutishrestha closed 2 years ago

shrutishrestha commented 2 years ago

I am using astronomy data whose pixel values are magnetic field strengths in Gauss. The values are stored in a FITS extension and look like the ones shown in the picture below; they are floats like -2.5, 4.9, etc. I saw there are only 5 decoders:

  1. decoder.audio
  2. decoder.image
  3. decoder.image_crop
  4. decoder.image_random_crop
  5. decoder.image_slice

All of these expect a uint datatype, so how do I decode my data, which is float?

[Screenshot: example of the FITS float pixel values, 2022-06-27]
JanuszL commented 2 years ago

Hi @shrutishrestha,

DALI image decoders don't support floating-point outputs. However, you can use the external source operator and run any Python decoder you like. Thanks to the parallel option, it should provide good performance.
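
For illustration, here is a minimal sketch of that approach, assuming the FITS files can be decoded with astropy; the file path, HDU index, and pipeline parameters are placeholders, not something prescribed by DALI:

    import glob
    import numpy as np
    from astropy.io import fits  # any Python decoder works here; astropy is just an example
    from nvidia.dali import pipeline_def, fn

    files = sorted(glob.glob("/path/to/magnetograms/*.fits"))  # placeholder path

    def decode_fits(sample_info):
        # With parallel=True and batch=False this callback receives a SampleInfo and
        # runs in Python worker processes, returning one decoded sample per call.
        # The modulo makes it loop forever; raise StopIteration to end an epoch instead.
        path = files[sample_info.idx_in_epoch % len(files)]
        with fits.open(path) as hdul:
            return np.ascontiguousarray(hdul[1].data.astype(np.float32))  # HDU index may differ

    @pipeline_def
    def fits_pipeline():
        data = fn.external_source(source=decode_fits, batch=False, parallel=True)
        return data  # float32 samples, ready for further DALI operators

    pipe = fits_pipeline(batch_size=8, num_threads=2, device_id=0, py_num_workers=4)
    pipe.build()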

mzient commented 2 years ago

@shrutishrestha There's also a Numpy file reader - so if your files are in .npy format, you can read them directly.

shrutishrestha commented 2 years ago

@JanuszL and @mzient thank you for the response. Yes, I am using the numpy reader and the external source operator. After reading my .npy files through the numpy reader, how can I do augmentations using DALI? Are there specific operators like decoder.image_crop, or do I have to make my own?

JanuszL commented 2 years ago

@shrutishrestha - if you use the external source operator or the numpy reader, you don't need to decode your data any further. The raw images are already in memory, so you can use any DALI operator (slice or crop, to name a few) on them directly.

shrutishrestha commented 2 years ago

Hi @JanuszL, my image is a 1-channel .npy image and its dimension is (512, 512). Now I am doing augmentations:

@pipeline_def
def custom_pipeline(files, highresolution, lowresolution, hmidata):
    high = fn.readers.numpy(device='gpu', file_root=highresolution, pad_last_batch=True, files =files, name="my_reader")
    high = fn.flip(high, vertical=1, horizontal=1)

    low = fn.readers.numpy(device='gpu', file_root=lowresolution, files =files, pad_last_batch=True)
    low = fn.flip(low, vertical=1, horizontal=1)

    hmi = fn.readers.numpy(device='gpu', file_root=hmidata, files =files, pad_last_batch=True)
    hmi = fn.flip(hmi, vertical=1, horizontal=1)

    return (hmi, low, high)

So the augmentation error says "[/opt/dali/dali/pipeline/operator/op_schema.h:447] The number of dimensions 2 does not match any of the allowed layouts for input 0. Valid layouts are:FDHWC, FHWC,DHWC, HWC,FCDHW,FCHW,CDHW,CHW"

Should I change the data source and make my numpy arrays HWC (512, 512, 1) by using numpy.expand_dims? Or is there a way to change the dimensions after the fn.readers.numpy calls and before the fn.flip lines? I think I will then have to revert the dimensions to (512, 512), i.e. HW, before the return (hmi, low, high) statement, because the ExternalInputGpuIterator works fine with the (512, 512) shape, and the enumerate(dali_train_iter) loop below gives the expected shape of (8, 1, 512, 512), where 8 is the batch size, 1 is the channel, and 512, 512 are H and W.

dali_train_iter = ExternalInputGpuIterator(pipe_train_gpu, batch_size=args["train_batch_size"], last_batch_policy=LastBatchPolicy.PARTIAL, auto_reset=True, files= trainpathlist)

def train(dali_train_iter):
    for i, batch in enumerate(dali_train_iter):
            hmidata, image, label = batch 
            hmidata = hmidata.to(torch.float32)
            image = image.to(torch.float32)
            label = label.to(torch.float32)
JanuszL commented 2 years ago

Hi @shrutishrestha,

I think you can try out the reshape operator - high = fn.reshape(high, rel_shape=[1, 1, -1], layout="HWC") - it operates only on the metadata so it should have a negligible overhead.
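
To make this concrete, here is a sketch of how the suggested reshape could wrap the flip and then be undone, so the pipeline keeps returning (512, 512) samples. The reader arguments mirror the pipeline above; the reverse reshape (fn.squeeze would be an alternative) is an assumption on my part rather than something stated in this thread:

    high = fn.readers.numpy(device="gpu", file_root=highresolution, files=files,
                            pad_last_batch=True, name="my_reader")
    # (512, 512) -> (512, 512, 1) with layout "HWC"; metadata-only, no data copy
    high = fn.reshape(high, rel_shape=[1, 1, -1], layout="HWC")
    high = fn.flip(high, vertical=1, horizontal=1)
    # drop the channel extent again: (512, 512, 1) -> (512, 512) with layout "HW"
    high = fn.reshape(high, rel_shape=[1, 1], layout="HW")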

shrutishrestha commented 2 years ago

Hi @JanuszL, thank you for your response. I used the code you suggested above and it worked. My code runs fine with num_threads=1 passed to the custom pipeline:

pipe_train_gpu = custom_pipeline(
    batch_size=args["train_batch_size"], device_id=args["device_id"],
    num_threads=args["num_threads"], files=trainpathlist,
    set_affinity=args["set_affinity"],
    highresolution=args["highresolution_partition_path"],
    lowresolution=args["lowresolution_partition_path"],
    hmidata=args["hmidata_partition_path"])

dali_train_iter = ExternalInputGpuIterator(
    pipe_train_gpu, batch_size=args["train_batch_size"],
    last_batch_policy=LastBatchPolicy.PARTIAL, files=trainpathlist)

But when I increase the num_threads value, I get this error:

Traceback (most recent call last):
  File "main_dalinr.py", line 261, in <module>
    main(args)
  File "main_dalinr.py", line 140, in main
    data_loader = dali_train_iter,
  File "/scratch/sshrestha8/nsight/nsightdemo/arctic_run_folders/train_val_test/train.py", line 20, in train_func
    for i, batch in enumerate(data_loader):
  File "/scratch/sshrestha8/nsight/nsightdemo/arctic_run_folders/dataloader/dali_numpy_reader.py", line 49, in next
    out = super().next()
  File "/userapp/virtualenv/SR_ENV/venv/lib/python3.7/site-packages/nvidia/dali/plugin/pytorch.py", line 185, in next
    outputs = self._get_outputs()
  File "/userapp/virtualenv/SR_ENV/venv/lib/python3.7/site-packages/nvidia/dali/plugin/base_iterator.py", line 257, in _get_outputs
    outputs.append(p.share_outputs())
  File "/userapp/virtualenv/SR_ENV/venv/lib/python3.7/site-packages/nvidia/dali/pipeline.py", line 926, in share_outputs
    return self._pipe.ShareOutputs()
RuntimeError: Critical error in pipeline:
Error when executing GPU operator readersNumpy, instance name: "Numpy_3", encountered:
Error in thread 1: [/opt/dali/dali/operators/reader/loader/numpy_loader_gpu.cc:103] [/opt/dali/dali/util/stdcufile.cc:101]
Assert on "pos >= 0 && pos <= (int64)length" failed: Invalid seek
Stacktrace (8 entries):
[frame 0]: /userapp/virtualenv/SR_ENV/venv/lib/python3.7/site-packages/nvidia/dali/libdali.so(+0x83f5f) [0x7f64055ecf5f]
[frame 1]: /userapp/virtualenv/SR_ENV/venv/lib/python3.7/site-packages/nvidia/dali/libdali.so(+0x1b514e) [0x7f640571e14e]
[frame 2]: /userapp/virtualenv/SR_ENV/venv/lib/python3.7/site-packages/nvidia/dali/libdali_operators.so(+0x2dd2b14) [0x7f63e806bb14]
[frame 3]: /userapp/virtualenv/SR_ENV/venv/lib/python3.7/site-packages/nvidia/dali/libdali_operators.so(+0x2ddab88) [0x7f63e8073b88]
[frame 4]: /userapp/virtualenv/SR_ENV/venv/lib/python3.7/site-packages/nvidia/dali/libdali.so(dali::ThreadPool::ThreadMain(int, int, bool)+0x1fe) [0x7f64056c69de]
[frame 5]: /userapp/virtualenv/SR_ENV/venv/lib/python3.7/site-packages/nvidia/dali/libdali.so(+0x721d9f) [0x7f6405c8ad9f]
[frame 6]: /lib64/libpthread.so.0(+0x7ea5) [0x7f64dcdd3ea5]
[frame 7]: /lib64/libc.so.6(clone+0x6d) [0x7f64dc3f3b0d]
File: hmi.sharp_cea_720s.5963.20150916_060000_TAI.magnetogram.npy
Stacktrace (6 entries):
[frame 0]: /userapp/virtualenv/SR_ENV/venv/lib/python3.7/site-packages/nvidia/dali/libdali_operators.so(+0x5974a2) [0x7f63e58304a2]
[frame 1]: /userapp/virtualenv/SR_ENV/venv/lib/python3.7/site-packages/nvidia/dali/libdali_operators.so(+0x2ddade8) [0x7f63e8073de8]
[frame 2]: /userapp/virtualenv/SR_ENV/venv/lib/python3.7/site-packages/nvidia/dali/libdali.so(dali::ThreadPool::ThreadMain(int, int, bool)+0x1fe) [0x7f64056c69de]
[frame 3]: /userapp/virtualenv/SR_ENV/venv/lib/python3.7/site-packages/nvidia/dali/libdali.so(+0x721d9f) [0x7f6405c8ad9f]
[frame 4]: /lib64/libpthread.so.0(+0x7ea5) [0x7f64dcdd3ea5]
[frame 5]: /lib64/libc.so.6(clone+0x6d) [0x7f64dc3f3b0d]

Current pipeline object is no longer valid.

My ExternalInputGpuIterator is:

class ExternalInputGpuIterator(DALIGenericIterator):
    def __init__(self, pipelines, batch_size, files, last_batch_policy=LastBatchPolicy.PARTIAL, auto_reset=True):
        super().__init__(pipelines=pipelines, last_batch_policy=last_batch_policy, auto_reset=auto_reset, output_map=['hmidata', 'input_image', 'target_image'], reader_name="my_reader")
        self.files = files
        self.batch_size = batch_size
        self.data_set_len = len(self.files)
        self.n = self.data_set_len

    def __iter__(self):
        self.i = 0
        # shuffle(self.files)
        return self

    def __len__(self):
        return self.data_set_len

    def __next__(self):
        if self.i >= self.n:
            self.__iter__()
            raise StopIteration

        else:
            out = super().__next__()
            hmidatalist = out[0]['hmidata']
            input_imagelist = out[0]['input_image'] 
            target_imagelist = out[0]['target_image']
            q = (self.n - self.i) // self.batch_size
            mod = (self.n - self.i) % self.batch_size
            if q>0:
                self.i = self.i + self.batch_size
            else: 
                self.i = self.i + mod
            return (hmidatalist, input_imagelist, target_imagelist)

    next = __next__
JanuszL commented 2 years ago

Hi @shrutishrestha,

This is not expected. Can you tell me which DALI version you use? Does it reproduce with the latest one? Can you set cache_header_information=False and see if it still happens? Can you provide a minimal repro we can run on our end (you can check https://github.com/NVIDIA/DALI/blob/main/dali/test/python/test_operator_readers_numpy.py to see how to create test NumPy files on the fly if you cannot share your data)?
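
For reference, cache_header_information is a regular argument of the numpy reader, so trying this only means changing the reader call; a sketch based on the pipeline posted earlier:

    high = fn.readers.numpy(device="gpu", file_root=highresolution, files=files,
                            pad_last_batch=True, cache_header_information=False,
                            name="my_reader")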

shrutishrestha commented 2 years ago

Hi @JanuszL, I found out that pad_last_batch=True was causing the error. After I removed it, it also works with more than one thread.

Initially I had added pad_last_batch=True together with name="my_reader" because I was getting (8, 1, 512, 512) in both of my iterations even though I have 9 samples in total. Adding them solved that problem: I got an (8, 1, 512, 512) tensor in the first iteration and (1, 1, 512, 512) in the second. I don't know why that didn't work with more threads, but once I removed the pad_last_batch=True parameter it works, still giving (8, 1, 512, 512) in the first iteration and (1, 1, 512, 512) in the second, and now with more threads as well.
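
In other words, the reader call that now works with multiple threads looks like the one above, just without the padding flag (a sketch of the change described here):

    high = fn.readers.numpy(device="gpu", file_root=highresolution, files=files,
                            name="my_reader")  # pad_last_batch removed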

Thanks for your quick replies.

JanuszL commented 2 years ago

I managed to reproduce the problem. Let me check how we can fix it.

JanuszL commented 2 years ago

I think this should fix the problem.

shrutishrestha commented 2 years ago

Thank you. This fixed the issue.