NVIDIA / DALI

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html
Apache License 2.0

DALI won't load certain TIFFs #1899

Open syb0rg opened 4 years ago

syb0rg commented 4 years ago

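I am unable to load this TIFF file. Here is a minimal example:

# Imports inferred from the names used below
import math
import nvidia.dali.ops as ops
import nvidia.dali.types as types
from nvidia.dali.pipeline import Pipeline
from nvidia.dali.plugin import pytorch

class TiffPipe(Pipeline):
    def __init__(self, batch_size, num_threads, device_id, data, shuffle):
        super().__init__(batch_size, num_threads, device_id, seed=16)
        # file_list points at a text file of "<path> <label>" lines
        self.input = ops.FileReader(file_root='', file_list=data, random_shuffle=shuffle)
        self.decode = ops.ImageDecoder(device='cpu', output_type=types.RGB)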

    def define_graph(self):
        tiffs, labels = self.input(name="Reader")
        images = self.decode(tiffs)
        return (images, labels)

class TiffLoader():
    def __init__(self, batch_size, file_list):
        self.pipe = TiffPipe(batch_size=batch_size, num_threads=2, device_id=0, data=file_list, shuffle=False)
        self.pipe.build()
        self.batch_size = batch_size
        self.epoch_size = self.pipe.epoch_size("Reader")
        self.dali_iterator = pytorch.DALIClassificationIterator([self.pipe], self.epoch_size, auto_reset=True)

    def __len__(self):
        return math.ceil(self.epoch_size / self.batch_size)

    def __iter__(self):
        return self.dali_iterator.__iter__()

train.csv is defined as such:

➜  cat train.csv
/home/data/0005f7aaab2800f6170c399693a96917.tiff 1

Upon instantiating TiffLoader I get the following error message:

RuntimeError: Critical error in pipeline: Error in thread 0: [/opt/dali/dali/operators/decoder/host/host_decoder.cc:41] [/opt/dali/dali/image/image_factory.cc:87] Assert on "CheckIsPNG(encoded_image, length) + CheckIsBMP(encoded_image, length) + CheckIsGIF(encoded_image, length) + CheckIsJPEG(encoded_image, length) + CheckIsTiff(encoded_image, length) + CheckIsPNM(encoded_image, length) == 1" failed: Encoded image has ambiguous format

Using the same classes and methods above with a TIFF from DALI_extra, the data loader is constructed and usable as expected.

JanuszL commented 4 years ago

Hi, I cannot open this file in any photo viewer, so I guess it is just corrupted.

syb0rg commented 4 years ago

It is a large TIFF file; my default image viewers wouldn't open it either. You should be able to view the file with GIMP.

JanuszL commented 4 years ago

What I see in the header is:

49 49 2B 00

So the header decodes to II 43 0. According to http://www.fileformat.info/format/tiff/corion.htm it should be II 42 0. That is why DALI considers this image malformed. GIMP probably disregards the 43 value and loads the image anyway. Still, I'm not sure we can accept a header that deviates from the proper format.
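For anyone who wants to reproduce the header reading, here is a minimal Python sketch that decodes the first four bytes of a file into the byte-order mark and the version number. The function name `read_tiff_magic` is hypothetical and not part of DALI; the sketch only assumes the standard TIFF layout of a two-byte byte-order mark followed by a 16-bit version field.

```python
import struct


def read_tiff_magic(header: bytes):
    """Return (byte_order_mark, version) from the first four bytes of a TIFF-like file."""
    if len(header) < 4:
        raise ValueError("need at least 4 header bytes")
    mark = header[:2]
    if mark == b"II":
        fmt = "<H"  # "II" marks little-endian (Intel) byte order
    elif mark == b"MM":
        fmt = ">H"  # "MM" marks big-endian (Motorola) byte order
    else:
        raise ValueError("unknown byte-order mark: %r" % mark)
    # The version field is a 16-bit unsigned integer in the file's byte order.
    (version,) = struct.unpack(fmt, header[2:4])
    return mark.decode("ascii"), version


# The four bytes quoted above: 49 49 2B 00
print(read_tiff_magic(bytes.fromhex("49492B00")))  # ('II', 43)
```

With a real file, the same bytes can be obtained with `open(path, "rb").read(4)`.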

syb0rg commented 4 years ago

I dug a bit into this, and the header looks correct. It follows the BigTIFF file format:

➜  file 0005f7aaab2800f6170c399693a96917.tiff 
0005f7aaab2800f6170c399693a96917.tiff: Big TIFF image data, little-endian

I confirmed GIMP was not disregarding it by editing the header value to be 42; with that change, GIMP wouldn't load the image.

JanuszL commented 4 years ago

So it is BigTIFF, which is a variant of TIFF. I don't know what we need to do to support it besides accepting different values in the header. We will check it and get back to you soon.
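As a rough illustration of what "accepting different values in the header" could look like, here is a standalone Python sketch that classifies a header as classic TIFF (version 42) or BigTIFF (version 43). It is not DALI's `CheckIsTiff`; the function name `tiff_variant` is hypothetical. It assumes the BigTIFF header layout in which the version field is followed by a 16-bit offset size of 8 and a 16-bit zero.

```python
import struct

CLASSIC_TIFF_VERSION = 42
BIGTIFF_VERSION = 43


def tiff_variant(header: bytes):
    """Classify a file header as 'TIFF', 'BigTIFF', or None if it is neither."""
    if len(header) < 4 or header[:2] not in (b"II", b"MM"):
        return None
    endian = "<" if header[:2] == b"II" else ">"
    (version,) = struct.unpack(endian + "H", header[2:4])
    if version == CLASSIC_TIFF_VERSION:
        return "TIFF"
    if version == BIGTIFF_VERSION:
        if len(header) < 8:
            return None
        # BigTIFF: bytes 4-5 hold the offset size (always 8), bytes 6-7 are zero.
        offset_size, reserved = struct.unpack(endian + "HH", header[4:8])
        if offset_size == 8 and reserved == 0:
            return "BigTIFF"
    return None
```

Recognizing the header is only the first step. BigTIFF also uses 64-bit offsets and larger directory entries, so a decoder written for classic TIFF still needs changes beyond this check.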

JanuszL commented 4 years ago

I see that the image you provided is tiled. Our custom TIFF handling doesn't support that, and in such a case we fall back to OpenCV, which doesn't seem to support it either. I will add this feature to our ToDo list, but it doesn't seem to be a trivial amount of work to add it now. If you have any spare time you can try to hack it on your own. I would start by adding additional filters to ImageFactory, then remove this limitation for tiling and dive into decoding itself. The API doesn't seem very complicated.
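To check whether a given file is tiled before handing it to a decoder, the first image file directory (IFD) can be scanned for the TileWidth tag (322). The sketch below is plain Python and does not use DALI or libtiff; the function name `is_tiled` is hypothetical. It assumes the standard layouts: classic TIFF uses a 32-bit IFD offset, a 16-bit entry count and 12-byte entries, while BigTIFF uses a 64-bit IFD offset, a 64-bit entry count and 20-byte entries.

```python
import struct

TILE_WIDTH_TAG = 322  # TileWidth; present only in tiled TIFFs


def is_tiled(data: bytes) -> bool:
    """Return True if the first IFD of a TIFF/BigTIFF buffer has a TileWidth tag."""
    endian = {b"II": "<", b"MM": ">"}.get(data[:2])
    if endian is None:
        raise ValueError("not a TIFF byte-order mark")
    (version,) = struct.unpack_from(endian + "H", data, 2)
    if version == 42:    # classic TIFF
        (ifd_offset,) = struct.unpack_from(endian + "I", data, 4)
        count_fmt, entry_size = "H", 12
    elif version == 43:  # BigTIFF
        (ifd_offset,) = struct.unpack_from(endian + "Q", data, 8)
        count_fmt, entry_size = "Q", 20
    else:
        raise ValueError("unsupported TIFF version: %d" % version)
    (num_entries,) = struct.unpack_from(endian + count_fmt, data, ifd_offset)
    first_entry = ifd_offset + struct.calcsize(endian + count_fmt)
    for i in range(num_entries):
        # Every entry starts with its 16-bit tag id.
        (tag,) = struct.unpack_from(endian + "H", data, first_entry + i * entry_size)
        if tag == TILE_WIDTH_TAG:
            return True
    return False
```

This version takes the whole file as a bytes object to keep the example short. For a very large BigTIFF it would be better to read the header, seek to the IFD offset and read only the directory.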