blink1073 / tifffile

Deprecated: Read and write image data from and to TIFF files.

bytes2str resolves to str() after decodeErrors #38

Closed tlambert03 closed 6 years ago

tlambert03 commented 6 years ago

After updating from 0.12.1 to 0.13.5, I found that a number of my TIFF files raise an exception while instantiating a TiffFile object, due to non-standard, non-Unicode TIFF tags in the headers. In 0.12.1 this was not an issue (the tags were mostly just ignored), but the expanded bytes2str function in tifffile 0.13+ fails to decode these tags and raises an exception. (These TIFF files were written with custom LabVIEW microscope acquisition software and contain a LabVIEW binary-format tag in the header.)

For me, simply falling back to str(b) after trying the 'utf-8' and 'cp1252' encodings fixes everything and lets me use 0.13.5 without any additional modifications, but I'm not sure whether there is a better way to handle this. Consider this pull request more of an "issue with a possible fix", and feel free to reject it and suggest a better solution.
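Roughly, the fallback I have in mind looks like the sketch below. The function name bytes2str_with_fallback is made up for illustration; it is not the actual tifffile code, only the idea of trying both encodings and then giving up gracefully.

```python
def bytes2str_with_fallback(b, errors="strict"):
    """Hypothetical sketch: decode bytes, falling back to str(b).

    Tries UTF-8 first, then cp1252. If both fail, returns the
    repr-style string produced by str(b) instead of raising.
    """
    for encoding in ("utf-8", "cp1252"):
        try:
            return b.decode(encoding, errors)
        except UnicodeDecodeError:
            continue
    # Last resort: not a real decoding, just a printable stand-in
    # that keeps the escaped bytes visible.
    return str(b)


decoded_utf8 = bytes2str_with_fallback("café".encode("utf-8"))
decoded_cp1252 = bytes2str_with_fallback("café".encode("cp1252"))
fallback = bytes2str_with_fallback(b"\x90")
```

The obvious downside is that the last branch returns a string that starts with b' and contains escape sequences, so the caller can no longer tell a decoded tag from a stringified one.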

cgohlke commented 6 years ago

Could you please share a file that fails, or a full traceback?

tlambert03 commented 6 years ago

sorry, should have done that the first time!

here's a file that fails: https://www.dropbox.com/s/c8g3hmaamlg4ego/tifffile_013_tagfail.tif?dl=0

and here's the full traceback:

In [2]: import tifffile as tf

In [3]: tf.__version__
Out[3]: '0.13.5'

In [4]: T = tf.TiffFile('tifffile_013_tagfail.tif')
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
~/anaconda/envs/tif/lib/python3.6/site-packages/tifffile/tifffile.py in bytes2str(b, encoding, errors)
   9044         try:
-> 9045             return b.decode('utf-8', errors)
   9046         except UnicodeDecodeError:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x90 in position 167: invalid start byte

During handling of the above exception, another exception occurred:

UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-4-b0a86dc325e1> in <module>()
----> 1 T = tf.TiffFile('tifffile_013_tagfail.tif')

~/anaconda/envs/tif/lib/python3.6/site-packages/tifffile/tifffile.py in __init__(self, arg, name, offset, size, multifile, movie, **kwargs)
   1634 
   1635             # file handle is at offset to offset to first page
-> 1636             self.pages = TiffPages(self)
   1637 
   1638             if self.is_lsm and (self.filehandle.size >= 2**32 or

~/anaconda/envs/tif/lib/python3.6/site-packages/tifffile/tifffile.py in __init__(self, parent)
   2648         # always read and cache first page
   2649         fh.seek(offset)
-> 2650         page = TiffPage(parent, index=0)
   2651         self.pages.append(page)
   2652         self._keyframe = page

~/anaconda/envs/tif/lib/python3.6/site-packages/tifffile/tifffile.py in __init__(self, parent, index, keyframe)
   2941             index += tagsize
   2942             try:
-> 2943                 tag = TiffTag(self.parent, data[index:index+tagsize])
   2944             except TiffTag.Error as e:
   2945                 warnings.warn(str(e))

~/anaconda/envs/tif/lib/python3.6/site-packages/tifffile/tifffile.py in __init__(self, parent, tagheader, **kwargs)
   3909             # TIFF ASCII fields can contain multiple strings,
   3910             #   each terminated with a NUL
-> 3911             value = bytes2str(stripascii(value[0]).strip())
   3912         else:
   3913             if code in TIFF.TAG_ENUM:

~/anaconda/envs/tif/lib/python3.6/site-packages/tifffile/tifffile.py in bytes2str(b, encoding, errors)
   9045             return b.decode('utf-8', errors)
   9046         except UnicodeDecodeError:
-> 9047             return b.decode('cp1252', errors)
   9048 
   9049 

~/anaconda/envs/tif/lib/python3.6/encodings/cp1252.py in decode(self, input, errors)
     13 
     14     def decode(self,input,errors='strict'):
---> 15         return codecs.charmap_decode(input,errors,decoding_table)
     16 
     17 class IncrementalEncoder(codecs.IncrementalEncoder):

UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 167: character maps to <undefined>

thanks!
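For reference, the failure can be reproduced without the file. This is a minimal sketch with made-up tag content (the bytes in data are an assumption, not the real tag value); it only shows that byte 0x90 is rejected by both codecs that bytes2str tries, while latin-1 maps every byte to a character.

```python
# Made-up stand-in for the tag value: ASCII text followed by byte 0x90.
data = b"ImageID \x90"


def try_decode(raw, encoding):
    """Return the decoded string, or the exception if decoding fails."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc


results = {enc: try_decode(data, enc) for enc in ("utf-8", "cp1252", "latin-1")}

for enc, outcome in results.items():
    print(enc, "->", repr(outcome))
```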

cgohlke commented 6 years ago

Thank you. The dtype of the ImageID tag does not match the value. Probably better to coerce the tag dtype to bytes and issue a warning.
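A rough sketch of that idea, not the actual change to tifffile: the helper name decode_ascii_tag and its tag_name parameter are hypothetical, and the real fix may detect the mismatch differently.

```python
import warnings


def decode_ascii_tag(value, tag_name="ImageID"):
    """Hypothetical helper: decode an ASCII tag value, else keep raw bytes.

    If the value declared as ASCII cannot be decoded, warn and return
    the original bytes, effectively treating the tag as a bytes field.
    """
    try:
        return value.decode("utf-8")
    except UnicodeDecodeError:
        warnings.warn(
            f"tag {tag_name!r}: value does not match declared ASCII dtype; "
            "keeping raw bytes"
        )
        return value


text_value = decode_ascii_tag(b"plain ascii")

# Capture the warning so it can be inspected rather than printed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    raw_value = decode_ascii_tag(b"\x90binary payload")
```

Unlike a str(b) fallback, the caller can still tell a binary tag from a text tag by its type.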

tlambert03 commented 6 years ago

Ah, thank you, that makes sense. Are you suggesting I submit a pull request that accomplishes that, or handle it outside of tifffile? I spent a little time trying to come up with a modification to TiffTag.__init__() that checks whether the declared dtype of a tag matches the value type, but I'm not sure how to verify the dtype of a tag value without a try/except, which seems a bit ugly and not worthy of a pull request.

cgohlke commented 6 years ago

Should be fixed at https://www.lfd.uci.edu/~gohlke/code/tifffile.py.html

blink1073 commented 6 years ago

I released 0.14, which has the updated source.

tlambert03 commented 6 years ago

thank you both!