cctbx / dxtbx

Diffraction Experiment Toolbox
BSD 3-Clause "New" or "Revised" License

Improve TIFF support #376

Open dagewa opened 3 years ago

dagewa commented 3 years ago

DXTBX has limited support for TIFF images, mainly catering for files written by marccd or by some Bruker detectors. However, TIFF is a common format in electron microscopy more generally, and not all of these files can be understood by FormatTIFF. I have worked around that in some cases by using the tifffile library, for example in FormatTIFF_Merlin, which does not inherit from FormatTIFF.

There are numerous libraries that handle TIFF data in Python. I have not compared them exhaustively, but tifffile seems a good choice. It requires Python >=3.7.

I propose that when Python 3.6 support is dropped, we include tifffile and rework FormatTIFF to use this library.

dagewa commented 2 years ago

I started looking at this and rewrote FormatTIFF to use tifffile, but there is a catch: you pass a filename and the library opens the file itself, so it can't make use of the dxtbx transparent decompression mechanism for gzipped files etc. This would mean a loss of functionality compared to the present behaviour. I'm not sure whether to go ahead with replacing FormatTIFF, or to let sleeping dogs lie and instead make a parallel format class for generic TIFFs that are not recognised by the Rigaku/Bruker TIFF readers.
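For illustration, here is a minimal sketch of what transparent decompression means in this context, using only the Python standard library. This is not the dxtbx mechanism; `open_maybe_gzipped` is a hypothetical helper, and whether a given TIFF reader accepts the resulting stream in place of a filename would need checking against that library's documentation.

```python
import gzip
import io
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def open_maybe_gzipped(path):
    """Return a seekable binary stream of the uncompressed file contents.

    Hypothetical helper: gzipped files are decompressed fully into memory
    so that the caller gets cheap random access; plain files are opened as-is.
    """
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == GZIP_MAGIC:
        with gzip.open(path, "rb") as gz:
            return io.BytesIO(gz.read())
    return open(path, "rb")


# Demo: the same payload is recovered from a plain and a gzipped file.
# The payload is just a little-endian TIFF header, not a complete image.
payload = b"II*\x00" + bytes(4)

with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, "demo.tif")
    zipped = plain + ".gz"
    with open(plain, "wb") as fh:
        fh.write(payload)
    with gzip.open(zipped, "wb") as fh:
        fh.write(payload)

    with open_maybe_gzipped(plain) as stream:
        from_plain = stream.read()
    with open_maybe_gzipped(zipped) as stream:
        from_gzip = stream.read()
```

The catch described above is that this only helps if the TIFF reader can be handed a stream; a reader that insists on opening a path itself bypasses any such wrapper.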

biochem-fan commented 2 years ago

For transparent reading of tiff.gz, we need a Python wrapper for libtiff's TIFFClientOpen function.

That being said, I am not sure if we need to support tiff.gz. TIFF has its own internal compression mechanism (LZW or deflate), so there is little point using tiff.gz.
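The point about double compression can be illustrated with a rough sketch. It uses zlib's deflate as a stand-in for TIFF's internal deflate compression and synthetic low-count 16-bit pixel data; both are assumptions made for the example, not a measurement on real TIFF files.

```python
import gzip
import random
import struct
import zlib

# Synthetic 256 x 256 frame of small counts stored as 16-bit integers
# (an assumption standing in for a weakly exposed detector image).
rng = random.Random(0)
pixels = [rng.randrange(16) for _ in range(256 * 256)]
raw = struct.pack("<%dH" % len(pixels), *pixels)

# First pass: deflate, standing in for compression inside the TIFF.
deflated = zlib.compress(raw)

# Second pass: gzip applied on top, as for a .tiff.gz file.
gzipped_again = gzip.compress(deflated)

print(f"raw:               {len(raw)} bytes")
print(f"deflated:          {len(deflated)} bytes")
print(f"deflated + gzip:   {len(gzipped_again)} bytes")
```

Data that has already been through deflate looks close to random to a second deflate pass, so the outer gzip layer has very little left to remove.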

dagewa commented 2 years ago

We're starting to hit this again. ebba6d03 stopped FormatTIFFBruker from picking up experimental ED frames from the PETS Glycine example, but @ronandrevon's simulated frames were still being understood by that format class. The difference seems to be only that the experimental frames had some of their tags outside the first 1024 bytes, which is all that read_basic_tiff_header reads.
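To show why tags can fall outside a fixed-size window, here is a minimal sketch of locating the first image file directory (IFD) by following the offset stored in the 8-byte TIFF header. It is not the code of read_basic_tiff_header; `read_ifd_tags` and `make_demo_tiff` are hypothetical names, and the demo bytes contain only a header and two tags, not a complete image.

```python
import io
import struct


def read_ifd_tags(stream):
    """Return (ifd_offset, {tag: (field_type, count, raw_value)}) for the first IFD."""
    stream.seek(0)
    header = stream.read(8)
    if len(header) < 8:
        raise ValueError("too short to be a TIFF")
    byte_order = {b"II": "<", b"MM": ">"}.get(header[:2])
    if byte_order is None:
        raise ValueError("not a TIFF: bad byte-order mark")
    magic, ifd_offset = struct.unpack(byte_order + "HI", header[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF: bad magic number")

    # The IFD may start anywhere in the file, so seek to it explicitly.
    stream.seek(ifd_offset)
    (n_entries,) = struct.unpack(byte_order + "H", stream.read(2))
    tags = {}
    for _ in range(n_entries):
        # Each entry is 12 bytes: tag, field type, count, value-or-offset.
        tag, field_type, count, value = struct.unpack(
            byte_order + "HHI4s", stream.read(12)
        )
        tags[tag] = (field_type, count, value)
    return ifd_offset, tags


def make_demo_tiff(ifd_offset=2048):
    """Build little-endian demo bytes whose IFD lies beyond the first 1024 bytes."""
    header = struct.pack("<2sHI", b"II", 42, ifd_offset)
    padding = b"\0" * (ifd_offset - len(header))
    # ImageWidth (256) and ImageLength (257), type SHORT (3), count 1, value 512.
    entries = struct.pack("<HHIHH", 256, 3, 1, 512, 0)
    entries += struct.pack("<HHIHH", 257, 3, 1, 512, 0)
    ifd = struct.pack("<H", 2) + entries + struct.pack("<I", 0)
    return header + padding + ifd


data = make_demo_tiff()
ifd_offset, tags = read_ifd_tags(io.BytesIO(data))
width = struct.unpack("<H", tags[256][2][:2])[0]
print(f"IFD at byte {ifd_offset}, tags found: {sorted(tags)}, width: {width}")
```

A reader that inspects only the first 1024 bytes would find no tags in this file, whereas following the header offset finds them wherever the writer placed the IFD.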

> I am not sure if we need to support tiff.gz

Ok, that seems reasonable to me. I could go back to the idea of rewriting the whole FormatTIFF hierarchy to use tifffile to clean some of this up.