Investigate OME tiff compatibility

ngreenwald commented 1 year ago

Is your feature request related to a problem? Please describe. The bio-imaging field seems to be consolidating around OME-TIFFs as (one of) the formats to use for representing image data. We currently don't support this in the ark pipeline.

Describe the solution you'd like We should investigate a) how widespread this format is, or if there are other formats which appear to be equally popular, and b) what the interface/API looks like to read and write them. I know Adam made a converter to go from multi-channel to OME already

srivarra commented 1 year ago

@ngreenwald

OME-TIFFS

OME-TIFF Overview

Some reasons to consider using OME-TIFFs:

Used across the industry (Commercial Imaging Companies, HuBMAP, and other Data Resource aggregators)
Long-term stability: it has been in use since 2005.
Supports 8D images (for example, we can collapse $x$, $y$ and channels for each FOV into a single 3D image)
Decent metadata structure
Implementing it would be reasonably straightforward as it is used for HuBMAP.
Tiling support

There are a few disadvantages

Even though it can support high-dimensional images, data access is limited to 2D tiles, which makes it a bottleneck for larger images.
No parallel pipeline
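
To make the 2D-tile limitation concrete, here is a rough sketch that counts how many storage units a read has to touch. It assumes a tiled TIFF behaves like chunks of shape (1, tile, tile), i.e. one plane per channel, and compares that with a hypothetical layout whose chunks span all channels. The `chunks_touched` helper and the 256-pixel tile size are illustrative and not taken from ark or tifffile.

```python
def chunks_touched(region, chunk_shape):
    """Count the chunks intersected by a region given as (start, stop) per axis."""
    count = 1
    for (start, stop), size in zip(region, chunk_shape):
        first = start // size        # index of the first chunk hit on this axis
        last = (stop - 1) // size    # index of the last chunk hit on this axis
        count *= last - first + 1
    return count


# One pixel across all 12 channels of a (12, 2048, 2048) image
region = [(0, 12), (1000, 1001), (1000, 1001)]

tiff_like = chunks_touched(region, (1, 256, 256))    # plane-wise 2D tiles
ngff_like = chunks_touched(region, (12, 256, 256))   # chunks spanning all channels

print(tiff_like, ngff_like)
```

Under these assumptions, reading one pixel's full channel vector touches one tile per channel in the plane-wise layout, but only a single chunk in the channel-spanning layout.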

OME-TIFF API

The API is straightforward and uses the Python tifffile library we already depend on. We can take advantage of more metadata as well.
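
As a sketch of what "more metadata" could look like in practice: an OME-TIFF carries an OME-XML document describing the image, which tifffile exposes as a string through `TiffFile.ome_metadata`. The example below parses a small hand-written OME-XML string with the standard library only. The XML content and the `summarize_ome_xml` helper are assumptions made for illustration, not something ark ships.

```python
import xml.etree.ElementTree as ET

# Namespace of the 2016-06 OME schema
OME_NS = {"ome": "http://www.openmicroscopy.org/Schemas/OME/2016-06"}

# Hand-written example; a real string would come from the file itself
EXAMPLE_OME_XML = """<OME xmlns="http://www.openmicroscopy.org/Schemas/OME/2016-06">
  <Image ID="Image:0" Name="fov0">
    <Pixels ID="Pixels:0" DimensionOrder="XYCZT" Type="uint16"
            SizeX="2048" SizeY="2048" SizeC="3" SizeZ="1" SizeT="1"
            PhysicalSizeX="0.02" PhysicalSizeXUnit="µm"
            PhysicalSizeY="0.02" PhysicalSizeYUnit="µm">
      <Channel ID="Channel:0:0" Name="chan1" SamplesPerPixel="1"/>
      <Channel ID="Channel:0:1" Name="chan2" SamplesPerPixel="1"/>
      <Channel ID="Channel:0:2" Name="chan3" SamplesPerPixel="1"/>
    </Pixels>
  </Image>
</OME>"""


def summarize_ome_xml(xml_text):
    """Pull the image shape, channel names and pixel size out of OME-XML."""
    root = ET.fromstring(xml_text)
    pixels = root.find("ome:Image/ome:Pixels", OME_NS)
    return {
        "shape": {axis: int(pixels.get(f"Size{axis}")) for axis in "CYX"},
        "channels": [c.get("Name") for c in pixels.findall("ome:Channel", OME_NS)],
        "pixel_size": (
            float(pixels.get("PhysicalSizeX")),
            float(pixels.get("PhysicalSizeY")),
        ),
        "pixel_unit": pixels.get("PhysicalSizeXUnit"),
    }


info = summarize_ome_xml(EXAMPLE_OME_XML)
print(info)
```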

Writing an $n$-dimensional OME-TIFF

import numpy as np
from tifffile import TiffWriter

data = np.random.randint(0, 1023, (12, 2048, 2048), 'uint16')
pixel_size = 0.02  # size is in micrometers

with TiffWriter('../data/example.ome.tiff', bigtiff=True) as img:
    metadata = {
        'axes': 'CYX',  # We would change the axes to the proper ones we would use
        'Channel': {'Name': [f'chan{i + 1}' for i in range(data.shape[0])]},  # One name per channel
        'PhysicalSizeX': pixel_size,
        'PhysicalSizeXUnit': 'µm',
        'PhysicalSizeY': pixel_size,
        'PhysicalSizeYUnit': 'µm',
    }
    options = {
        'photometric': 'minisblack',  # grayscale planes, one per channel
        'tile': (256, 256),  # We can create tiled images as well (example tile size)
        'compression': 'zlib',  # there are several compression algorithms
        'resolutionunit': 'CENTIMETER',
    }
    img.write(
        data,
        resolution=(1e4 / pixel_size, 1e4 / pixel_size),  # pixels per centimeter
        metadata=metadata,
        **options,
    )
    # Low-resolution 8-bit preview of the first channel
    thumbnail = (data[0, ::20, ::20] >> 2).astype('uint8')
    img.write(thumbnail, metadata={'Name': 'thumbnail'})
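
The `resolution` argument is expressed in pixels per resolution unit, so the pixel size has to be inverted. A minimal sketch of that conversion, assuming the resolution unit is centimeters (1 cm = 10,000 µm); the helper name is made up for illustration:

```python
UM_PER_CM = 1e4  # micrometers in one centimeter


def pixels_per_centimeter(pixel_size_um):
    """Convert a pixel size in micrometers to a resolution in pixels per cm."""
    if pixel_size_um <= 0:
        raise ValueError("pixel size must be positive")
    return UM_PER_CM / pixel_size_um


# Same resolution along x and y for square pixels
resolution = (pixels_per_centimeter(0.02), pixels_per_centimeter(0.02))
print(resolution)
```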

Reading the $n$-dimensional OME-TIFF is also pretty simple: we just loop over the file after opening it, and we can write some wrapper functions to easily load it in as a multidimensional Xarray object.
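
One possible shape for such a wrapper is sketched below. It assumes tifffile, NumPy and xarray are installed; `load_ome_tiff` is a hypothetical helper, not an existing ark function, and the channel names are passed in by the caller rather than read from the file. The small file written at the bottom exists only so the sketch can be run end to end.

```python
import os
import tempfile

import numpy as np
import tifffile
import xarray as xr


def load_ome_tiff(path, channel_names=None):
    """Load the first image series of an OME-TIFF as a labeled DataArray."""
    with tifffile.TiffFile(path) as tif:
        series = tif.series[0]
        data = series.asarray()
        axes = series.axes  # e.g. 'CYX'
    coords = {}
    if channel_names is not None and "C" in axes:
        coords["C"] = list(channel_names)
    return xr.DataArray(data, dims=list(axes), coords=coords)


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.ome.tiff")
    names = [f"chan{i + 1}" for i in range(5)]
    data = np.random.randint(0, 1023, (5, 64, 64), "uint16")
    # photometric is set explicitly so the channels are stored as separate planes
    tifffile.imwrite(path, data, photometric="minisblack", metadata={"axes": "CYX"})
    fov = load_ome_tiff(path, channel_names=names)

print(fov.dims, fov.shape)
```

With named dimensions, a single channel can then be selected by label, for example `fov.sel(C="chan2")`.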

We could even add segmentation, cell and pixel masks as other dimensions for the image.

Alternatives

OME-ZARR

OME-ZARR is OME's next-generation file format (OME-NGFF spec). It's very well suited for $n$-dimensional dense arrays with metadata.

OME-ZARR's Features

Uses Zarr, a modern storage format for chunked, compressed NumPy-like arrays.
Fast Multidimensional access
Allows for multiple levels of metadata (like FOV level, Channel level, $x$,$y$)
Seems to be what OME will want to transition to eventually.
Contains the main features of OME-TIFFs

Reasons to avoid OME-ZARR

Newer standard, not super popular as of now (paper has only 14 citations)
Different API compared to tifffile's implementation of OME-TIFF.

Writing a 3D OME-ZARR

import numpy as np
import zarr
from ome_zarr.io import parse_url
from ome_zarr.writer import write_image

data = np.random.randint(0, 1023, (12, 2048, 2048), 'uint16')
pixel_size = 0.02  # size is in micrometers

# write the image data
img_store = parse_url("../data/example_ngff_image.zarr", mode="w").store
root = zarr.group(store=img_store)

# chunk options are passed via storage_options in recent ome-zarr releases
write_image(image=data, group=root, axes="cyx", storage_options=dict(chunks=(12, 2048, 2048)))

# Create Metadata groups
label_name = "LABEL_NAME"
labels_grp = root.create_group("labels")
labels_grp.attrs["labels"] = [label_name] # We can have multiple labels
label_grp = labels_grp.create_group(label_name)

# Add image labels
label_grp.attrs["image-label"] = {
    "colors": [
        {"label-val": 1, "rgba": [255, 0, 0, 255]}
    ]
}
...
write_image(label, label_grp, axes="cyx")

Reading the data is quite simple, however: the first index holds the pixel-level data, and the metadata is stored in the other indices.

The API seems to be more convoluted and less polished / documented than tifffile's OME-TIFF support. OME-ZARR's documentation does not contain an exhaustive list of the proper metadata fields, but the actual file spec sheet is online.
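
For orientation, the core image metadata in the spec is a `multiscales` entry stored in the group attributes. Below is a rough sketch of that structure as I understand the v0.4 spec, built with the standard library only. The `build_multiscales` helper is hypothetical, and in practice `write_image` generates this metadata itself.

```python
import json


def build_multiscales(axis_names, scale, unit="micrometer", version="0.4"):
    """Build a minimal single-resolution 'multiscales' attribute dictionary."""
    if len(axis_names) != len(scale):
        raise ValueError("need exactly one scale value per axis")
    axes = []
    for name in axis_names:
        if name == "c":
            axes.append({"name": name, "type": "channel"})
        elif name == "t":
            axes.append({"name": name, "type": "time"})
        else:
            axes.append({"name": name, "type": "space", "unit": unit})
    dataset = {
        "path": "0",  # array holding the full-resolution level
        "coordinateTransformations": [{"type": "scale", "scale": list(scale)}],
    }
    return {"multiscales": [{"version": version, "axes": axes, "datasets": [dataset]}]}


# 'cyx' image with 0.02 micrometer pixels; the channel axis gets a scale of 1
attrs = build_multiscales("cyx", (1.0, 0.02, 0.02))
print(json.dumps(attrs, indent=2))
```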

In summary

Overall, I think that moving to OME-TIFFs would be a good transition which can simplify other aspects of the pipeline. But we should keep an eye on OME-NGFF (currently on v0.4.0) and OME-ZARR (currently on v0.2.0) and see if they start to pick up more traction down the line.

ngreenwald commented 1 year ago

Addressed by #819

angelolab / ark-analysis