angelolab / ark-analysis

Integrated pipeline for multiplexed image analysis
https://ark-analysis.readthedocs.io/en/latest/
MIT License

Investigate OME tiff compatibility #759

Closed ngreenwald closed 1 year ago

ngreenwald commented 1 year ago

Is your feature request related to a problem? Please describe. The bio-imaging field seems to be consolidating around OME-TIFFs as (one of) the formats to use for representing image data. We currently don't support this in the ark pipeline.

Describe the solution you'd like We should investigate a) how widespread this format is, or if there are other formats which appear to be equally popular, and b) what the interface/API looks like to read and write them. I know Adam made a converter to go from multi-channel to OME already

srivarra commented 1 year ago

@ngreenwald

OME-TIFFS

OME-TIFF Overview

Some features to consider using OME-TIFFs include

There are a few disadvantages

OME-TIFF API

Writing an $n$-dimensional OME-TIFF

import numpy as np
from tifffile import TiffWriter

data = np.random.randint(0, 1023, (12, 2048, 2048), 'uint16')
pixelsize = 0.02  # size is in micrometers

with TiffWriter('../data/example.ome.tiff', bigtiff=True) as img:
    metadata = {
        'axes': 'CYX',  # We would change the axes to the proper ones we would use
        # One name per channel (data has 12 channels)
        'Channel': {'Name': [f'chan{i + 1}' for i in range(data.shape[0])]},
        'PhysicalSizeX': pixelsize,
        'PhysicalSizeXUnit': 'µm',
        'PhysicalSizeY': pixelsize,
        'PhysicalSizeYUnit': 'µm',
    }
    options = dict(
        photometric='minisblack',  # grayscale planes
        tile=(256, 256),  # We can create tiled images as well (multiples of 16)
        compression='zlib',  # there are several compression algorithms
        resolutionunit='CENTIMETER',  # 1e4 / pixelsize is pixels per centimeter
    )
    img.write(
        data,
        resolution=(1e4 / pixelsize, 1e4 / pixelsize),
        metadata=metadata,
        **options,
    )
    # Add a thumbnail of the first channel as a separate series
    thumbnail = (data[0, ::20, ::20] >> 2).astype('uint8')
    img.write(thumbnail, metadata={'Name': 'thumbnail'})

Reading the $n$-dimensional OME-TIFF is also pretty simple: we just loop over the file after opening it, and we can write some wrapper functions to easily load it in as a multidimensional Xarray object.
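As a rough sketch of what such a wrapper could look like: the helper name `load_ome_tiff` is hypothetical (it is not part of ark or tifffile), and it assumes the main image is the first series in the file. It writes a small throwaway file first so the round trip can be checked.

```python
import os
import tempfile

import numpy as np
import tifffile
import xarray as xr


def load_ome_tiff(path):
    """Hypothetical helper: load the first series of an OME-TIFF as a DataArray."""
    with tifffile.TiffFile(path) as tif:
        series = tif.series[0]      # assumed: the main image is the first series
        data = series.asarray()
        dims = list(series.axes)    # e.g. "CYX" -> ["C", "Y", "X"]
        ome_xml = tif.ome_metadata  # raw OME-XML string, None if not an OME-TIFF
    return xr.DataArray(data, dims=dims, attrs={"ome_xml": ome_xml})


# Round trip on a small temporary file
path = os.path.join(tempfile.mkdtemp(), "example.ome.tiff")
data = np.random.randint(0, 1023, (3, 64, 64), "uint16")
tifffile.imwrite(
    path, data, ome=True, photometric="minisblack", metadata={"axes": "CYX"}
)

img = load_ome_tiff(path)
print(img.dims, img.shape)
```

Keeping the raw OME-XML in `attrs` is just one option; parsing out channel names and pixel sizes into coordinates would probably be more useful for the pipeline.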

We could even add segmentation, cell and pixel masks as other dimensions for the image.
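One way to read this, sketched below under assumptions, is to append the masks as extra planes along the channel axis once the image is in Xarray. The channel and mask names here are made up for illustration, and random arrays stand in for real data.

```python
import numpy as np
import xarray as xr

# Hypothetical 3-channel image of one field of view
channels = ["chan1", "chan2", "chan3"]
image = xr.DataArray(
    np.random.randint(0, 1023, (3, 64, 64), "uint16"),
    dims=("C", "Y", "X"),
    coords={"C": channels},
)

# Hypothetical integer masks with the same Y/X size as the image
mask_names = ["segmentation_mask", "pixel_cluster_mask"]
masks = xr.DataArray(
    np.random.randint(0, 50, (2, 64, 64), "uint16"),
    dims=("C", "Y", "X"),
    coords={"C": mask_names},
)

# Append the masks as extra entries along the channel axis
combined = xr.concat([image, masks], dim="C")

# Masks can then be pulled out by name like any other channel
seg = combined.sel(C="segmentation_mask")
print(combined.shape, seg.shape)
```

The combined array could then be written out with the same `TiffWriter` call as before, with the mask names added to the channel list.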

Alternatives

OME-ZARR

OME-ZARR is OME's next-generation file format (OME-NGFF spec). It's very well suited for $n$-dimensional dense arrays with metadata.

OME-ZARR's Features

Reasons to avoid OME-ZARR

Writing a 3D OME-ZARR

import numpy as np
import zarr
from ome_zarr.io import parse_url
from ome_zarr.writer import write_image

data = np.random.randint(0, 1023, (12, 2048, 2048), 'uint16')
pixel_size = 0.02  # size is in micrometers

# write the image data
store = parse_url("../data/example_ngff_image.zarr", mode="w").store
root = zarr.group(store=store)

write_image(image=data, group=root, axes="cyx", storage_options=dict(chunks=(12, 2048, 2048)))

# Create Metadata groups
label_name = "LABEL_NAME"
labels_grp = root.create_group("labels")
labels_grp.attrs["labels"] = [label_name]  # We can have multiple labels
label_grp = labels_grp.create_group(label_name)

# Add image labels
label_grp.attrs["image-label"] = {
    "colors": [
        {"label-value": 1, "rgba": [255, 0, 0, 255]}
    ]
}
...
# `label` is the label array, assumed to be defined in the step elided above
write_image(label, label_grp, axes="cyx")

Reading the data is quite simple, however: the first index is the pixel-level data, and the metadata is stored in other dimensions.

The API seems to be more convoluted and less polished / documented than tifffile's OME-TIFF support. OME-ZARR's documentation does not contain an exhaustive list of the proper metadata fields, but the actual file spec sheet is online.

In summary

Overall, I think that moving to OME-TIFFs would be a good transition which can simplify other aspects of the pipeline. But we should keep an eye on OME-NGFF (currently on v0.4.0) and OME-ZARR (currently on v0.2.0) and see if they start to pick up more traction down the line.

ngreenwald commented 1 year ago

Addressed by #819