Closed ngreenwald closed 1 year ago
@ngreenwald
OME-TIFF Overview
Some features to consider using OME-TIFFs include
There are a few disadvantages
OME-TIFF API
The API is provided by the tifffile plugin we already use. We can take advantage of more metadata as well.
Writing an $n$-dimensional OME-TIFF
    import numpy as np
    from tifffile import TiffWriter

    data = np.random.randint(0, 1023, (12, 2048, 2048), 'uint16')
    pixel_size = 0.02  # size is in micrometers

    with TiffWriter('../data/example.ome.tiff', bigtiff=True) as img:
        metadata = {
            'axes': 'CYX',  # We would change the axes to the proper ones we would use
            # List of channels, one name per channel in `data`
            'Channel': {'Name': [f'chan{i}' for i in range(1, 13)]},
            'PhysicalSizeX': pixel_size,
            'PhysicalSizeXUnit': 'µm',
            'PhysicalSizeY': pixel_size,
            'PhysicalSizeYUnit': 'µm',
        }
        options = dict(
            photometric='minisblack',  # grayscale, one sample per pixel
            tile=(128, 128),  # We can create tiled images as well (example size; must be multiples of 16)
            compression='zlib',  # there are several compression algorithms
            resolutionunit='CENTIMETER',
        )
        img.write(
            data,
            # pixels per centimeter: 1 cm = 1e4 micrometers
            resolution=(1e4 / pixel_size, 1e4 / pixel_size),
            metadata=metadata,
            **options,
        )
        # Add a downsampled 8-bit thumbnail as a separate series
        thumbnail = (data[0, ::20, ::20] >> 2).astype('uint8')
        img.write(thumbnail, metadata={'Name': 'thumbnail'})
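The resolution value in the writing example is just a unit conversion, so it may help to spell it out. This is a small sketch of the arithmetic only; the helper name pixels_per_centimeter is made up for illustration and is not part of tifffile. It assumes the pixel size is given in micrometers and that the TIFF resolution unit is centimeters.

```python
MICROMETERS_PER_CENTIMETER = 1e4


def pixels_per_centimeter(pixel_size_um: float) -> float:
    """Convert a physical pixel size in micrometers to pixels per centimeter.

    Hypothetical helper for illustration, not a tifffile function.
    """
    if pixel_size_um <= 0:
        raise ValueError("pixel size must be positive")
    return MICROMETERS_PER_CENTIMETER / pixel_size_um


pixel_size = 0.02  # micrometers, same value as in the example above
# TIFF stores resolution as (x, y) in pixels per resolution unit
resolution = (pixels_per_centimeter(pixel_size), pixels_per_centimeter(pixel_size))
print(resolution)
```

A value computed this way only makes sense together with a centimeter resolution unit; with any other unit the factor 1e4 would have to change.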
Reading the $n$-dimensional OME-TIFF is also pretty simple: we just loop over the file after opening it, and we can write some wrapper functions to easily load it as a multidimensional Xarray object.
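As a rough idea of what such a wrapper could look like, here is a sketch that round-trips a small OME-TIFF through tifffile and loads it into an xarray.DataArray. The function names load_ome_tiff and channel_names_from_ome_xml are hypothetical, the file is a throwaway temp file, and it only handles the first series; channel names fall back to generic labels if they can't be found in the OME-XML.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

import numpy as np
import tifffile
import xarray as xr


def channel_names_from_ome_xml(ome_xml, n_channels):
    """Read channel names from the first Image element of an OME-XML string.

    Falls back to generic names when the names are missing or incomplete.
    """
    root = ET.fromstring(ome_xml)
    names = []
    for element in root.iter():
        if element.tag.endswith("}Image"):
            names = [
                child.get("Name")
                for child in element.iter()
                if child.tag.endswith("}Channel")
            ]
            break
    if len(names) != n_channels or any(name is None for name in names):
        names = [f"channel_{i}" for i in range(n_channels)]
    return names


def load_ome_tiff(path):
    """Load the first series of an OME-TIFF as a labelled DataArray (hypothetical wrapper)."""
    with tifffile.TiffFile(path) as tif:
        series = tif.series[0]
        data = series.asarray()
        dims = list(series.axes)  # e.g. "CYX" -> ["C", "Y", "X"]
        ome_xml = tif.ome_metadata
    coords = {}
    if "C" in dims and ome_xml:
        n_channels = data.shape[dims.index("C")]
        coords["C"] = channel_names_from_ome_xml(ome_xml, n_channels)
    return xr.DataArray(data, dims=dims, coords=coords)


# Round trip on a small temporary file
with tempfile.TemporaryDirectory() as tmp_dir:
    path = str(Path(tmp_dir) / "example.ome.tif")
    original = np.random.randint(0, 1023, (2, 32, 32), "uint16")
    tifffile.imwrite(
        path,
        original,
        photometric="minisblack",
        metadata={"axes": "CYX", "Channel": {"Name": ["chan1", "chan2"]}},
    )
    image = load_ome_tiff(path)

print(image.dims, image.shape)
```

With named coordinates in place, a single channel could then be selected by name with image.sel(C=...) rather than by index.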
We could even add segmentation, cell and pixel masks as other dimensions for the image.
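One way this could work is to append each mask as an extra entry along the channel axis. The sketch below uses plain NumPy with random stand-in data; append_mask and the channel names are hypothetical. It assumes the mask has the same height and width as the image and that its labels fit in the image dtype.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.integers(0, 1023, size=(3, 64, 64), dtype=np.uint16)  # stand-in CYX image
segmentation_mask = rng.integers(0, 50, size=(64, 64), dtype=np.int32)  # stand-in label mask
channel_names = ["chan1", "chan2", "chan3"]


def append_mask(image, mask, names, mask_name):
    """Append a 2D mask as an extra channel of a CYX image (hypothetical helper)."""
    if mask.shape != image.shape[1:]:
        raise ValueError("mask must match the image's Y and X dimensions")
    if mask.min() < 0 or mask.max() > np.iinfo(image.dtype).max:
        raise ValueError("mask labels do not fit in the image dtype")
    stacked = np.concatenate([image, mask[np.newaxis].astype(image.dtype)], axis=0)
    return stacked, names + [mask_name]


stacked, stacked_names = append_mask(
    image, segmentation_mask, channel_names, "segmentation_mask"
)
print(stacked.shape, stacked_names)
```

The dtype check matters because label masks with many cells could exceed the range of a uint16 image, in which case the mask would need a wider dtype or a separate file.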
OME-ZARR
OME-ZARR is OME's next-generation file format (OME-NGFF spec). It's very well suited for $n$-dimensional dense arrays with metadata.
OME-ZARR's Features
Reasons to avoid OME-ZARR
tifffile's implementation of OME-TIFF.
Writing a 3D OME-ZARR
    import numpy as np
    import zarr
    from ome_zarr.io import parse_url
    from ome_zarr.writer import write_image

    data = np.random.randint(0, 1023, (12, 2048, 2048), 'uint16')
    pixel_size = 0.02  # size is in micrometers

    # Write the image data
    store = parse_url("../data/example_ngff_image.zarr", mode="w").store
    root = zarr.group(store=store)
    write_image(image=data, group=root, axes="cyx", storage_options=dict(chunks=(12, 2048, 2048)))

    # Create metadata groups
    label_name = "LABEL_NAME"
    labels_grp = root.create_group("labels")
    labels_grp.attrs["labels"] = [label_name]  # We can have multiple labels
    label_grp = labels_grp.create_group(label_name)

    # Add image labels
    label_grp.attrs["image-label"] = {
        "colors": [
            {"label-val": 1, "rgba": [255, 0, 0, 255]}
        ]
    }
    ...
    # `label` is the label array, created in the elided step above
    write_image(label, label_grp, axes="cyx")
Reading the data is quite simple, however: the first index is the pixel-level data, and the metadata is stored in other dimensions.
The API seems to be more convoluted and less polished / documented than tifffile's OME-TIFF support.
OME-ZARR's documentation does not contain an exhaustive list of the proper metadata fields, but the actual file spec sheet is online.
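To give a feel for what the spec asks for, here is a sketch of the multiscales metadata block for a single-resolution "cyx" image, as I understand v0.4 of the OME-NGFF spec. It only builds and sanity-checks a plain dict with the standard library; check_multiscales is a hypothetical helper that covers a couple of the spec's rules, not a full validator, and the dataset path "0" is an assumption.

```python
import json

pixel_size = 0.02  # micrometers, same value as in the writing examples

# One multiscales entry with a single resolution level stored at path "0"
multiscales = [
    {
        "version": "0.4",
        "axes": [
            {"name": "c", "type": "channel"},
            {"name": "y", "type": "space", "unit": "micrometer"},
            {"name": "x", "type": "space", "unit": "micrometer"},
        ],
        "datasets": [
            {
                "path": "0",
                "coordinateTransformations": [
                    # One scale value per axis, in axis order (c, y, x)
                    {"type": "scale", "scale": [1.0, pixel_size, pixel_size]}
                ],
            }
        ],
    }
]


def check_multiscales(entry):
    """Check a few v0.4 rules on one multiscales entry (hypothetical, partial)."""
    n_axes = len(entry["axes"])
    if not 2 <= n_axes <= 5:
        raise ValueError("axes must have between 2 and 5 entries")
    n_space = sum(1 for axis in entry["axes"] if axis.get("type") == "space")
    if n_space not in (2, 3):
        raise ValueError("axes must contain 2 or 3 space axes")
    for dataset in entry["datasets"]:
        scales = [
            t for t in dataset["coordinateTransformations"] if t["type"] == "scale"
        ]
        if len(scales) != 1 or len(scales[0]["scale"]) != n_axes:
            raise ValueError("each dataset needs one scale with a value per axis")
    return True


is_valid = check_multiscales(multiscales[0])
print(json.dumps({"multiscales": multiscales}, indent=2))
```

In a real store this dict would live in the group's attributes, which is also where writers such as write_image put it.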
Overall, I think that moving to OME-TIFFs would be a good transition which can simplify other aspects of the pipeline. But we should keep an eye on OME-NGFF (currently on v0.4.0) and OME-ZARR (currently on v0.2.0) and see if they start to pick up more traction down the line.
Addressed by #819
Is your feature request related to a problem? Please describe.
The bio-imaging field seems to be consolidating around OME-TIFFs as (one of) the formats to use for representing image data. We currently don't support this in the ark pipeline.
Describe the solution you'd like
We should investigate a) how widespread this format is, or whether there are other formats which appear to be equally popular, and b) what the interface/API looks like to read and write them. I know Adam made a converter to go from multi-channel to OME already.