bids-standard / bids-specification

Brain Imaging Data Structure (BIDS) Specification
https://bids-specification.readthedocs.io/
Creative Commons Attribution 4.0 International

Allow OME-Zarr for non-microscopy data #1704

Open balbasty opened 5 months ago

balbasty commented 5 months ago

BIDS accepts OME-Zarr data in its microscopy extension, but not in its main MRI-related specification.

In most cases, MRI volumes are small enough that chunked formats do not make sense, but MRIs acquired in ex vivo human brains can become quite large. For example, these 120 µm isotropic MRIs are about 5 GB each. This is particularly problematic for web-based viewing, as most (all?) viewers load the entire volume into GPU memory and have a hard memory limit. The MRI linked above is 1600x1400x640 and I am not sure that niiview would be able to display it. Even if it could, having to download the entire file before showing anything makes this impractical.

Could we allow OME-Zarr files in the main spec?

The main problem I see is that, in its current form, the OME-NGFF specification does not allow storing most of the metadata that lives in the nifti header -- most importantly the qform and sform. As an alternative, we have drafted a very lightweight supplement to OME-Zarr, namely NIfTI-Zarr, which only requires dumping the nifti header under .zattrs["nifti"]["base64"] (using base64 encoding). This makes it very easy to decode the nifti metadata for any library that has access to a base64 decoder and a nifti parser. We have reference implementations in Python and Julia.
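A minimal sketch of the base64 round trip described above, using only the Python standard library. The placeholder header (all zeros except `sizeof_hdr` and the magic string) and the helper names are assumptions made for illustration; this is not the NIfTI-Zarr reference implementation, and a real header would come from an actual NIfTI file.

```python
import base64
import json
import struct

NIFTI1_HEADER_SIZE = 348  # fixed size of a NIfTI-1 header, in bytes


def make_placeholder_header() -> bytes:
    """Build a mostly empty NIfTI-1 header (stand-in for a real one)."""
    header = bytearray(NIFTI1_HEADER_SIZE)
    struct.pack_into("<i", header, 0, NIFTI1_HEADER_SIZE)  # sizeof_hdr
    struct.pack_into("4s", header, 344, b"n+1\x00")        # magic string
    return bytes(header)


def header_to_attrs(header: bytes) -> dict:
    """Wrap the raw header bytes under attrs["nifti"]["base64"]."""
    return {"nifti": {"base64": base64.b64encode(header).decode("ascii")}}


def attrs_to_header(attrs: dict) -> bytes:
    """Recover the raw header bytes from the attributes."""
    return base64.b64decode(attrs["nifti"]["base64"])


header = make_placeholder_header()
zattrs_text = json.dumps(header_to_attrs(header))  # JSON text as it could sit in .zattrs
decoded = attrs_to_header(json.loads(zattrs_text))

# Any NIfTI parser can now take over; here we only read two fixed fields.
sizeof_hdr = struct.unpack_from("<i", decoded, 0)[0]
magic = decoded[344:348]
```

The decoded bytes are identical to the input, which is what makes a byte-exact NIfTI to Zarr to NIfTI round trip possible.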

I guess this might be somewhat related to #197

@satra @martindurant @yarikoptic

effigies commented 5 months ago

I think this seems like a sensible intermediate between a novel format and the limitations of NIfTI. Is there still a NIfTI working group to get the blessings of?

martindurant commented 5 months ago

I was tagged so that I could mention kerchunk, which could provide a way to read directly from the .nii.gz files in parallel and chunk-wise (chunking limited to the largest dimension) by pretending to be a zarr dataset without rewriting the data at all. This is already possible for uncompressed data, and gzip would take some work ( https://github.com/pauldmccarthy/indexed_gzip/issues/112 ). Having done that, you could create a global virtual zarr dataset over all of the files in the archive.
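To make the idea concrete, here is a hedged sketch of what a kerchunk-style reference set for one uncompressed NIfTI file might look like, built by hand rather than with kerchunk itself. It assumes a single-file NIfTI-1 without header extensions (data starting at byte 352), little-endian float32 voxels, and chunking along the slowest-varying axis so that each chunk is one contiguous byte range. The function name and the URL are hypothetical.

```python
import json


def nifti_slab_references(url, shape, dtype="<f4", vox_offset=352):
    """Map each z-slab of an uncompressed NIfTI to a byte range in the file.

    NIfTI stores voxels with x varying fastest, so a Zarr v2 array declared
    with order "F" and chunks (nx, ny, 1) lines up with contiguous slabs.
    """
    nx, ny, nz = shape
    itemsize = int(dtype[2:])          # e.g. "<f4" -> 4 bytes per voxel
    slab_bytes = nx * ny * itemsize
    zarray = {
        "zarr_format": 2,
        "shape": [nx, ny, nz],
        "chunks": [nx, ny, 1],
        "dtype": dtype,
        "order": "F",
        "compressor": None,            # raw bytes, nothing to decompress
        "filters": None,
        "fill_value": 0,
    }
    refs = {
        ".zgroup": json.dumps({"zarr_format": 2}),
        "0/.zarray": json.dumps(zarray),
    }
    for k in range(nz):
        # [url, offset, length]: read this chunk straight out of the .nii file
        refs[f"0/0.0.{k}"] = [url, vox_offset + k * slab_bytes, slab_bytes]
    return {"version": 1, "refs": refs}


# Hypothetical URL; the shape is the volume mentioned at the top of the thread.
example = nifti_slab_references("https://example.org/brain.nii", (1600, 1400, 640))
```

A reader that understands such references can then fetch individual slabs with HTTP range requests, without rewriting or fully downloading the original file.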

If you also want subsampled pyramid (OME) data, you still need to create and store those somewhere, but of course they would be much smaller. The format could be zarr or something else, and kerchunk could present the whole lot as a single zarr dataset.

Note: all this only works in python, but https://observablehq.com/@manzt/ome-tiff-as-filesystemreference (appears to have gone stale) presented a prototype in-browser JS viz of exactly the same thing.

martindurant commented 5 months ago

> dumping the nifti header under .zattrs["nifti"]["base64"]

Mild comment: this seems an odd choice to me (but I don't know the domain). The nice thing about JSON metadata is that you can trivially read it, so why not (also?) include the fields, e.g., as interpreted by nibabel dict(nibabel.nifti1.Nifti1Header)?
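As an illustration of the JSON alternative, the sketch below unpacks a handful of NIfTI-1 header fields at their fixed byte offsets and serializes them as plain JSON. It uses only the standard library rather than nibabel; the example header, the choice of fields, and the function names are assumptions for illustration, not a proposed schema.

```python
import json
import struct


def make_example_header() -> bytes:
    """Synthetic NIfTI-1 header: 256^3 volume, 2 mm isotropic sform."""
    header = bytearray(348)
    struct.pack_into("<i", header, 0, 348)                               # sizeof_hdr
    struct.pack_into("<8h", header, 40, 3, 256, 256, 256, 1, 1, 1, 1)    # dim
    struct.pack_into("<8f", header, 76, 1.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0)  # pixdim
    struct.pack_into("<h", header, 254, 1)                               # sform_code
    struct.pack_into("<4f", header, 280, 2.0, 0.0, 0.0, 0.0)             # srow_x
    struct.pack_into("<4f", header, 296, 0.0, 2.0, 0.0, 0.0)             # srow_y
    struct.pack_into("<4f", header, 312, 0.0, 0.0, 2.0, 0.0)             # srow_z
    struct.pack_into("4s", header, 344, b"n+1\x00")                      # magic
    return bytes(header)


def header_to_fields(header: bytes) -> dict:
    """Expand a few binary header fields into JSON-friendly values."""
    return {
        "dim": list(struct.unpack_from("<8h", header, 40)),
        "pixdim": list(struct.unpack_from("<8f", header, 76)),
        "sform_code": struct.unpack_from("<h", header, 254)[0],
        "srow_x": list(struct.unpack_from("<4f", header, 280)),
        "srow_y": list(struct.unpack_from("<4f", header, 296)),
        "srow_z": list(struct.unpack_from("<4f", header, 312)),
    }


fields = header_to_fields(make_example_header())
fields_json = json.dumps(fields, indent=2)  # human-readable, no NIfTI parser needed
```

The trade-off discussed in this thread is visible here: the JSON is trivially readable, but it only carries the fields someone chose to expand, whereas the binary block carries everything.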

yarikoptic commented 5 months ago

I agree with @martindurant that the reason for preserving the nifti header in its original binary form is unclear. I can only guess that the rationale was to facilitate a 1-to-1 binary round trip (nifti to zarr to nifti). But I do not think that is much needed or desired. A JSON representation would have been a much better choice. Continuing on

> OME-NGFF specification does not allow storing most of the metadata that live in the nifti header -- most importantly the qform and sform

I wonder if NIfTI-Zarr considered adopting a superset of the NIfTI + .json sidecar fields already defined by BIDS, to be included within OME-Zarr? From the other side -- shouldn't we in BIDS converge on harmonizing the metadata in the sidecar JSON file so that it also covers metadata people rely on getting directly from NIfTI (i.e. sform/qform, AcquisitionMatrixExtent, ...)?

Also attn @matthew-brett as he was into "new imaging format" considerations.

Overall it sounds like a separate and large issue to discuss, so we might want another issue dedicated to it. But it also feels like a prerequisite for a complete answer to this one. As @balbasty noted, OME-Zarr lacks the needed metadata; we apparently lack it even in sidecar files, and it is unlikely that we would accept some ad-hoc (not "agreed upon or already widely used") solution within OME-Zarr.

martindurant commented 5 months ago

> OME-Zarr lacks needed metadata

Do you mean it lacks required fields, or that the metadata structure doesn't allow for the kind of information you want to preserve? I find in other fields (particularly earth observation and climatology), a huge amount of complex metadata is stored in zarrs.

balbasty commented 5 months ago

I think @yarikoptic means that the OME-NGFF specification (which formalizes OME metadata) does not currently support affine transforms (nor other NIfTI metadata). I know they are working on a spatial transform supplement, but it will most likely take time to find a consensus, and even more time for something like NIfTI and OME to converge.
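A small sketch of why this matters, assuming (as in OME-NGFF 0.4) that only per-axis scale and translation transforms are available. The helper name is hypothetical; the oblique matrix is the voxel-to-RAS transform printed by mri_info later in this thread.

```python
def is_scale_and_translation(affine, tol=1e-6):
    """True if the 3x3 block of a 4x4 affine is diagonal (within tol),
    i.e. the transform has no rotation or shear component."""
    for i in range(3):
        for j in range(3):
            if i != j and abs(affine[i][j]) > tol:
                return False
    return True


# Axis-aligned 2 mm isotropic affine: fits a scale + translation pair.
axis_aligned = [
    [2.0, 0.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

# Slightly rotated acquisition: the off-diagonal terms cannot be expressed
# with scale and translation alone.
oblique = [
    [0.7984, 0.0497, 0.0045, -92.2098],
    [-0.0497, 0.7984, 0.0057, -96.8662],
    [-0.0042, -0.0060, 0.8000, -122.4309],
    [0.0000, 0.0000, 0.0000, 1.0000],
]

fits_axis_aligned = is_scale_and_translation(axis_aligned)
fits_oblique = is_scale_and_translation(oblique)
```

Under that assumption, any scan with an oblique orientation loses part of its sform/qform unless the full affine is stored somewhere else.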

I actually feel quite strongly about keeping the binary form of the header:

It's a personal view, but I also don't love relying on sidecar json files. A big reason why NIfTI worked, in a social sense, is that it is a single file format. OME-Zarr can be seen as a "single directory" format (when filesystem-backed) so also works well in that respect.

satra commented 5 months ago

@martindurant and @yarikoptic - the expanded form of the binary header in JSON form is also included in the zarr metadata in @balbasty's example.

martindurant commented 5 months ago

My opinions on metadata were merely suggestions, you people know better than me, especially in other languages (although zarr is >90% python).

> A big reason why NIfTI worked, in a social sense, is that it is a single file format. OME-Zarr can be seen as a "single directory" format (when filesystem-backed) so also works well in that respect.

This does get less tractable for bigger data, and when directly accessing the data remotely. Cloud-native workflows require reading as few bytes of the data (from dandi, ipfs, s3, whatever) as necessary, plus some manner of parallelism. nifti clearly comes from an age of "download everything you need before starting", which in many cases needs an online parameter search service to pick the right files. zarr can index over all parameter space (with kerchunk, or if converting the whole dataset) and has concurrency/chunking/parallelism built in.
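A rough sketch of the "read as few bytes as necessary" point: given a chunk grid, only the chunks that intersect the requested region need to be fetched. The chunk shape of 128^3 is an assumption chosen for illustration, and the function name is hypothetical; the volume shape is the one quoted at the top of the thread.

```python
from itertools import product
from math import prod


def chunks_for_region(region, chunk_shape):
    """List the chunk indices that intersect a region.

    region is a sequence of (start, stop) voxel ranges, one per axis,
    with stop exclusive.
    """
    ranges = [
        range(start // size, (stop - 1) // size + 1)
        for (start, stop), size in zip(region, chunk_shape)
    ]
    return list(product(*ranges))


shape = (1600, 1400, 640)
chunk_shape = (128, 128, 128)                  # assumed chunking
one_slice = [(0, 1600), (0, 1400), (320, 321)]  # a single full z-slice

needed = chunks_for_region(one_slice, chunk_shape)
total = prod(-(-s // c) for s, c in zip(shape, chunk_shape))  # ceil division per axis
```

With these assumed numbers, displaying one slice touches 143 of 715 chunks, and each of those can be requested concurrently; a lower-resolution pyramid level would reduce this further.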

I know people who have heard of zarr know these things, but still worth pointing out!

effigies commented 4 months ago

Another approach to the B64-in-JSON encoding for the header would be to create a nifti array that is just the literal NIfTI header:

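import nibabel as nb
import numpy as np
import zarr

# Toy image: 256^3 float32 zeros with a 2 mm isotropic affine
img = nb.Nifti1Image(np.zeros((256, 256, 256), dtype='f4'), np.diag((2, 2, 2, 1)))

root = zarr.group(store='/tmp/img.nii.zarr')
# Image data goes in array '0'
root.array(name='0', data=img.dataobj)
# The raw binary NIfTI header goes in a sibling array named 'nifti'
root.array(name='nifti', data=img.header.binaryblock)

# Round trip: rebuild the header from the stored bytes, then the image
rt_header = nb.Nifti1Header(binaryblock=np.asanyarray(root['nifti']).tobytes())
round_trip = nb.Nifti1Image(
    root['0'],
    affine=rt_header.get_best_affine(),
    header=rt_header,
)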

Playing with a real file, I was able to read the header with mri_info:

❯ mri_info /tmp/nii.zarr/nifti/0
Volume information for /tmp/nii.zarr/nifti/0
          type: nii
    dimensions: 208 x 300 x 320
   voxel sizes: 0.800000, 0.800000, 0.800000
          type: FLOAT (3)
           fov: 166.400
           dof: 1
        xstart: -83.2, xend: 83.2
        ystart: -120.0, yend: 120.0
        zstart: -128.0, zend: 128.0
            TR: 2400.00 msec, TE: 0.00 msec, TI: 0.00 msec, flip angle: 0.00 degrees
       nframes: 1
       PhEncDir: UNKNOWN
       FieldStrength: 0.000000
ras xform present
    xform info: x_r =   0.9981, y_r =   0.0621, z_r =   0.0057, c_r =    -0.9937
              : x_a =  -0.0622, y_a =   0.9980, z_a =   0.0072, c_a =    18.6450
              : x_s =  -0.0052, y_s =  -0.0075, z_s =   1.0000, c_s =     4.2302
Orientation   : RAS
Primary Slice Direction: axial

voxel to ras transform:
                0.7984   0.0497   0.0045   -92.2098
               -0.0497   0.7984   0.0057   -96.8662
               -0.0042  -0.0060   0.8000  -122.4309
                0.0000   0.0000   0.0000     1.0000

voxel-to-ras determinant 0.512

ras to voxel transform:
                1.2476  -0.0777  -0.0065   106.7157
                0.0776   1.2476  -0.0094   126.8562
                0.0071   0.0090   1.2499   154.5524
                0.0000   0.0000   0.0000     1.0000

Any tool that can currently parse a NIfTI header should be able to work with this structure.

martindurant commented 4 months ago

@effigies , that's quite clever :)

yarikoptic commented 4 months ago

Indeed quite sneaky! The example needs a little fixing though:

❯ python tryniftihdr.py
Traceback (most recent call last):
  File "/tmp/tryniftihdr.py", line 11, in <module>
    rt_header = nb.Nifti1Header(binaryblock=np.asanyarray(nifti).tobytes())
                                                          ^^^^^
NameError: name 'nifti' is not defined

effigies commented 4 months ago

Fixed. Though the goal was to be lazy, not sneaky.

yarikoptic commented 4 months ago

Being sneaky while being lazy is a true super-power! ;)