bigdataviewer / bigdataviewer-core

ImgLib2-based viewer for registered SPIM stacks and more
BSD 2-Clause "Simplified" License

Structured meta data in N5 or HDF5 #82

Open axtimwalde opened 4 years ago

axtimwalde commented 4 years ago

The lack of structured meta-data in HDF5 made it necessary to store more complex meta-data in the external BDV XML file. It also limited the richness of the meta-data for multi-scale image pyramids, as is visible in the flat specifications that we currently use. Modern multi-file backends such as Zarr and N5 support structured meta-data, typically through JSON. While HDF5 does not natively support structured meta-data, we recently added support for it through the N5-HDF5 API. The method is simple: primitive flat meta-data is stored in the corresponding native HDF5 type, and structured data is stored as JSON. The API hides this background trickery and is consistent across all backends.

@d-v-b spent some time proposing an improved meta-data scheme for multi-scale image pyramids that resolves four issues with the existing format:

  1. It does not rely on a strict naming convention or on discovery mechanisms based on directory/group listing.
  2. It can be read and stored in all backends supported by the N5 API (including HDF5, Zarr and cloud).
  3. It allows individual levels of the scale pyramid to be opened with the correct scaling and offset without special treatment.
  4. The specification of origin and pixel-spacing is unambiguous and enables downsampling schemes other than block-averaging.
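To illustrate points 3 and 4, here is a minimal sketch of why each level needs its own explicit offset. It is not taken from the proposal; the function name `level_transform` and the `scheme` labels are hypothetical. It relies only on the fact that the mean of `f` consecutive samples sits `(f - 1) / 2` source pixels past the first sample, while plain subsampling leaves the first sample where it was.

```python
def level_transform(base_scale, base_offset, factor, scheme="average"):
    """Return (scale, offset) per axis for a level downsampled by `factor`.

    Hypothetical helper for illustration only; not part of any N5 or BDV API.
    Offsets refer to the world position of the center of pixel 0.
    """
    # Pixel spacing grows by the downsampling factor along each axis.
    scale = [s * f for s, f in zip(base_scale, factor)]

    if scheme == "average":
        # The mean of f consecutive samples sits at the center of that block,
        # i.e. (f - 1) / 2 source pixels past the first sample.
        shift = [(f - 1) / 2 for f in factor]
    elif scheme == "subsample":
        # Keeping every f-th sample leaves the first sample where it was.
        shift = [0.0] * len(factor)
    else:
        raise ValueError(f"unknown scheme: {scheme}")

    # Convert the shift from source pixels to world units and add it.
    offset = [o + s * d for o, s, d in zip(base_offset, base_scale, shift)]
    return scale, offset
```

With a base spacing of `[4.0, 4.0, 40.0]`, a zero base offset and a factor of `[2, 2, 1]`, block-averaging yields an offset of `[2.0, 2.0, 0.0]`, whereas subsampling keeps the offset at zero. A reader that has to guess the scheme from a naming convention cannot tell these two cases apart; an offset stored with each level removes the ambiguity.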

Time series, setups, and channels are not considered in this proposal and we welcome input.

Using the same method, however, it should be possible to store all meta-data that is currently stored in the BDV XML file as an attribute of the N5/HDF5 container. I find this very attractive.
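The storage rule described above (flat primitives in a native type, everything structured as JSON) could be sketched roughly as follows. This is a plain-Python illustration, not the N5-HDF5 implementation: the helper names `encode_attribute` and `decode_attribute` are hypothetical, and where the real API draws the line between "primitive" and "structured" is an assumption here.

```python
import json

# Assumption: only scalar values count as "flat primitives" in this sketch.
PRIMITIVES = (bool, int, float, str)


def encode_attribute(value):
    """Return ("native", value) for flat primitives, else ("json", text)."""
    if isinstance(value, PRIMITIVES):
        # A backend would store this in its corresponding native type.
        return "native", value
    # Anything nested (dicts, lists) is serialized to a JSON string.
    return "json", json.dumps(value)


def decode_attribute(kind, payload):
    """Invert encode_attribute so callers never see the storage detail."""
    return payload if kind == "native" else json.loads(payload)
```

A caller would only ever see the decoded value, for example `decode_attribute(*encode_attribute({"scale": [4.0, 4.0]}))` returns the original dictionary, which is the sense in which the API can hide the storage detail and behave the same across backends.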

tpietzsch commented 4 years ago

@axtimwalde @d-v-b Sounds promising. Do you have a link to the proposal?

In general, I'm all for adding meta-data to N5 and HDF5, maybe to the point where it is possible to recreate the XML completely, such that the XML is not needed for those backends. However, I think that for the foreseeable future the XML will remain the authoritative source, because it provides extensibility for non-N5/HDF5 backends, such as TIFF files, CATMAID, etc.

d-v-b commented 4 years ago

@tpietzsch the proposal is here (the "COSEM style"): https://github.com/janelia-cosem/schemas/blob/master/multiscale.md. It's not final, but the basic principles are: a) put multiscale-specific stuff in group-level attributes; b) keep dataset-specific stuff (resolution, offset, etc.) in dataset attributes.
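A rough sketch of that split between group-level and dataset-level attributes, using plain Python dictionaries in place of a real container. All key names (`multiscale_levels`, `scale`, `offset`), dataset names and values are hypothetical and chosen only for illustration; the linked document defines the actual keys.

```python
# Group-level attributes: only the multiscale-specific part, here an explicit
# list of member datasets, so no naming convention or listing is needed.
group_attrs = {"multiscale_levels": ["raw", "coarse", "coarsest"]}

# Dataset-level attributes: each level carries its own resolution and offset
# (world units, hypothetical values).
dataset_attrs = {
    "raw": {"scale": [4.0, 4.0, 4.0], "offset": [0.0, 0.0, 0.0]},
    "coarse": {"scale": [8.0, 8.0, 8.0], "offset": [2.0, 2.0, 2.0]},
    "coarsest": {"scale": [16.0, 16.0, 16.0], "offset": [6.0, 6.0, 6.0]},
}


def list_levels(attrs):
    """Discover pyramid levels from the group attributes alone."""
    return list(attrs["multiscale_levels"])


def index_to_world(index, attrs):
    """Map a pixel index to world coordinates using one dataset's attributes."""
    return [o + s * i for i, s, o in zip(index, attrs["scale"], attrs["offset"])]
```

Because each dataset is self-describing, a single level can be opened on its own: `index_to_world([1, 1, 1], dataset_attrs["coarse"])` needs nothing from the group or from the other levels.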