bids-standard / bids-2-devel

Discussions and suggestions of backwards incompatible changes to BIDS
https://bids.neuroimaging.io/
Creative Commons Attribution 4.0 International

Drop sidecar JSON files and store all metadata in the header #25

Closed tsalo closed 7 months ago

tsalo commented 4 years ago

This could be done with a NIfTI header extension. JSON could still be used internally. This would also remove the need for the hierarchical rule (which is quite confusing). The downside is that it's harder to view and edit metadata (without specialized tools).

Original authors: Unknown

jbteves commented 4 years ago

I would be in favor of this. The solution to viewing would be to make a simple tool which extracts only the metadata (such as AFNI's 3dinfo), or to ask major package developers to add metadata viewing options. @neurolabusc do you think it would be hard to add a metadata viewer to MRIcroGL if this proposal were accepted? I'm also tagging @afni-rickr and @leej3 as I believe we had chatted about something like this briefly at Code Convergence.

robertoostenveld commented 4 years ago

This solution only works for NIFTI files, not for other file formats that are used for storing raw data (such as FIF or EDF).

neurolabusc commented 4 years ago

@jbteves, such a change would be easy to support with MRIcroGL and dcm2niix. There are clear benefits, identical to those of moving from the two-file Analyze/NIfTI .hdr/.img to the one-file .nii (and away from other two-file formats like .nhdr/.raw, .head/.brik, etc.):

  1. Since the BIDS information is embedded in the same file as the other information, a user does not need to make sure to keep them together and to make sure file names correspond.
  2. For macOS users, applications have restricted file system access. Specifically, while a user may have read permission for a file, it does not mean that an application launched by that user has access to that file. Therefore, if a user drops the file img.nii onto an application to load it, it does not necessarily mean the application has permission to read the associated img.json.

For full disclosure, I did advocate for single files (and also emphasized the dangers of using ASCII for floating point values and the Inheritance Principle) when the BIDS sidecar was developed. However, the consensus was that the benefits of having a separate, human readable file outweighed these costs. Using a text editor to edit text fields in a binary file would corrupt the binary data, converting binary data to base64 (e.g. GIfTI) would impact file size as well as read/write speeds, and other methods would require custom tools.
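To make two of these costs concrete, here is a minimal Python sketch (standard library only). The helper names `roundtrips_as_float32` and `base64_growth` are made up for this illustration and are not part of BIDS, GIfTI, or any tool. The first shows how a float32 value written as text with too few digits does not come back unchanged; the second shows how much larger raw binary becomes when base64-encoded.

```python
import base64
import struct


def roundtrips_as_float32(value: float, fmt: str) -> bool:
    """True if writing a float32 value as text with `fmt` and parsing it back is lossless."""
    # Round the input to float32 first, as it would be stored in an image header.
    as_f32 = struct.unpack("<f", struct.pack("<f", value))[0]
    text = format(as_f32, fmt)
    back = struct.unpack("<f", struct.pack("<f", float(text)))[0]
    return back == as_f32


def base64_growth(n_floats: int) -> tuple:
    """Return (raw byte count, base64 byte count) for n_floats float32 values."""
    raw = struct.pack(f"<{n_floats}f", *(float(i) for i in range(n_floats)))
    return len(raw), len(base64.b64encode(raw))


if __name__ == "__main__":
    x = 1 + 2 ** -23  # the float32 value immediately above 1.0
    print(roundtrips_as_float32(x, ".6g"))  # 6 significant digits lose the value
    print(roundtrips_as_float32(x, ".9g"))  # 9 significant digits preserve it
    print(base64_growth(3000))              # base64 emits 4 characters per 3 bytes
```

Nine significant digits are enough to round-trip any float32, so the ASCII problem is avoidable, but only if every writer is careful about formatting.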

Like any format, BIDS was a compromise of different strengths and limitations. From a Darwinian perspective, the format that we have has been wildly popular so it seems to fill a niche. We now have many tools adapted to the format as defined, so we need to think carefully regarding the unintended consequences of new changes to a robust and popular format. When we change a format, we also need to change all the tools and all the documentation. As an analogy, I would point to the way Python 3 broke compatibility to improve the language as a cautionary tale. This caused a lot of grief to users or those who searched stack exchange for code snippets to solve their problems.

afni-rwcox commented 4 years ago

I am in favor of a single file approach, where practicable, for ease of file transport and I/O. In particular, I now feel my decision (1994) for the .HEAD/.BRIK file pair was not a good one. I was deeply involved in the NIfTI-1 planning, and in that case the .hdr/.img pair was there just to be compatible with ANALYZE-7.5 -- a desire that drove a lot of the rest of the NIfTI-1 decision making process, to ease the transition for some of the NIfTI-1 stakeholders.

An extension should announce its format at the beginning. In the case of AFNI-generated .nii files, the contents that would be in .HEAD file are encoded in XML, and the AFNI extension starts with

   <?xml version='1.0' ?>
   <AFNI_attributes

which is at least somewhat informative about what is to follow - that is to say, it is clearly not JSON, a Google protocol buffer message, or random unformatted stuff. Why did I use XML, not JSON? Because I started using XML for inter-program messaging in late 2002 (pre-NIfTI) and so it was available to me. I was unable to get the NIfTI team to even consider using an XML-based header, so instead I came up with the "let's do a remix of ANALYZE 7.5" approach, which was adopted after significant changes (improvements!) by the team discussions. When I came to put AFNI header data into the NIfTI extension, I decided to use the XML code I'd already developed to make it at least potentially readable by someone else.
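As an illustration of why a self-announcing prefix helps, a reader could guess the serialization from the first few bytes of the extension payload. This is only a sketch: `sniff_extension_format` is a hypothetical helper, not an AFNI or NIfTI library function, and real payloads may need more careful checks.

```python
def sniff_extension_format(payload: bytes) -> str:
    """Guess how an extension payload is serialized from its leading bytes."""
    head = payload.lstrip()[:16]  # ignore leading whitespace, look at the start only
    if head.startswith(b"<"):
        # Covers both an XML declaration ("<?xml ...") and a bare opening tag.
        return "xml"
    if head.startswith((b"{", b"[")):
        return "json"
    return "unknown"


if __name__ == "__main__":
    print(sniff_extension_format(b"<?xml version='1.0' ?>\n<AFNI_attributes"))
    print(sniff_extension_format(b'{"RepetitionTime": 2.0}'))
    print(sniff_extension_format(b"\x00\x01\x02"))
```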

Why this long story? To help shape the future by the errors and successes of the past. NIfTI-1 was a success, because it was simple enough to get buy-in from important developers, and added enough capability to the ANALYZE 7.5 format to be worth the effort. The NIfTI extensions (not my idea, was added in discussions) made the format very useful for all, at the same time keeping a simple core. These were a valuable pairing - simple core + extensions. Having a standard simple format for extensions would have been even better.

JSON is pretty simple in its basic structure (more so than XML, for sure), and so is a logical way to store random extra info. In principle, the entire header for a dataset could be a JSON (or perhaps a string of JSONs, to allow for extensions) - that is, drop the NIfTI header entirely and encode it in JSON.

However, this approach has one significant drawback - in NIfTI, the byte offset to the start of the image data is encoded in the header, which makes it easy to memory-map the data file rather than read it with file I/O. In the days of streaming data around, this may sound pitiful, but is less so when one has to deal with multi-gigabyte size files as the units of data collections.
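A rough sketch of that point, using only the Python standard library: the reader below never parses the extension at all. It reads `vox_offset` (a float32 at byte 108 of the 348-byte NIfTI-1 header) and jumps straight to the voxels through a memory map. The toy writer, the function names, and the choice of extension code 6 (the NIfTI "comment" code) for a JSON payload are assumptions made for this illustration, not an agreed convention, and the toy file sets only the header fields the reader uses.

```python
import mmap
import os
import struct
import tempfile

HDR_SIZE = 348       # fixed NIfTI-1 header length in bytes
DIM_AT = 40          # byte position of the int16 dim[8] array
VOX_OFFSET_AT = 108  # byte position of the float32 vox_offset field


def make_extension(payload: bytes, ecode: int = 6) -> bytes:
    """Wrap payload as one extension: int32 esize, int32 ecode, data padded to a multiple of 16."""
    esize = -(-(8 + len(payload)) // 16) * 16  # round 8 + len(payload) up to a multiple of 16
    return struct.pack("<ii", esize, ecode) + payload.ljust(esize - 8, b"\x00")


def write_toy_nifti(path: str, voxels: list, payload: bytes) -> None:
    """Write a toy little-endian single-file NIfTI-1 holding 1-D float32 data and one extension."""
    ext = make_extension(payload)
    hdr = bytearray(HDR_SIZE)
    struct.pack_into("<i", hdr, 0, HDR_SIZE)                                # sizeof_hdr
    struct.pack_into("<8h", hdr, DIM_AT, 1, len(voxels), 1, 1, 1, 1, 1, 1)  # dim
    struct.pack_into("<hh", hdr, 70, 16, 32)                                # datatype float32, bitpix 32
    struct.pack_into("<f", hdr, VOX_OFFSET_AT, float(HDR_SIZE + 4 + len(ext)))
    hdr[344:348] = b"n+1\x00"                                               # single-file magic
    with open(path, "wb") as f:
        f.write(hdr + b"\x01\x00\x00\x00" + ext)  # extender byte 1: extensions follow
        f.write(struct.pack(f"<{len(voxels)}f", *voxels))


def read_voxels_mmap(path: str) -> tuple:
    """Memory-map the file and read the voxels at vox_offset, skipping the extension entirely."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        (vox_offset,) = struct.unpack_from("<f", mm, VOX_OFFSET_AT)
        nvox = struct.unpack_from("<8h", mm, DIM_AT)[1]  # toy files are 1-D, so dim[1] is the count
        return struct.unpack_from(f"<{nvox}f", mm, int(vox_offset))


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "toy.nii")
    write_toy_nifti(path, [1.0, 2.5, -3.0], b'{"RepetitionTime": 2.0}')
    print(read_voxels_mmap(path))
```

With a JSON-only header there is no fixed place to find that offset, so the reader would have to parse the text before it could locate the data.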

OK, too long -- I'm stopping this meandering and going back to dealing with email. Did I mention that Outlook sucks?

yarikoptic commented 7 months ago

I will make an executive decision to close this issue since I am very much in line with @robertoostenveld's factual statement that BIDS is not only about NIfTIs. As such I think this issue has no chance of being "solved". If someone wants to extend NIfTI with a formalized NIfTI extension to mirror the BIDS sidecar file -- it is a separate question/issue to be developed/agreed upon. Someone could start a BEP on that aspect. bids-validator could then be tuned to ensure consistency between embedded and "external" (in .json) records.
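Such a consistency check could look roughly like the sketch below. This is not how bids-validator works today; `metadata_conflicts` is a hypothetical helper, and it assumes the embedded record has already been extracted from the image as a JSON string.

```python
import json


def metadata_conflicts(embedded_json: str, sidecar_json: str) -> dict:
    """Return {key: (embedded value, sidecar value)} for keys present in both records that disagree."""
    embedded = json.loads(embedded_json)
    sidecar = json.loads(sidecar_json)
    conflicts = {}
    for key in sorted(embedded.keys() & sidecar.keys()):
        if embedded[key] != sidecar[key]:
            conflicts[key] = (embedded[key], sidecar[key])
    return conflicts


if __name__ == "__main__":
    embedded = '{"RepetitionTime": 2.0, "EchoTime": 0.03}'
    sidecar = '{"RepetitionTime": 2.5, "EchoTime": 0.03, "TaskName": "rest"}'
    print(metadata_conflicts(embedded, sidecar))
```

Keys that appear in only one of the two records are ignored here; whether those should count as errors is exactly the kind of question a BEP would have to settle.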

Also note that a single-file solution is in general "inflexible" when it comes to supporting "cheap" metadata modifications/fixes, which is now a motivation e.g. for NWB files to use Zarr filesets instead of a single HDF5 file.