bids-standard / bids-specification

Brain Imaging Data Structure (BIDS) Specification
https://bids-specification.readthedocs.io/
Creative Commons Attribution 4.0 International

SCHEMA: satellite (.overwrite.nwb) file for the base file (.nwb) #1087

Open yarikoptic opened 2 years ago

yarikoptic commented 2 years ago

To avoid a possible clash between ".json is the sidecar file" and ".json is the 'path' file" for NWB (see https://github.com/hdmf-dev/hdmf/pull/677#issuecomment-1085292384), the idea is to use not plain .json but an .overwrite.json extension to accompany the .nwb file (it might end up being .nwb.json; discussion is ongoing at https://github.com/hdmf-dev/hdmf/pull/677#issuecomment-1085292384 ).

I just wanted to file this issue to check whether there is (or should be) a way to associate multiple extensions with each other. AFAIK BIDS does not support, e.g., .{img,hdr} NIfTI (or Analyze) bundles or .{HEAD,BRIK} pairs from AFNI, so I could not find an example.

sappelhoff commented 2 years ago

Thanks for having an eye on this ... would the BrainVision format in EEG be an example? There we have file triplets for each "recording": .eeg contains the binary data, .vhdr contains the header, .vmrk contains an event table

yarikoptic commented 2 years ago

Thank you @sappelhoff -- good example! I wonder if within https://github.com/bids-standard/bids-specification/blob/HEAD/src/schema/objects/extensions.yaml#L231 we should define, for such extensions, something like

diff --git a/src/schema/objects/extensions.yaml b/src/schema/objects/extensions.yaml
index f3d4d0dc..9b56c16d 100644
--- a/src/schema/objects/extensions.yaml
+++ b/src/schema/objects/extensions.yaml
@@ -78,15 +78,16 @@
     [`edf+`](https://www.edfplus.info/specs/edfplus.html) files are permitted.
     The capital `.EDF` extension MUST NOT be used.
 .eeg:
   name: BrainVision Binary Data
   description: |
     A binary data file in the
     [BrainVision Core Data Format](https://www.brainproducts.com/productdetails.php?id=21&tab=5).
-    These files come in three-file sets, including a `.vhdr`, a `.vmrk`, and a `.eeg` file.
+  needs:
+  - [".vhdr", ".vmrk"]
 .fdt:
   name: EEGLAB FDT
   description: |
     An [EEGLAB](https://sccn.ucsd.edu/eeglab) file.

     The format used by the MATLAB toolbox [EEGLAB](https://sccn.ucsd.edu/eeglab).
     Each recording consists of a `.set` file with an optional `.fdt` file.

This uses .eeg as an example, to allow defining which other files an extension must be accompanied by. The semantics would be OR across lists and AND within each individual list.
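
To make the "OR across lists, AND within each list" reading concrete, here is a minimal Python sketch. It assumes the proposed `needs` key is loaded into a plain dict; the names `NEEDS` and `needs_satisfied` are hypothetical and are not part of the BIDS schema or any validator.

```python
from pathlib import PurePosixPath

# Hypothetical in-memory form of the proposed `needs` key (an assumption):
# outer list = alternatives (OR), inner list = extensions that must all exist (AND).
NEEDS = {".eeg": [[".vhdr", ".vmrk"]]}


def needs_satisfied(path, all_paths, needs=NEEDS):
    """Return True if `path` has all companions required by at least one alternative."""
    p = PurePosixPath(path)
    alternatives = needs.get(p.suffix, [])
    if not alternatives:
        return True  # this extension declares no companions
    present = {str(PurePosixPath(q)) for q in all_paths}
    return any(
        all(str(p.with_suffix(ext)) in present for ext in group)
        for group in alternatives
    )


files = [
    "sub-01/eeg/sub-01_task-rest_eeg.eeg",
    "sub-01/eeg/sub-01_task-rest_eeg.vhdr",
    "sub-01/eeg/sub-01_task-rest_eeg.vmrk",
]
print(needs_satisfied(files[0], files))      # all companions present
print(needs_satisfied(files[0], files[:2]))  # .vmrk is missing
```

Adding a second inner list to the `.eeg` entry would declare another acceptable set of companions without changing the function.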

An alternative is to have a higher-level listing that collects such groupings, e.g. src/schema/objects/extension_groups.yaml. The latter would be more concise and thus probably more robust. Either approach would allow a validator to ensure that all needed *plets are present in a dataset.
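
As a rough illustration of the grouping alternative, the sketch below checks a file list against a set of extension groups. The structure of `EXTENSION_GROUPS` is an assumption (no such schema file exists yet), and `incomplete_groups` is a hypothetical helper, not validator code.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical content for an extension_groups listing (structure is an assumption):
# each set names extensions that must appear together for the same file stem.
EXTENSION_GROUPS = [
    {".eeg", ".vhdr", ".vmrk"},  # BrainVision triplet
]


def incomplete_groups(paths, groups=EXTENSION_GROUPS):
    """Map each file stem to the extensions missing from a partially present group."""
    by_stem = defaultdict(set)
    for raw in paths:
        p = PurePosixPath(raw)
        # Only the last suffix is considered; multi-part extensions such as
        # .nii.gz are not handled in this sketch.
        by_stem[str(p.with_suffix(""))].add(p.suffix)
    problems = {}
    for stem, exts in by_stem.items():
        for group in groups:
            missing = group - exts
            if exts & group and missing:
                problems[stem] = sorted(missing)
    return problems


print(incomplete_groups(["sub-01/eeg/rec.eeg", "sub-01/eeg/rec.vhdr"]))
```

Because a group is symmetric, this form cannot by itself express a pair in which only one member is required.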

sappelhoff commented 2 years ago

(comment agnostic to your proposal) --> There are also "file pairs" where one part of the pair is optional. For example EEGLAB .set files (which are actually .mat or .hdf5) MAY be accompanied by a binary .fdt file (if they aren't, then the binary data is directly shipped within .set).

yarikoptic commented 2 years ago

Re the EEGLAB example: so a .fdt file needs a .set file, but a .set file does not need a .fdt.
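
That one-directional dependency can be sketched by declaring a requirement only on `.fdt`. This assumes the `needs`-style mapping discussed earlier in the thread; `NEEDS` and `orphans` are hypothetical names used only for illustration.

```python
from pathlib import PurePosixPath

# Assumed `needs`-style mapping: only .fdt declares a companion,
# so a lone .set file is valid while a lone .fdt file is not.
NEEDS = {".fdt": [[".set"]]}


def orphans(paths, needs=NEEDS):
    """Return the files whose required companions are absent."""
    present = {str(PurePosixPath(p)) for p in paths}
    result = []
    for raw in sorted(present):
        p = PurePosixPath(raw)
        alternatives = needs.get(p.suffix)
        if alternatives and not any(
            all(str(p.with_suffix(ext)) in present for ext in group)
            for group in alternatives
        ):
            result.append(raw)
    return result


print(orphans(["rec.set"]))             # a .set alone is fine
print(orphans(["rec.fdt"]))             # a .fdt without its .set is flagged
print(orphans(["rec.set", "rec.fdt"]))  # the pair is fine
```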