bids-standard / bids-specification

Brain Imaging Data Structure (BIDS) Specification
https://bids-specification.readthedocs.io/
Creative Commons Attribution 4.0 International

Add Required/Recommended Metadata info to Schema #699

Open dlevitas opened 3 years ago

dlevitas commented 3 years ago

This concerns the BIDS schema, where YAML files specify the required and recommended suffixes, entities, and extensions for BIDS file names. The BIDS spec also designates certain fields in the JSON metadata as required, recommended, or optional. For example, functional MRI acquisitions must have the RepetitionTime field in the corresponding JSON file. Is this something that can be added to the schema? I'd be happy to open a PR if this seems worthwhile.

tsalo commented 3 years ago

This is definitely a long-term goal for the schema. I started working with the NIDM-Terms folks on this (see #423), although I've been pulled into other things recently and I haven't made much progress on it (see #609, which is probably woefully out of date at this point). The relevant issue is probably #604.

Two elements that we'll want to have working before we move the metadata into the schema are:

  1. Supporting logic within the schema (#620). There are relationships between metadata fields that we'll want to represent in the schema. My favorite example of this is the timing info for task fMRI. You can have RepetitionTime, but not AcquisitionDuration or VolumeTiming, or you can have SliceTiming and VolumeTiming, but not RepetitionTime or DelayTime, etc. There are like five possible combinations of five different metadata fields.
  2. Rendering schema elements in the specification automatically (#610). If we don't have the schema represented in the specification directly, we're just asking for drift between the two information sources. We've already noticed the difficulty in keeping the schema up-to-date w.r.t. the specification, so I'd hate to add all of the metadata fields to the schema and then have it sit, growing more and more out-of-date, as has happened with #609.
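
To make the first point concrete, here is a minimal Python sketch of how mutual-exclusion rules between metadata fields could be written as data and checked against a sidecar. The two rules only paraphrase the examples given in point 1; they are not the full set of timing rules in the specification, and `TIMING_RULES` and `find_conflicts` are hypothetical names, not part of the schema or the validator.

```python
# Illustrative only: these two rules paraphrase the examples in point 1 above.
# They are not the complete set of timing rules in the specification.
TIMING_RULES = [
    # (fields that are all present, fields that must then be absent)
    ({"RepetitionTime"}, {"AcquisitionDuration", "VolumeTiming"}),
    ({"SliceTiming", "VolumeTiming"}, {"RepetitionTime", "DelayTime"}),
]


def find_conflicts(sidecar, rules=TIMING_RULES):
    """Return (trigger_fields, offending_field) pairs for every violated rule."""
    present = set(sidecar)
    conflicts = []
    for triggers, forbidden in rules:
        # A rule only applies when all of its trigger fields are present.
        if triggers <= present:
            for field in sorted(forbidden & present):
                conflicts.append((tuple(sorted(triggers)), field))
    return conflicts


if __name__ == "__main__":
    sidecar = {"RepetitionTime": 2.0, "VolumeTiming": [0.0, 2.0]}
    for triggers, field in find_conflicts(sidecar):
        print(f"{field} must not be set together with {', '.join(triggers)}")
```

Keeping the rules as plain data rather than code is one way such logic might eventually be stored in YAML; how #620 actually represents it may differ.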

I'd be happy to have any help you're willing to provide!

dlevitas commented 3 years ago

Sure, I'd be happy to help. Regarding your points:

1). That's a good example, one that I wasn't aware of. I suppose that would need to be fleshed out at some point.

2). My thought was to use a web scraping library (e.g. Beautiful Soup) to select the schema elements, though I'm unsure whether that would address the issue. If so, I could grab the schema elements and place them into YAML files based on DataType (and ModalityLabel).
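
A rough sketch of what that scraping step might look like follows. To stay dependency-free it uses the standard library's `html.parser` instead of Beautiful Soup, and it assumes the rendered tables put the field name in the first column and the requirement level in the second; that layout, and the names `TableRows` and `requirement_levels`, are assumptions for illustration.

```python
from html.parser import HTMLParser

LEVELS = ("REQUIRED", "RECOMMENDED", "OPTIONAL")


class TableRows(HTMLParser):
    """Collect the text of every table row as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside <tr>
        self._cell = None  # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def requirement_levels(html):
    """Map field name -> requirement level for rows whose second cell starts with a level."""
    parser = TableRows()
    parser.feed(html)
    levels = {}
    for row in parser.rows:
        if len(row) < 2:
            continue
        level = next((lv for lv in LEVELS if row[1].upper().startswith(lv)), None)
        if level is not None:
            levels[row[0]] = level
    return levels
```

The resulting dictionary could then be grouped by data type and written out as YAML; that step is omitted here because it depends on decisions the thread has not settled.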

tsalo commented 3 years ago

1. :+1:

2. The NIDM-Terms folks have done a lot of work on automatically extracting terms from the specification already. Check out the bids-terms files in the nidm-terms repository. There are a few other places on GitHub with relevant scripts and files, but I can't remember them at the moment. There's still a fair amount of work to do (e.g., manual review, figuring out how to represent the terms in yaml format, what metadata we care about for each term, etc.), but that's a great place to start working from.

EDIT: Also, adding functions to the new schema rendering tools presented in #610 (after it's merged, of course) for building metadata tables would be very helpful as well.

satra commented 3 years ago

@dbkeator - pinging you here. perhaps someone has created all the metadata fields somewhere outside markdown and we just don't know it :) or we should at least take a union of all the json metadata from the openneuro datasets.
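
As a minimal sketch of the "union of all the JSON metadata" idea, assuming the datasets have already been downloaded to a local directory, something like the following could tally the top-level keys used across JSON files. `count_sidecar_keys` is a hypothetical helper; it does not distinguish sidecars from other JSON files such as `dataset_description.json`.

```python
import json
from collections import Counter
from pathlib import Path


def count_sidecar_keys(root):
    """Count how many JSON files under `root` use each top-level key."""
    counts = Counter()
    for path in Path(root).rglob("*.json"):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, UnicodeDecodeError):
            continue  # skip unreadable files rather than abort the whole scan
        if isinstance(data, dict):
            counts.update(data.keys())
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python count_keys.py /path/to/datasets
    for key, n in count_sidecar_keys(sys.argv[1]).most_common():
        print(f"{n:6d}  {key}")
```

Keys found this way would include non-standard and misspelled fields, so the output would still need manual review against the specification.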

tsalo commented 3 years ago

At least to start, I think it would be a good idea to do a direct translation of the json schemas from the validator to yaml format for the specification schema, combined with the descriptions from the specification. I've started drafting something to that effect in #762, if anyone has some time to look it over.
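
For illustration, a direct translation along those lines might start from something like the sketch below. It relies only on the standard JSON Schema keywords `properties` and `required`; the function name `schema_to_fields`, the output layout, and the example schema are assumptions, not what #762 or the validator actually use.

```python
def schema_to_fields(json_schema, descriptions=None):
    """Flatten a JSON Schema object into {field: {"type", "required", "description"}}.

    `descriptions` optionally maps field names to text taken from the
    specification, which overrides any description found in the schema.
    """
    descriptions = descriptions or {}
    required = set(json_schema.get("required", []))
    fields = {}
    for name, prop in json_schema.get("properties", {}).items():
        fields[name] = {
            "type": prop.get("type"),
            # JSON Schema only separates required from not-required, so
            # RECOMMENDED vs OPTIONAL has to come from another source.
            "required": name in required,
            "description": descriptions.get(name, prop.get("description", "")),
        }
    return fields


if __name__ == "__main__":
    # Hypothetical fragment in the style of a validator schema.
    example = {
        "type": "object",
        "properties": {"RepetitionTime": {"type": "number"}},
        "required": ["RepetitionTime"],
    }
    print(schema_to_fields(example, {"RepetitionTime": "Description from the spec."}))
```

Writing the result out as YAML is left out here, since it needs a third-party library and the target layout is still open.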