bids-standard / bids-specification

Brain Imaging Data Structure (BIDS) Specification
https://bids-specification.readthedocs.io/
Creative Commons Attribution 4.0 International

Repository of BIDS terms and supporting the BIDS community #423

Open dbkeator opened 4 years ago

dbkeator commented 4 years ago

Dear BIDS Community,

As you may know, my collaborators and I have recently received funding under the BRAIN Initiative program (NIDM-Terms project description) to develop community-driven controlled vocabularies for human neuroimaging data. The goals are to improve our ability to search across datasets, subset and combine data across projects, and annotate data and datasets so that they can be linked, eventually across the BRAIN Initiative and other data archives. Our intent is to connect BIDS terms, Cognitive Atlas and Cognitive Paradigm Ontology terms, and other domain terminologies to increase dataset information and connection to other datasets.

To start, we’ve mined terms from the BIDS specification (see BIDS_Terms) and linked them to similar terms in existing knowledge sources (e.g. linking BIDS participant_id to the subject identifier terms used in many domain-relevant terminologies).

We would like our efforts to incorporate input from the relevant communities, and we are building workflows that enable users to find existing terms as well as submit new ones. We are also designing a workflow for curating the properties used to describe terms (e.g. standard coded units, description, labels, URL, allowable values, etc.). The ability to connect terms to broader concepts has been modeled on work done in schema.org, specifically ReproNim’s standardizing assessment schemas. Briefly, the process consists of creating a JSON-LD file for each term, managed in the GitHub Terms Repo to maintain provenance, version control, and discussion. Once curated, the terms are imported into InterLex and made available for broader use (see NIDM-Terms).
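A minimal sketch of what one per-term JSON-LD file could contain, written in Python. The `schema:` prefix and the `DefinedTerm` type are real schema.org vocabulary, but the property choices, the placeholder `ex:` namespace, and the helper name `make_term_jsonld` are assumptions for illustration, not the actual NIDM-Terms file layout.

```python
import json

# Assumed context: "schema" is schema.org; "ex" is a placeholder namespace
# for properties (such as allowable values) that schema.org does not define.
CONTEXT = {
    "schema": "http://schema.org/",
    "ex": "https://example.org/terms#",
}


def make_term_jsonld(label, description, url=None, units=None, allowable_values=None):
    """Return a JSON-LD dict describing one term (hypothetical layout)."""
    term = {
        "@context": CONTEXT,
        "@type": "schema:DefinedTerm",
        "schema:name": label,
        "schema:description": description,
    }
    # Optional properties are only written when a value is provided.
    if url is not None:
        term["schema:url"] = url
    if units is not None:
        term["schema:unitCode"] = units
    if allowable_values is not None:
        term["ex:allowableValues"] = list(allowable_values)
    return term


if __name__ == "__main__":
    doc = make_term_jsonld(
        label="participant_id",
        description="A participant identifier of the form sub-<label>.",
    )
    # One file per term (e.g. participant_id.jsonld) could hold this output.
    print(json.dumps(doc, indent=2))
```

Keeping one small file per term is what makes the GitHub-based workflow work: each term gets its own history, diffs, and pull-request discussion.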

Although we are at the early stages of the proposed work, we are interested in discussing with the BIDS community how we can efficiently update our representation of the BIDS vocabulary when new terms are introduced in BEPs, and how the BIDS community can be represented in curating BIDS-related terms and connecting them with broader knowledge bases. We envision many potential benefits for the BIDS community, such as easy creation of BIDS specification tables or a BIDS glossary by pulling from our system, the ability of BIDS users to find BIDS terms without searching through the BEPs, and a central location for curating proposed BIDS terms.

Cheers, David Keator

sappelhoff commented 4 years ago

@bids-standard/steering @bids-standard/bep_leads

melanieganz commented 4 years ago

Dear David,

thank you so much for bringing this up. We briefly discussed it in the steering group meeting yesterday and were wondering what exactly you would like from the BIDS community. Could you please clarify this a bit more? In principle, new terms could be added with every version change of BIDS, but most likely this would happen only with major releases. So maybe automatic processing of the spec should be done after every major release? Also, are you interested in accessing the current extensions already? Many of them are works in progress and may be far from being merged into the main spec, but the terminology used in them would still be relevant, I guess.

Please let us know your thoughts. Regards, Melanie

dbkeator commented 4 years ago

Hi Melanie,

Thank you for the response. Initially what we're interested in is working with the BIDS community to develop methods for:

  1. Keeping our BIDS terms up to date as new terms get added from BEP work.
  2. Helping to generate tables for future BIDS specification updates, where the term definitions and information are pulled from our nidm-terms work. This would be simple Python code that outputs a table of selected terms for inclusion in the specification document.
  3. Working with the BEP developers to make sure they aren't reusing an existing BIDS term with a conflicting definition. We see this happen a lot in DICOM, where you can't understand the meaning of a particular tag unless you know which module it came from, because the same term label is used with different meanings.
  4. For future BEP work, we'd like to provide the BIDS community with methods for adding new terms to the BIDS ecosystem by adding them to our nidm-terms repository. This is a work in progress, but it could be via GitHub pull requests to our repo or directly in our nidm-terms InterLex repository. The nidm-terms group then adds annotations linking these new BIDS terms to other related terms, improving our ability to automatically search across BIDS and other datasets via these term equivalencies. Further, we annotate the terms with broader concepts, where appropriate, for high-level search. You can see a demo of how these concept-based queries can be useful across OpenNeuro BIDS datasets here: https://mybinder.org/v2/gh/NIDM-Terms/terms/master?filepath=utils%2Fquery_demo
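Items (2) and (3) can each be sketched in a few lines of Python. The term records, their `label`/`description`/`source` fields, the source names, and both helper names are hypothetical; real records would presumably be loaded from the nidm-terms JSON-LD files.

```python
from collections import defaultdict

# Hypothetical term records; "module_a"/"module_b" stand for two sources
# that reuse the same label with different meanings.
TERMS = [
    {"label": "TaskName", "description": "Name of the task.", "source": "BIDS"},
    {"label": "Duration", "description": "Length of the event in seconds.", "source": "module_a"},
    {"label": "Duration", "description": "Length of the scan session in minutes.", "source": "module_b"},
]


def render_markdown_table(terms, labels):
    """Item (2): return a Markdown table with one row per selected label."""
    lookup = {t["label"]: t for t in terms}  # later records win for duplicate labels
    rows = ["| Key name | Description |", "| --- | --- |"]
    for label in labels:
        # Escape pipes so a description cannot break the table layout.
        description = lookup[label]["description"].replace("|", "\\|")
        rows.append(f"| {label} | {description} |")
    return "\n".join(rows)


def find_conflicting_labels(terms):
    """Item (3): map each label with more than one distinct definition to those definitions."""
    definitions = defaultdict(set)
    for term in terms:
        definitions[term["label"]].add(term["description"].strip())
    return {label: defs for label, defs in definitions.items() if len(defs) > 1}


if __name__ == "__main__":
    print(render_markdown_table(TERMS, ["TaskName"]))
    print(find_conflicting_labels(TERMS))
```

A conflict check like this could run whenever a BEP proposes new terms, flagging label reuse before it reaches the specification.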

I guess from our perspective we need some guidance from the BIDS steering committee about the best way to approach these goals. Since the BIDS community is so big and decentralized, it's hard to know the best approach. One idea is for us to put together some training documents on items (1) and (2) above and circulate them to the steering committee and then, upon approval, to the broader BIDS community. Another idea is for us to formally introduce a BEP for this work and go that route.

Anyway, any guidance you can give us is appreciated.

Thanks, Dave

poldrack commented 4 years ago

I think training documents/examples for those first two points would be a really great start!

-- Russell A. Poldrack Albert Ray Lang Professor of Psychology Professor (by courtesy) of Computer Science Bldg. 420, Jordan Hall Stanford University Stanford, CA 94305

poldrack@stanford.edu http://www.poldracklab.org/

dorahermes commented 4 years ago

Would the bids-starter-kit perhaps be a good place for the training documents/examples?

robertoostenveld commented 4 years ago

Maintaining explicit tables of terms used in BIDS would be valuable to ensure consistency and completeness. I am doing something similar for algorithm options in the FieldTrip toolbox, see here.

I think there are three types of terms to consider:

  1. fields in JSON files
  2. headers in TSV files
  3. entities (key-value pairs in the file name)

For the first, it should be possible to obtain them semi-automatically from the markdown files (grep on lines that start with a '|', followed by CamelCase, and that have REQUIRED, RECOMMENDED or OPTIONAL in them).

For the second, it might be possible to obtain them the same way (grep on lines that start with a '|', followed by snake_case, and that have REQUIRED, RECOMMENDED or OPTIONAL in them), but I suspect this would not be complete.
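The two grep heuristics could be approximated with regular expressions, for example in Python. The patterns are assumptions (optional backticks around names, requirement level anywhere on the row, term name in the first cell) and, as noted, will likely miss some terms.

```python
import re

# First table cell holding a CamelCase or snake_case name, optionally in backticks.
CAMEL_ROW = re.compile(r"^\|\s*`?([A-Z][A-Za-z0-9]*)`?\s*\|")
SNAKE_ROW = re.compile(r"^\|\s*`?([a-z][a-z0-9]*(?:_[a-z0-9]+)*)`?\s*\|")
LEVELS = re.compile(r"\b(REQUIRED|RECOMMENDED|OPTIONAL)\b")


def extract_terms(markdown_text):
    """Return ({JSON field: level}, {TSV column: level}) found in table rows."""
    json_fields, tsv_columns = {}, {}
    for line in markdown_text.splitlines():
        level = LEVELS.search(line)
        # Only table rows that mention a requirement level are considered.
        if not line.startswith("|") or level is None:
            continue
        camel = CAMEL_ROW.match(line)
        snake = SNAKE_ROW.match(line)
        if camel:
            json_fields[camel.group(1)] = level.group(1)
        elif snake:
            tsv_columns[snake.group(1)] = level.group(1)
    return json_fields, tsv_columns
```

Running this over every markdown file after each release and diffing the result against the terms already in the repository would give a rough list of candidate new terms for item (1).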

For the third, we have made some effort to put them in a table (via an intermediate Google Sheet), see https://bids-specification.readthedocs.io/en/stable/99-appendices/04-entity-table.html. I am not sure it is up to date, but I suspect that @sappelhoff is keeping an eye on it.
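To illustrate the key-value structure of entities, here is a simplified Python parser for a BIDS-style filename. It assumes every underscore-separated chunk before the suffix is a `key-value` pair and treats everything after the first dot as the extension; it does not check entity order or whether a key is an allowed entity.

```python
def parse_bids_filename(filename):
    """Split a filename into (entities, suffix, extension); simplified sketch."""
    stem, dot, extension = filename.partition(".")
    *pairs, suffix = stem.split("_")
    entities = {}
    for pair in pairs:
        key, sep, value = pair.partition("-")
        if not sep:
            raise ValueError(f"not a key-value entity: {pair!r}")
        entities[key] = value  # dict keeps the order in which entities appear
    return entities, suffix, dot + extension


if __name__ == "__main__":
    print(parse_bids_filename("sub-01_ses-02_task-rest_bold.nii.gz"))
    # ({'sub': '01', 'ses': '02', 'task': 'rest'}, 'bold', '.nii.gz')
```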

I think that consistently formatting the specification (the markdown documents) in a way that allows automatic parsing of the .md files and extraction of the tables would be very helpful.

dorahermes commented 4 years ago

@robertoostenveld automatically parsing the specification would be. Another possible option might be to use the validator json schema files.

yarikoptic commented 4 years ago

wonderful idea! I wonder how much they could be generalized/centralized (if needed at all) -- maybe they should become a part of the bids-specification itself and just be used by the bids validator, used to produce any other relevant rendering (e.g. the terms table, which is hard to edit as it grows), etc.? Then any changes in bids-specification itself could just be automagically picked up by the validator.
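A rough Python sketch of the single-source idea: term definitions kept in one (hypothetical) dictionary, from which a JSON Schema (draft-07 style) document for sidecar files is derived. The `missing_or_mistyped` helper is only a tiny stand-in for a real JSON Schema validator, and the definitions are paraphrased examples, not specification text.

```python
# Hypothetical single source of truth: term name -> properties.
TERM_DEFINITIONS = {
    "RepetitionTime": {"type": "number", "level": "REQUIRED",
                       "description": "Time between the start of consecutive volumes."},
    "TaskName": {"type": "string", "level": "REQUIRED",
                 "description": "Name of the task."},
    "Instructions": {"type": "string", "level": "RECOMMENDED",
                     "description": "Text of the instructions given to participants."},
}

PY_TYPES = {"number": (int, float), "string": str}


def build_sidecar_schema(definitions):
    """Derive a JSON Schema dict for a sidecar file from the term definitions."""
    properties = {
        name: {"type": spec["type"], "description": spec["description"]}
        for name, spec in definitions.items()
    }
    required = sorted(n for n, spec in definitions.items() if spec["level"] == "REQUIRED")
    return {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "object",
        "properties": properties,
        "required": required,
    }


def missing_or_mistyped(sidecar, schema):
    """Report missing required keys and values of the wrong type."""
    problems = [f"missing: {key}" for key in schema["required"] if key not in sidecar]
    for key, value in sidecar.items():
        expected = schema["properties"].get(key)
        if expected is None:
            continue  # unknown keys are ignored in this sketch
        # bool is a subclass of int in Python, so exclude it explicitly.
        ok = isinstance(value, PY_TYPES[expected["type"]]) and not isinstance(value, bool)
        if not ok:
            problems.append(f"wrong type: {key}")
    return problems
```

The same `TERM_DEFINITIONS` could also feed a table renderer, so the validator and the rendered terms table would stay in sync by construction.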

satra commented 4 years ago

just a note here that, in case folks did not click on the links in the original post, @dbkeator created this repo: https://github.com/NIDM-Terms/terms/tree/master/terms/BIDS_Terms (but the current curation process is not very scalable, since some of it involved going through the spec).

the general idea is that it should be "easy", either through a GUI or through a standard text editor to add new terms to this as they are curated/refined as part of any BEP process. the end point of where these terms sit is not decided. indeed these can then be used by the validator in a json-schema or otherwise. the key is the set of properties that are defined alongside the terms to improve searchability, not just validation.

dbkeator commented 4 years ago

Hi Folks, Thanks for all the helpful suggestions. Let me discuss the various suggestions with my team and get back to you. I think we'd like to start with items (1) and (2):

  1. Keeping our BIDS terms up to date as new terms get added from BEP work.

  2. Helping to generate tables for future BIDS specification updates where the term definitions and information is pulled from our nidm-terms work. This would be simple python code to output a table for selected terms to be incorporated into the specifications document.

@robertoostenveld Thanks for the link.
With respect to the types of terms to consider, we have started with the BIDS specification (https://github.com/NIDM-Terms/terms/tree/master/terms/BIDS_Terms) and are in the process of mining OpenNeuro so we can add a broad set of JSON fields to the set from the specification. Further, we have mined terms from TSV files in OpenNeuro datasets and have begun annotating such terms with broader concepts, where appropriate, to facilitate queries across datasets. There's a little binder demo at the bottom of the README that lets one see the concepts we've associated with the variables: https://github.com/NIDM-Terms/terms.
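A concept-based query over such annotations could look like the following Python sketch. The dataset names, column names, and concept labels are made up; the binder demo presumably works against the real NIDM-Terms annotations instead.

```python
# Hypothetical annotations: dataset -> {TSV column name: concept label}.
ANNOTATIONS = {
    "dataset_A": {"age": "Age", "sex": "BiologicalSex"},
    "dataset_B": {"AgeAtScan": "Age", "handedness": "Handedness"},
    "dataset_C": {"Sex": "BiologicalSex"},
}


def find_by_concept(annotations, concept):
    """Return {dataset: [columns]} for every column annotated with the concept."""
    hits = {}
    for dataset, columns in annotations.items():
        matching = sorted(name for name, label in columns.items() if label == concept)
        if matching:
            hits[dataset] = matching
    return hits


if __name__ == "__main__":
    # Columns named differently in each dataset are found via the shared concept.
    print(find_by_concept(ANNOTATIONS, "BiologicalSex"))
    # {'dataset_A': ['sex'], 'dataset_C': ['Sex']}
```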

To further improve annotations of TSV file variables, we have been developing tools to help capture data element annotations with properties we think are important for interpreting the variables. These annotations are written out to JSON sidecar files and also inserted into the InterLex information resource as personal data elements, which should make them accessible to others who might be using the same data elements and provides a persistent URL to a good-quality definition (see bidsmri2nidm: https://github.com/INCF-NIDASH/PyNIDM#bids-mri-conversion-to-nidm). We are currently working on a JavaScript UI version of these tools, since the PyNIDM ones are command line tools and are a bit verbose for a command line script.
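On the sidecar side, BIDS already defines keys for describing TSV columns (`LongName`, `Description`, `Levels`, `Units`, `TermURL`). A hypothetical Python helper that maps annotation records onto those keys might look like this; the input field names and both helper names are assumptions, and this is not how bidsmri2nidm is implemented.

```python
import json
from pathlib import Path

# Hypothetical input field name -> BIDS column-description key.
FIELD_TO_BIDS_KEY = {
    "long_name": "LongName",
    "levels": "Levels",
    "units": "Units",
    "term_url": "TermURL",
}


def column_sidecar(annotations):
    """Build a BIDS-style sidecar dict with one entry per TSV column."""
    sidecar = {}
    for column, info in annotations.items():
        entry = {"Description": info["description"]}
        # Copy only the optional fields that were actually annotated.
        for field, bids_key in FIELD_TO_BIDS_KEY.items():
            if field in info:
                entry[bids_key] = info[field]
        sidecar[column] = entry
    return sidecar


def write_sidecar(path, sidecar):
    """Write pretty-printed JSON, e.g. participants.json next to participants.tsv."""
    Path(path).write_text(json.dumps(sidecar, indent=2) + "\n", encoding="utf-8")
```

For example, `column_sidecar({"age": {"description": "Age of the participant", "units": "year"}})` yields `{"age": {"Description": "Age of the participant", "Units": "year"}}`; a `term_url` pointing at a persistent definition would end up under `TermURL`.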

With respect to adding new BIDS terms, we're thinking that could happen either through simple cloning of our nidm-terms repo and pull requests, which would allow us to capture discussions surrounding such new terms, or through the NIDM-Term InterLex interface "Add Term" (https://scicrunch.org/nidm-terms). We still need some work on the InterLex interface to allow folks to specify a "bucket" to put the new term into. In this example, the "bucket" would be BIDS specification terms.

Cheers, Dave

angielaird commented 4 years ago

Happy to provide any CogPO-related support, if needed!

dbkeator commented 3 years ago

Hello, we have been hard at work creating a UI to interact with BIDS specification terms along with other community terminologies. It is still in development, but you can have a look here: https://nidm-terms.github.io/.

Our hope is that folks working on BIDS extensions and/or wanting to import/export BIDS-related terms can do so with this interface and the associated nidm-terms github repository https://github.com/NIDM-Terms/terms.

Happy to discuss the new developments with the steering committee.

Cheers, Dave

CPernet commented 3 years ago

@dbkeator the site is unresponsive? I wanted to check it because, for the OHBM Best Practices work, we also want a lexicon consistent with BIDS (though it may have additional terms), so this would be worthwhile for that purpose too, to be used with e-COBIDAS @Remi-Gau