bids-standard / bids-specification

Brain Imaging Data Structure (BIDS) Specification
https://bids-specification.readthedocs.io/
Creative Commons Attribution 4.0 International

allow Levels values (in sidecar for TSV files) to be objects #1573

Closed Remi-Gau closed 1 year ago

Remi-Gau commented 1 year ago

Your idea

It was raised during a meeting at OHBM that some projects involved with phenotypic annotation (neurobagel) may greatly benefit if the data type for TermURL (in the sidecar JSON of TSV files) could be a JSON object instead of just a string.

Opening an issue here to start the conversation.

@ericearl @jbpoline @nikhil153 @surchs @michellewang @bcmcpher

Remi-Gau commented 1 year ago

@surchs

Could you provide an example of what the object could look like?

ericearl commented 1 year ago

@Remi-Gau Would a JSON or a HED object do alright?

surchs commented 1 year ago

Hey, sorry for the delay here - and thanks @Remi-Gau for creating the issue!

As far as I understand, the "Levels" key in e.g. a participants.json sidecar can only have 'string' values (https://bids-specification.readthedocs.io/en/stable/glossary.html#levels-metadata), as described in the docs:

{
  "MeasurementToolMetadata": {
    "Description": "Adult ADHD Clinical Diagnostic Scale V1.2",
    "TermURL": "https://www.cognitiveatlas.org/task/id/trm_5586ff878155d"
  },
  "adhd_c_dx": {
    "Description": "As child met A, B, C, D, E and F diagnostic criteria",
    "Levels": {
      "1": "YES",
      "2": "NO"
    }
  }
}

The problem is that this only allows us to put either a human readable description (e.g. "1": "YES") or a machine readable unique identifier (e.g. "1": "https://myhappyontology.org/vocab/controlled_term_yes") there. But both are important.

So what we would like to propose is to turn the allowed values for "Levels" keys from "only string" to "either string or object". Then we could do something like this:

{
  "MeasurementToolMetadata": {
    "Description": "Adult ADHD Clinical Diagnostic Scale V1.2",
    "TermURL": "https://www.cognitiveatlas.org/task/id/trm_5586ff878155d"
  },
  "adhd_c_dx": {
    "Description": "As child met A, B, C, D, E and F diagnostic criteria",
    "Levels": {
      "1": {
        "TermURL": "https://myhappyontology.org/vocab/controlled_term_yes",
        "Label": "Yes"
      },
      "2": "NO"
    }
  }
}

so that machines and humans can coexist peacefully.
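To make the proposed "either string or object" rule concrete, here is a minimal sketch of a check a tool might run on a "Levels" mapping. The function name `check_levels` is hypothetical, and the rule that an object may only contain string values is an assumption for illustration, not something the BIDS specification states.

```python
import json


def check_levels(levels):
    """Return a list of problems found in a "Levels" mapping.

    Assumption: under the proposal, each value is either a plain string
    or an object whose values are all strings (e.g. "TermURL", "Label").
    """
    problems = []
    for key, value in levels.items():
        if isinstance(value, str):
            continue  # current form: a plain description string
        if isinstance(value, dict) and all(
            isinstance(v, str) for v in value.values()
        ):
            continue  # proposed form: an object of string fields
        problems.append(f"level {key!r}: expected string or object of strings")
    return problems


# The mixed example from this comment: one object level, one string level.
sidecar = json.loads("""
{
  "adhd_c_dx": {
    "Description": "As child met A, B, C, D, E and F diagnostic criteria",
    "Levels": {
      "1": {
        "TermURL": "https://myhappyontology.org/vocab/controlled_term_yes",
        "Label": "Yes"
      },
      "2": "NO"
    }
  }
}
""")

problems = check_levels(sidecar["adhd_c_dx"]["Levels"])
print(problems)
```

Both forms pass the check here, so a sidecar could mix annotated and plain levels while it is being migrated.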

For now in neurobagel, we have instead created a new key "Annotations" where we stick information in this format (see e.g. here) but we would prefer to remove the "Annotations" key again and instead encode everything directly in the normal BIDS "Levels" key.

surchs commented 1 year ago

Also pinging @alyssadai here because she is also working on our current specification for these semantically annotated BIDS sidecar files.

Remi-Gau commented 1 year ago

> @Remi-Gau Would a JSON or a HED object do alright?

Good catch @ericearl. Given that this could potentially affect all TSV files in BIDS, it would be good to make sure that this plays well with anything that is coupled with them.

@VisLab what would be your take on having annotations of levels in TSV files that match the suggestion from @surchs above?

As far as I can tell this would not cause any problem with HED, but maybe I am missing something.

effigies commented 1 year ago

Worth noting that the Motion BEP is moving some of its metadata into levels objects, so there's other demand for this (#1524/#1591).

I'm +1 on relaxing Levels to be an object or string, where a string should be interpreted as shorthand for {"Description": <string>}. Reusing TermURL makes sense to me. Do we also need Label, or would reusing Description do the job?
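The shorthand rule described above could be sketched as a small normalization step that a reader applies before doing anything else with "Levels". The helper names `normalize_level` and `normalize_levels` are hypothetical and only illustrate the idea.

```python
def normalize_level(value):
    """Expand a level value to its object form.

    A string is treated as shorthand for {"Description": <string>};
    an object is copied unchanged.
    """
    if isinstance(value, str):
        return {"Description": value}
    if isinstance(value, dict):
        return dict(value)
    raise TypeError(f"level must be a string or an object, got {type(value).__name__}")


def normalize_levels(levels):
    """Apply normalize_level to every entry of a "Levels" mapping."""
    return {key: normalize_level(value) for key, value in levels.items()}


levels = {
    "1": {
        "TermURL": "https://myhappyontology.org/vocab/controlled_term_yes",
        "Description": "Yes",
    },
    "2": "NO",
}
normalized = normalize_levels(levels)
print(normalized["2"])
```

After this step every level is an object, so downstream code only has to handle one shape, and existing string-only sidecars keep working.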

surchs commented 1 year ago

> Do we also need Label, or would reusing Description do the job?

Description is good, we just used Label because that's the usual term in the graph. If TermURL and Description disagree (e.g. because I have manually edited one but not the other), can we say that TermURL wins? Not sure if there is something in BIDS we could reuse?

effigies commented 1 year ago

Description is free-form, so intended for communication to a human reader. I would say it's reasonable for a tool to fetch TermURL and find its display value there.
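One way a tool might apply that precedence is sketched below: try the TermURL first and fall back to the free-form Description. The function `display_value` and its `lookup_label` parameter are hypothetical; how a label is actually fetched from a vocabulary is left to the caller, and a plain dictionary stands in for it here.

```python
def display_value(level, lookup_label):
    """Pick the text to show for a level given in object form.

    lookup_label is a caller-supplied function (an assumption of this
    sketch) that maps a TermURL to its label, or returns None if the
    term cannot be resolved.
    """
    term_url = level.get("TermURL")
    if term_url is not None:
        label = lookup_label(term_url)
        if label is not None:
            return label  # the controlled term wins when it resolves
    return level.get("Description")  # otherwise use the free-form text


# Stand-in for a real vocabulary lookup; no network access is involved.
known_terms = {
    "https://myhappyontology.org/vocab/controlled_term_yes": "Yes",
}

level = {
    "TermURL": "https://myhappyontology.org/vocab/controlled_term_yes",
    "Description": "YES (edited by hand)",
}
print(display_value(level, known_terms.get))
print(display_value({"Description": "NO"}, known_terms.get))
```

Keeping the lookup as a parameter means the same logic works offline, with a cache, or against a live vocabulary service.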