bids-standard / bids-specification

Brain Imaging Data Structure (BIDS) Specification
https://bids-specification.readthedocs.io/
Creative Commons Attribution 4.0 International

Age at session #1020

Open ghisvail opened 2 years ago

ghisvail commented 2 years ago

More of a question first, which may lead to a proposal later.

Is there a consensus on how to represent the concept of an age at session in the BIDS metadata hierarchy?

The context concerns longitudinal studies where imaging visits span a large period of time (several years typically), and the age at session is necessary for modeling disease progression.

On first read, I thought this attribute may go in sub-<label>_sessions.tsv#age which would override participants.tsv#age as per the inheritance rule. But that contradicts the rule in the sessions file section mandating the use of distinct attributes between participants and sessions tabular data.

A BIDS-compliant alternative could be to introduce an age_at_session attribute in the sessions file.

What are your thoughts?

Remi-Gau commented 2 years ago

Quick suggestion that is suboptimal regarding data anonymisation:

satra commented 2 years ago

in general, this is true not just of age but of any column in the phenotype directory that is replicated elsewhere. age, sex, and other demographics are often part of many common assessments. hence, if there are multiple assessments of a person at different sessions, there is currently no allowance for participant-specific phenotypic information to exist inside a sessions directory.

for this specific age problem there are a couple of options:

  1. age is recommended and not required in the participants tsv file. so if indeed age changes from session to session, one can use the sessions tsv file to reflect the age and not put it in the participants.tsv.

  2. one can also include a variable in participants.tsv (age_at_first_session), and then have temporal offsets of session times in the sessions.tsv.
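A minimal sketch of option 2, to make the arithmetic concrete. The column name `days_since_first_session` and the inline table contents are assumptions made for illustration; only `age_at_first_session` is named in this thread, and neither column is defined by BIDS.

```python
import csv
import io

# Illustrative table contents. `age_at_first_session` (years) comes from the
# suggestion above; `days_since_first_session` is a hypothetical offset column.
PARTICIPANTS_TSV = "participant_id\tage_at_first_session\nsub-01\t62.0\n"
SESSIONS_TSV = "session_id\tdays_since_first_session\nses-01\t0\nses-02\t730\n"

DAYS_PER_YEAR = 365.25


def read_tsv(text):
    """Parse tab-separated text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def ages_at_sessions(participant_row, session_rows):
    """Return {session_id: age in years}, computed as baseline age plus offset."""
    baseline = float(participant_row["age_at_first_session"])
    return {
        row["session_id"]: round(
            baseline + float(row["days_since_first_session"]) / DAYS_PER_YEAR, 1
        )
        for row in session_rows
    }


participant = read_tsv(PARTICIPANTS_TSV)[0]
ages = ages_at_sessions(participant, read_tsv(SESSIONS_TSV))
print(ages)
```

Age at session is then a derived value rather than a stored one, so every consumer of the dataset has to repeat this join between the two tables.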

ghisvail commented 2 years ago

@Remi-Gau @satra thank you both for your suggestions.

Here is my attempt at summarizing your proposals in my own words and their pros and cons:

  1. Store year_of_birth in participants metadata and acq_time in sessions or scans metadata. Age at session cannot be queried directly, but is computed subsequently from both attributes.

  2. Store age in the session metadata level and drop age from the subject level. Age at session can be queried directly as a result, but age (at recruitment) requires access to the session-level metadata and knowledge as to which session_id corresponds to the baseline.

  3. Store age at baseline in subject-level metadata and temporal offsets from baseline in session-level. Imo, this proposal shares the same drawbacks as in 1, whilst being easier to interpret since age is explicit.

Please correct me if I am wrong.
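To illustrate proposal 1, here is a rough sketch of the computation a downstream user would have to perform. The function name and the example values are hypothetical, and `year_of_birth` is not a column defined by BIDS.

```python
from datetime import datetime


def approximate_age(year_of_birth, acq_time):
    """Approximate age in whole years from a birth year and an ISO 8601 acq_time.

    Only the year of birth is known, so the result can be off by up to one year.
    """
    acquired = datetime.fromisoformat(acq_time)
    return acquired.year - year_of_birth


# Hypothetical values: a participant born in 1950, scanned in mid-2015.
age = approximate_age(1950, "2015-06-01T10:30:00")
print(age)
```

This also shows the anonymisation trade-off: if `acq_time` is shifted or removed to protect privacy, the age can no longer be recovered this way.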

On a side note, what's the rationale for forbidding inheritance for tabular metadata but allowing it for JSON metadata? I can't help but think that inheritance would have solved this in an elegant way.

yarikoptic commented 2 years ago

  • have the year of birth in the participants.tsv
  • use the acquisition time info in the scans.tsv or sessions.tsv to compute the age

date/time might need to be scrubbed/anonymized. I like @satra's suggestion:

one can also include a variable in participants.tsv (age_at_first_session), and then have temporal offsets of session times in the sessions.tsv.

mgxd commented 2 years ago

Chiming in since i'm dealing with constantly changing age values working with multi-session infant data. Overall I think it would help with readability / potential errors to avoid a calculation of start value + offset to get session level metadata.

On a side note, what's the rationale for forbidding inheritance for tabular metadata but allowing it for JSON metadata? I can't help but think that inheritance would have solved this in an elegant way.

I would also like to know, as my first thought was the sessions.tsv could override conflicting values present in the participants.tsv.

Also, is sessions.json (a sidecar similar to participants.json) a BIDS valid file? I don't see any mention currently in the spec but there must be a place to define non-standard sessions.tsv columns.

ghisvail commented 2 years ago

Also, is sessions.json (a sidecar similar to participants.json) a BIDS valid file? I don't see any mention currently in the spec but there must be a place to define non-standard sessions.tsv columns.

Yes, sub-<label>_sessions.json does exist and serves a similar purpose to the participants sidecar, applied to session-level tabular data.
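As a sketch of what such a sidecar could contain for a non-standard column, the snippet below builds and prints one entry. The column name `age_at_session` is the one proposed earlier in this thread, not a column defined by BIDS, and the description text is an assumption; `LongName`, `Description`, and `Units` are the usual fields for describing tabular columns.

```python
import json

# Hypothetical sidecar content for a sub-<label>_sessions.json file.
# `age_at_session` is a proposed, non-standard column name.
sidecar = {
    "age_at_session": {
        "LongName": "Age at session",
        "Description": "Age of the participant at the time of the session",
        "Units": "year",
    }
}

sidecar_text = json.dumps(sidecar, indent=2)
print(sidecar_text)
```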

ghisvail commented 2 years ago

Chiming in since i'm dealing with constantly changing age values working with multi-session infant data. Overall I think it would help with readability / potential errors to avoid a calculation of start value + offset to get session level metadata.

Sounds brutal to deal with. I'd say in your case, you'd want to store age at the session level and have a separate age_at_recruitment at the participant level.

What solution have you adopted so far?

mgxd commented 2 years ago

The current solution is to just process 1 session at a time, specifying the age (in months) through a flag. This isn't very scalable though - I was thinking of something along these lines:

  1. Check sessions.tsv for age

  2. If not found, check participants.tsv

  3. If still not found, require flag

(link to initial issue https://github.com/nipreps/nibabies/issues/75#issuecomment-865199930)
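The lookup order described above could be sketched roughly as follows. This is not the nibabies implementation: the function names, the `cli_age` parameter standing in for the flag, and the use of an `age` column in sessions.tsv are all assumptions for illustration.

```python
import csv
import io


def read_tsv(text):
    """Parse tab-separated text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def lookup(rows, key_column, key, column):
    """Return a column's value as float for the matching row, or None if absent."""
    for row in rows:
        if row.get(key_column) == key:
            value = row.get(column)
            # "n/a" is the BIDS marker for a missing value.
            if value not in (None, "", "n/a"):
                return float(value)
    return None


def resolve_age(sessions, participants, participant_id, session_id, cli_age=None):
    """Prefer the session-level age, then the participant-level age, then the flag."""
    age = lookup(sessions, "session_id", session_id, "age")
    if age is None:
        age = lookup(participants, "participant_id", participant_id, "age")
    if age is None:
        age = cli_age
    if age is None:
        raise ValueError("age not found in sessions.tsv or participants.tsv; pass it explicitly")
    return age
```

Note that the sketch does not reconcile units: if sessions.tsv stores months and participants.tsv stores years, the caller would need to read the units from the sidecars before falling back.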

ghisvail commented 2 years ago

I came across the following paragraph in the spec:

Age SHOULD be given as the number of years since birth at the time of scanning (or first scan in case of multi session datasets). Using higher accuracy (weeks) should in general be avoided due to privacy protection, unless when appropriate given the study goals, for example, when scanning babies.

So it appears there are some official recommendations towards keeping age at the participant level even in a multi-session dataset. In this context, I guess a separate age_at_session would make sense.

psadil commented 1 month ago

Not too much to contribute, but I would like to highlight this comment from @satra

this is true not just of age

e.g., "sex" is described as "phenotypical sex", but whether that means something like "assigned at birth" vs "as observed by the researcher" vs "as reported by the participant during the first session" is unclear, and a few interpretations may vary across sessions.

So, it would be nice for the solution to not rely too much on age-specific mechanics (e.g., Date of Birth + offset) and instead favor something general (e.g., preferring or searching first for columns in sessions.tsv).