Open ghisvail opened 2 years ago
Quick suggestion that is suboptimal regarding data anonymisation:
- have the year of birth in participants.tsv
- use the acquisition time info in scans.tsv or sessions.tsv to compute the age

in general, this is true not just of age but of any column appearing in the phenotype directory that is replicated elsewhere. age, sex, and other demographics are often part of many common assessments. hence if there are multiple assessments of a person at different sessions, there is currently no allowance for participant-specific phenotypic information to exist inside a sessions directory.
for this specific age problem there are a couple of options:
- age is recommended and not required in the participants.tsv file. so if indeed age changes from session to session, one can use the sessions.tsv file to reflect the age and not put it in participants.tsv.
- one can also include a variable in participants.tsv (age_at_first_session), and then have temporal offsets of session times in the sessions.tsv.
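For illustration, here is a minimal sketch of the second option. It assumes a column named age_at_first_session (in years) in participants.tsv and a hypothetical days_from_first_session column in sessions.tsv. The offset column name and its unit are assumptions made for this example, not something the spec defines.

```python
import csv
import io

# Hypothetical file contents; "days_from_first_session" is an illustrative
# column name, not one defined by the BIDS specification.
PARTICIPANTS_TSV = "participant_id\tage_at_first_session\nsub-01\t62.0\n"
SESSIONS_TSV = "session_id\tdays_from_first_session\nses-01\t0\nses-02\t730\n"

DAYS_PER_YEAR = 365.25


def read_tsv(text):
    """Parse TSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def ages_at_sessions(participant_row, session_rows):
    """Return {session_id: age in years} as baseline age plus session offset."""
    baseline = float(participant_row["age_at_first_session"])
    return {
        row["session_id"]: round(
            baseline + float(row["days_from_first_session"]) / DAYS_PER_YEAR, 2
        )
        for row in session_rows
    }


participant = read_tsv(PARTICIPANTS_TSV)[0]
sessions = read_tsv(SESSIONS_TSV)
ages = ages_at_sessions(participant, sessions)
```

The age at each session is never stored here; every consumer of the dataset has to repeat this calculation.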
@Remi-Gau @satra thank you both for your suggestions.
Here is my attempt at summarizing your proposals in my own words and their pros and cons:
1) Store year_of_birth in participants metadata and acq_time in sessions or scans metadata. Age at session cannot be queried directly, but is computed subsequently from both attributes.
2) Store age at the session metadata level and drop age from the subject level. Age at session can be queried directly as a result, but age (at recruitment) requires access to the session-level metadata and knowledge as to which session_id corresponds to the baseline.
3) Store age at baseline in subject-level metadata and temporal offsets from baseline in session-level metadata. Imo, this proposal shares the same drawbacks as 1, whilst being easier to interpret since age is explicit.
Please correct me if I am wrong.
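Proposal 1 could be sketched roughly as follows. The rows are made-up example values, and the helper name age_at_session is only illustrative. Since only the birth year is stored, the result can be off by one year. If acq_time has been date-shifted for anonymisation, year_of_birth would presumably need the same shift for the difference to stay meaningful.

```python
from datetime import datetime


def age_at_session(year_of_birth, acq_time):
    """Approximate age in whole years from a birth year and an ISO 8601 acq_time.

    Only the birth year is known, so the result may be off by one year.
    """
    acquired = datetime.fromisoformat(acq_time)
    return acquired.year - year_of_birth


# Made-up example rows standing in for participants.tsv and sessions.tsv.
participant = {"participant_id": "sub-01", "year_of_birth": 1950}
sessions = [
    {"session_id": "ses-01", "acq_time": "2015-03-12T10:30:00"},
    {"session_id": "ses-02", "acq_time": "2018-09-02T09:00:00"},
]

ages = {
    row["session_id"]: age_at_session(participant["year_of_birth"], row["acq_time"])
    for row in sessions
}
```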
On a side note, what's the rationale for forbidding inheritance for tabular metadata but allowing it for JSON metadata? I can't help but think that inheritance would have solved this in an elegant way.
- have the year of birth in the participants.tsv
- use the acquisition time info in the scans.tsv or sessions.tsv to compute the age

Date/time might need to be scrubbed/anonymized. I like @satra's suggestion:
"one can also include a variable in participants.tsv (age_at_first_session), and then have temporal offsets of session times in the sessions.tsv."
Chiming in since I'm dealing with constantly changing age values working with multi-session infant data. Overall I think it would help with readability / potential errors to avoid a calculation of start value + offset to get session level metadata.
"On a side note, what's the rationale for forbidding inheritance for tabular metadata but allowing it for JSON metadata? I can't help but think that inheritance would have solved this in an elegant way."
I would also like to know, as my first thought was that sessions.tsv could override conflicting values present in participants.tsv.
Also, is sessions.json (a sidecar similar to participants.json) a BIDS valid file? I don't see any mention currently in the spec but there must be a place to define non-standard sessions.tsv columns.
Yes, sub-<label>_sessions.json does exist and serves a similar purpose to the participants sidecar, applied to session-level tabular data.
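As a rough illustration, a sidecar describing a non-standard sessions.tsv column might be generated like this. The column name age_at_session and its description text are assumptions made for the example. "Description" and "Units" are the usual keys for describing tabular columns in BIDS JSON sidecars.

```python
import json

# Hypothetical column definition: "age_at_session" is not a column defined
# by the spec, it is only used here as an example of a custom column.
sessions_sidecar = {
    "age_at_session": {
        "Description": "Age of the participant at the time of the session",
        "Units": "year",
    }
}

# Text that would be saved as e.g. sub-01_sessions.json next to sub-01_sessions.tsv.
sidecar_text = json.dumps(sessions_sidecar, indent=2)
```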
"Chiming in since I'm dealing with constantly changing age values working with multi-session infant data. Overall I think it would help with readability / potential errors to avoid a calculation of start value + offset to get session level metadata."
Sounds brutal to deal with. I'd say in your case, you'd want to store age at the session level and have a separate age_at_recruitment at the participant level.
What solution have you adopted so far?
The current solution is to just process 1 session at a time, specifying the age (in months) through a flag. This isn't very scalable though - I was thinking of something along these lines:
1) Check sessions.tsv for age
2) If not found, check participants.tsv
3) If still not found, require flag
(link to initial issue https://github.com/nipreps/nibabies/issues/75#issuecomment-865199930)
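The lookup order above could be sketched like this. It is not the nibabies implementation: the function names, the flag_age parameter and the sample rows are all hypothetical. It treats "n/a" (the BIDS missing-value marker) and empty cells as "not found".

```python
import csv
import io

# Made-up example contents for sub-01_sessions.tsv and participants.tsv.
SESSIONS_TSV = "session_id\tage\nses-01\t3\nses-02\tn/a\n"
PARTICIPANTS_TSV = "participant_id\tage\nsub-01\t2\n"


def read_tsv(text):
    """Parse TSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def lookup(rows, id_column, id_value, column):
    """Return the value of `column` in the matching row, or None if missing."""
    for row in rows:
        if row.get(id_column) == id_value:
            value = row.get(column)
            if value not in (None, "", "n/a"):
                return value
    return None


def resolve_age(participant_id, session_id, sessions_rows, participants_rows,
                flag_age=None, column="age"):
    """Resolve age from sessions.tsv, then participants.tsv, then the flag."""
    age = lookup(sessions_rows, "session_id", session_id, column)
    if age is None:
        age = lookup(participants_rows, "participant_id", participant_id, column)
    if age is None:
        age = flag_age
    if age is None:
        raise ValueError(
            "age not found in sessions.tsv or participants.tsv; pass it explicitly"
        )
    return float(age)


sessions_rows = read_tsv(SESSIONS_TSV)
participants_rows = read_tsv(PARTICIPANTS_TSV)
```

Because the column name is a parameter, the same fallback would work for other demographics that may vary across sessions.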
I came across the following paragraph in the spec:
Age SHOULD be given as the number of years since birth at the time of scanning (or first scan in case of multi session datasets). Using higher accuracy (weeks) should in general be avoided due to privacy protection, unless when appropriate given the study goals, for example, when scanning babies.
So it appears there are some official recommendations towards keeping age at the participant level even in a multi-session dataset. In this context, I guess a separate age_at_session would make sense.
Not too much to contribute, but I would like to highlight this comment from @satra:
"this is true not just of age"
e.g., "sex" is described as "phenotypical sex", but whether that means something like "assigned at birth" vs "as observed by the researcher" vs "as reported by the participant during the first session" is unclear, and a few interpretations may vary across sessions.
So, it would be nice for the solution to not rely too much on age-specific mechanics (e.g., Date of Birth + offset) and instead favor something general (e.g., preferring or searching first for columns in sessions.tsv).
More of a question first, which may lead to a proposal later.
Is there a consensus on how to represent the concept of an age at session in the BIDS metadata hierarchy?
The context concerns longitudinal studies where imaging visits span a large period of time (several years typically), and the age at session is necessary for modeling disease progression.
On first read, I thought this attribute may go in sub-<label>_sessions.tsv#age, which would override participants.tsv#age as per the inheritance rule. But that contradicts the rule in the sessions file section mandating the use of distinct attributes between participants and sessions tabular data.
A BIDS-compliant alternative could be to introduce an age_at_session attribute in the sessions file.
What are your thoughts?