Closed bcorrie closed 4 months ago
When we drafted MiAIRR v1, we had a rather one-dimensional view on this, so we only captured disease_length
, as there was the (implicit) assumption, that the intervention and the subsequent sampling would be in close temporal proximity.
Having said this, I agree that the current objects are not well suited to capture more elaborate study setups. In addition, the naming of some of the keywords would be misleading, if they are used to describe interventions instead of diseases.
If we want to take a maximalist approach for v2, then my suggestions would be:
Diagnosis
object to an Event
objects, which contains a keyword type
, which in turn would be an enum including diagnosis
, intervention
and sampling
.Event
object should have a time point associated with itEvent
object of the sampling
type should reference all associated Sample
objectsEvents
from different subjects within the same study.From the call: Suggestions of @bussec capture the use-case. @bcorrie notes that IEDB has implemented a similar structure in their DB, should have a look at this. Will follow-up whether the IEDB metadata follows a third-party metadata standard or not.
Bjoern presented the more complete design that is in OBI in the image below. However, for IEDB they only implemented the subset relevant for curation which is immune exposure events
From AKC discussions, things that are events:
In AIRR at the moment, we have two explicit events where we capture a time point:
age_event
)collection_time_point_reference
) Note Diagnosis
does not have a time point or event associated with it.
It seems to me that in order for time points and events to be the most useful, ideally there should be a T0 event for a subject (or even a study) to which all time points are related. For example T0 = Enrollment in a study where all time points for all subsequent events are relative to T0 provides the most usefulness. Note that this might be different than how time points are described by the researcher in their study.
How about this for a suggestion - along the lines of our discussions with the AKC and based on @bussec suggestion here: https://github.com/airr-community/airr-standards/issues/735#issuecomment-1850297217
age_event
field)To me this seems like mostly a semantic cleanup that includes:
Yes, there are syntax changes, but fundamentally this isn't changing structure at a basic level, we still have Subject
with an array of Diagnosis
-like things. And we still track TimePoints for Subject age and when Samples are taken.
- Events have other fields for the specific type of event
This is probably the most challenging, but we can follow the direction of IEDB for the fields that they use for their exposure model for most of what we are looking for I think: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7153954/
We should consider this for v2.0 release. I don't think the above changes are huge?
I think it is important that we have a more robust solution for capturing this type of information than we currently have. So having the draft of an Event
object (which seems to be non-controversial) for the next call would be good. Volunteers? ;-)
I am smart enough to see that you just manipulated me into doing some work, but oddly not smart enough to avoid the work... 8-)
Most basic of implementations.
Event
object - it has event_id
, event_type
, and event_time_point
fields.collection_time_point_relative_*
fields with sample_collection_event
- significant overlap with #551 diagnosis_event
to Diagnosis
We could change it so that a Subject
has an array of Events
, some of which are of event_type = Diagnosis
rather than an array of Diagnosis
.
I have held off on that as that is a major change, maybe beyond the scope of v2.0?
A subject can have multiple diagnoses, and in fact diagnoses are not really diagnoses at all, since they can actually be "interventions" - for example an immunization.
Shouldn't a diagnosis have a time_point_relative and a time_point_relative_reference?