Why doesn't Diagnosis have a time_point associated with it?

bcorrie commented 7 months ago

A subject can have multiple diagnoses, and in fact diagnoses are not really diagnoses at all, since they can actually be "interventions" - for example an immunization.

Shouldn't a diagnosis have a time_point_relative and a time_point_relative_reference?

bussec commented 6 months ago

When we drafted MiAIRR v1, we had a rather one-dimensional view on this, so we only captured disease_length, as there was the (implicit) assumption, that the intervention and the subsequent sampling would be in close temporal proximity.

Having said this, I agree that the current objects are not well suited to capture more elaborate study setups. In addition, the naming of some of the keywords would be misleading, if they are used to describe interventions instead of diseases.

If we want to take a maximalist approach for v2, then my suggestions would be:

restructure the Diagnosis object to an Event objects, which contains a keyword type, which in turn would be an enum including diagnosis, intervention and sampling.
every Event object should have a time point associated with it
an Event object of the sampling type should reference all associated Sample objects
we might want to think about an additional object that allows the general description of a study design. This would allow for better comparability of Events from different subjects within the same study.

bussec commented 6 months ago

From the call: Suggestions of @bussec capture the use-case. @bcorrie notes that IEDB has implemented a similar structure in their DB, should have a look at this. Will follow-up whether the IEDB metadata follows a third-party metadata standard or not.

schristley commented 6 months ago

Bjoern presented the more complete design that is in OBI in the image below. However, for IEDB they only implemented the subset relevant for curation which is immune exposure events

bcorrie commented 6 months ago

From AKC discussions, things that are events:

Enrollment in a study
Diagnosis
Intervention - vaccine,
Exposure - exposure to a pathogen
Sampling

In AIRR at the moment, we have two explicit events where we capture a time point:

We capture a time point in the subject, associated with age (age is measured relative to an explicit event - field age_event)
We capture a time point that is associated with when the sample was taken (collection time point is relative to an event - field collection_time_point_reference)

Note Diagnosis does not have a time point or event associated with it.

bcorrie commented 6 months ago

It seems to me that in order for time points and events to be the most useful, ideally there should be a T0 event for a subject (or even a study) to which all time points are related. For example T0 = Enrollment in a study where all time points for all subsequent events are relative to T0 provides the most usefulness. Note that this might be different than how time points are described by the researcher in their study.

bcorrie commented 6 months ago

How about this for a suggestion - along the lines of our discussions with the AKC and based on @bussec suggestion here: https://github.com/airr-community/airr-standards/issues/735#issuecomment-1850297217

We create an Event object
- Event objects have an EventType which is a enum (Diagnosis, Intervention, Exposure, Sampling, Enrollment, Age ...)
- Events have TimePoints - see #689 (time with units + relative to field as we have now)
- Events have other fields for the specific type of event
Subjects have an array of Events (replaces our current Diagnosis with its hodge podge of extra fields)
Each Sample has an Event that delineates when the sample was taken (replaces collection_time_point* fields)
A Subject has an Event that delineates when the subject's age was documented (replaces age_event field)

To me this seems like mostly a semantic cleanup that includes:

a clear definition of what an event is
a clear definition of what objects events are associated with

Yes, there are syntax changes, but fundamentally this isn't changing structure at a basic level, we still have Subject with an array of Diagnosis-like things. And we still track TimePoints for Subject age and when Samples are taken.

bcorrie commented 6 months ago

Events have other fields for the specific type of event

This is probably the most challenging, but we can follow the direction of IEDB for the fields that they use for their exposure model for most of what we are looking for I think: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7153954/

bcorrie commented 5 months ago

We should consider this for v2.0 release. I don't think the above changes are huge?

bussec commented 4 months ago

I think it is important that we have a more robust solution for capturing this type of information than we currently have. So having the draft of an Event object (which seems to be non-controversial) for the next call would be good. Volunteers? ;-)

bcorrie commented 4 months ago

I am smart enough to see that you just manipulated me into doing some work, but oddly not smart enough to avoid the work... 8-)

bcorrie commented 4 months ago

Most basic of implementations.

Created an Event object - it has event_id, event_type, and event_time_point fields.
Replaced collection_time_point_relative_* fields with sample_collection_event - significant overlap with #551
Added a diagnosis_event to Diagnosis

bcorrie commented 4 months ago

We could change it so that a Subject has an array of Events, some of which are of event_type = Diagnosis rather than an array of Diagnosis.

I have held off on that as that is a major change, maybe beyond the scope of v2.0?

airr-community / airr-standards

Why doesn't Diagnosis have a time_point associated with it? #735