airr-community / airr-standards

AIRR Community Data Standards
https://docs.airr-community.org
Creative Commons Attribution 4.0 International
35 stars 23 forks source link

Why doesn't Diagnosis have a time_point associated with it? #735

Closed bcorrie closed 4 months ago

bcorrie commented 7 months ago

A subject can have multiple diagnoses, and in fact diagnoses are not really diagnoses at all, since they can actually be "interventions" - for example an immunization.

Shouldn't a diagnosis have a time_point_relative and a time_point_relative_reference?

bussec commented 6 months ago

When we drafted MiAIRR v1, we had a rather one-dimensional view on this, so we only captured disease_length, as there was the (implicit) assumption, that the intervention and the subsequent sampling would be in close temporal proximity.

Having said this, I agree that the current objects are not well suited to capture more elaborate study setups. In addition, the naming of some of the keywords would be misleading, if they are used to describe interventions instead of diseases.

If we want to take a maximalist approach for v2, then my suggestions would be:

bussec commented 6 months ago

From the call: Suggestions of @bussec capture the use-case. @bcorrie notes that IEDB has implemented a similar structure in their DB, should have a look at this. Will follow-up whether the IEDB metadata follows a third-party metadata standard or not.

schristley commented 6 months ago

Bjoern presented the more complete design that is in OBI in the image below. However, for IEDB they only implemented the subset relevant for curation which is immune exposure events

Screenshot 2023-12-11 at 4 23 40 PM
bcorrie commented 6 months ago

From AKC discussions, things that are events:

In AIRR at the moment, we have two explicit events where we capture a time point:

Note Diagnosis does not have a time point or event associated with it.

bcorrie commented 6 months ago

It seems to me that in order for time points and events to be the most useful, ideally there should be a T0 event for a subject (or even a study) to which all time points are related. For example T0 = Enrollment in a study where all time points for all subsequent events are relative to T0 provides the most usefulness. Note that this might be different than how time points are described by the researcher in their study.

bcorrie commented 6 months ago

How about this for a suggestion - along the lines of our discussions with the AKC and based on @bussec suggestion here: https://github.com/airr-community/airr-standards/issues/735#issuecomment-1850297217

  1. We create an Event object
    • Event objects have an EventType which is a enum (Diagnosis, Intervention, Exposure, Sampling, Enrollment, Age ...)
    • Events have TimePoints - see #689 (time with units + relative to field as we have now)
    • Events have other fields for the specific type of event
  2. Subjects have an array of Events (replaces our current Diagnosis with its hodge podge of extra fields)
  3. Each Sample has an Event that delineates when the sample was taken (replaces collection_time_point* fields)
  4. A Subject has an Event that delineates when the subject's age was documented (replaces age_event field)

To me this seems like mostly a semantic cleanup that includes:

Yes, there are syntax changes, but fundamentally this isn't changing structure at a basic level, we still have Subject with an array of Diagnosis-like things. And we still track TimePoints for Subject age and when Samples are taken.

bcorrie commented 6 months ago
  • Events have other fields for the specific type of event

This is probably the most challenging, but we can follow the direction of IEDB for the fields that they use for their exposure model for most of what we are looking for I think: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7153954/

bcorrie commented 5 months ago

We should consider this for v2.0 release. I don't think the above changes are huge?

bussec commented 4 months ago

I think it is important that we have a more robust solution for capturing this type of information than we currently have. So having the draft of an Event object (which seems to be non-controversial) for the next call would be good. Volunteers? ;-)

bcorrie commented 4 months ago

I am smart enough to see that you just manipulated me into doing some work, but oddly not smart enough to avoid the work... 8-)

bcorrie commented 4 months ago

Most basic of implementations.

bcorrie commented 4 months ago

We could change it so that a Subject has an array of Events, some of which are of event_type = Diagnosis rather than an array of Diagnosis.

I have held off on that as that is a major change, maybe beyond the scope of v2.0?