Closed bcorrie closed 4 months ago
@bcorrie Is there any special reason why the non-Subject
objects contain an actual Event
record, instead of a reference (ID) to a record located in an array in Subject
?
This is a minimal change to make use of the Event object in the current structure that we have. No real semantic or structural changes, just syntax changes so that we are using the Event
object everywhere an "event" occurred in our current specification.
With this specification, we still have an array of Diagnosis
where each Diagnosis
has an Event
that captures the Event
-like features of a diagnosis. We do not have a general array of Events
in Subject
.
To do this correctly, with an array of Events
, I think is beyond the scope of v2.0. In talking with @schristley for the AKC, it makes more sense for this to wait until v2.1.
So for v2.0 I would suggest we either make some simple changes to introduce the Event
concept in v2.0 similar to what we have now, or if we want to consider the redesign of this we should move this issue out of the v2.0 milestone and into v2.1
I am fine with either. The nice thing about doing this now is that we add Event
information like TimePoint
to Diagnosis
, which is currently missing (that was the original issue that was driving this change). But there is a good argument for implementing Event
once and doing it correctly - which I don't think fits in the scope of v2.0.
I agree that the "maximalist" version of this is a substantial change and that it will require time and effort to get it done correctly. However, if we take the meaning of a major version number seriously, then IMO it should go into v2.0 exactly for that reason and not into a later point release. If we keep moving more and more stuff to v2.1, we might be able to release v2.0 in time, but will not meet the goal of having only non-major changes afterwards. In the end we need to agree on what we want/need to complete before we are "done" with the schema and what our timeline is. I have added this topic to the agenda of our next call.
Yep, but at the same time, if we keep adding major new things we will never release 2.0.
I think the more we see things being pushed to 2.1 the less I am convinced that "having only non-major changes afterwards" might not be practical - unless you are willing to accept that the current WG will be around for another year - and the v2.0 release is now targeted at the 2025 meeting 8-)
Then I can stop pushing so hard to get things done for June!!!
I don't think making a schema change at the last minute is a good idea. The Event concept needs more discussion and how it fits into an overall enhancement of study design curation. In particular because it causes disruption, i.e. code has to change.
I agree that the "maximalist" version of this is a substantial change and that it will require time and effort to get it done correctly. However, if we take the meaning of a major version number seriously, then IMO it should go into v2.0 exactly for that reason and not into a later point release.
That's not how I understand the major version number. A change in the major version number implies new requirements on researchers for MiAIRR for AIRR-seq studies. We don't want to change the MiAIRR standard at the last moment without putting in the time and effort for discussion. Right now V2 is about adding single-cell and a few other things to MiAIRR, that seems sufficient for now.
If we keep moving more and more stuff to v2.1, we might be able to release v2.0 in time, but will not meet the goal of having only non-major changes afterwards. In the end we need to agree on what we want/need to complete before we are "done" with the schema and what our timeline is. I have added this topic to the agenda of our next call.
I'm not sure we will ever be "done". As a domain-specific standards group, we need to keep up with the experimental protocols and other changes in the domain. The enhancement of study design curation, and other stuff, that will come out of AKC and NFDI4Immuno will likely feed back into MiAIRR V3 though it will take a few years.
So we are back to the question what is in v2.0 and what is for later discussion.
Currently we have a minimalist change (https://github.com/airr-community/airr-standards/pull/749/files):
diagnosis_event
field of type Event
that helps capture when a diagnosis happened. This did not exist before, there was no timepoint associated with a Diagnosis
(See #735 )collection_time_point_*
to be a collection_event
These are relatively minor changes that clean up several issues in the original spec.
There are no major changes here IMO, and I would lean towards doing this for v2.0 (with minor extra tweaks) and leave the "re-engineering" of the concept of events in a study to a later discussion/release.
I consider this a fairly major semantic change. Adding a time point seems reasonable, but inserting the Event object in the middle will create a lot of churn when we change things later.
leave the "re-engineering" of the concept of events in a study to a later discussion/release.
IMO it doesn't. If forces the Event object to be used now in all the code. Every data entry screen has to be written to capture an "Event" instead of just recording a time point. Every python script and analysis tool has to write code that traverses through that "Event" in order to access a time point. Every query has to be written to traverse through that "Event" object to operate on the time point. And if all of that gets done, then we will be stuck with this structure because changing it is too much work. I can already see you writing the post later about how it breaks backward compatibility ;-)
I'm not opposed to adding an Event object. I would just prefer it be added in a way that doesn't require mandatory usage now with all the associated data and code changes. Even more so if this is just considered preliminary and we expect the design to change later.
Another idea which will create less churn is to put the Event
object inside TimePoint
, that is flip the structure, so the new fields in Diagnosis are TimePoint
. Here we are just reversing the relationship, instead of saying an Event occurs at TimePoint, we are saying TimePoint is associated with Event. This would allow usage of the time point without requiring an event object to always be in the middle.
I am also OK to move this entire change out of v2.0 - I see your point if it is just going to change in the next version why do work now that will be thrown away in the near future.
So I would be OK with moving this out of the v2.0 milestone completely.
Remember this all started because Diagnosis
doesn't have a TimePoint
. It hasn't had a time point for 6 years (since MiAIRR was introduced) and it hasn't appeared to be a problem to anyone up until now except to us, so I am OK for it NOT to have a time point for another year. Especially if this means we get the v2.0 release out the door. 8-)
Another idea which will create less churn is to put the
Event
object insideTimePoint
, that is flip the structure, so the new fields in Diagnosis areTimePoint
. Here we are just reversing the relationship, instead of saying an Event occurs at TimePoint, we are saying TimePoint is associated with Event. This would allow usage of the time point without requiring an event object to always be in the middle.
The easiest thing to do is just add a TimePoint
to the Diagnosis
object. This doesn't even break backwards compatibility, it just adds a field. We can revert the changes to collection_time_point_*
and drop Event
for now.
This actually fixes #735 without changing anything else.
@bussec any more thoughts. I am not going to do any more edits on this until we decide what we actually want for v2.0... 8-)
@bcorrie I am ok with only doing small changes until mid-2024 (whatever that version will be called). Right now I have no opinion whether a TimePoint
should have Events
or an Event
should have a TimePoint
, but I agree with @schristley that if we only want to do small changes now, the former one is clearly (plus addition of a TimePoint
to Diagnosis
) is probably the least invasive thing to do.
Having said this, I still think that the whole "description-of-what-happend-when" package (including issues like #486 ) needs to be addressed before we go on hiatus, as it substantially limits the usefulness of the AIRR Schema if we don't.
@bussec @schristley I have reverted the Event changes and added a TimePoint to Diagnosis. This resolves #735 and we have agreed to do the big "event" lift after 2.0. So this can be merged I think.
Resolved merge conflicts. @schristley are we good to go?
@bcorrie @bussec Minor point, but the test code creates a template repertoire object and validates it, so it will already test null
for that field. My suggestion is that you put the example values into the "good" test data files. Otherwise it looks good to go.
Added tests for proper validation for R and python as per @schristley request above. Merging...
closes #735