airr-community / airr-standards

AIRR Community Data Standards
https://docs.airr-community.org
Creative Commons Attribution 4.0 International
35 stars 23 forks source link

physical quantity format #551

Closed schristley closed 6 months ago

schristley commented 2 years ago

fix #504 fix #500 fix #498

schristley commented 2 years ago

@bussec My initial attempt just in the openapi3 schema file

schristley commented 2 years ago

@javh age, that which was deprecated becomes reborn. I hope we don't need to specify a deprecation of deprecation?!

bussec commented 2 years ago

@schristley Looks good to me, as long as the point/range distinction does not create a major hazzle for search routines.

bcorrie commented 2 years ago

My only concern would be the same as @bussec. See #504

schristley commented 2 years ago

The other thing I don't quite like is having the unit twice for a range, what if people start doing silly stuff like make one unit year and another month. Having to do conversions on queries sounds complicated... I'd much prefer a TimeRange object, similar to TimePoint, with a single unit.

schristley commented 2 years ago

@bcorrie time interval it is, how does this look? I went with TimeInterval instead of TimeRange as I think it's more descriptive.

schristley commented 2 years ago

as a side note, TimeInterval is relative, i.e. not fixed to a specific date. If we find we need to express such intervals in the future then we can create a DateInterval where the min and max are datetime

scharch commented 1 year ago

Leaving the referenced issues open under the assumption that this will eventually be merged....

javh commented 11 months ago

From the call:

schristley commented 10 months ago

@javh I certainly hope this can be done for AIRR 2.0. The current design of having separate, unconnected fields (except through a semantic interpretation of the names), is very TSV-ish. I think we should be more JSON-ish by compositing fields together into an object that semantically belong together.

schristley commented 9 months ago

The check error will be fixed with #739

bcorrie commented 8 months ago

I also think:

bcorrie commented 8 months ago

I think this would also resolve #689

bcorrie commented 8 months ago

@schristley if no objections I will make the Sample.collection_time_point and ReceptorReactivity.reactivity changes.

schristley commented 8 months ago

@schristley if no objections I will make the Sample.collection_time_point and ReceptorReactivity.reactivity changes.

@bcorrie sure, sounds good

schristley commented 7 months ago

Hey @bcorrie , I can see you are trying to clean up this PR, let me know if I can help in any way?

bcorrie commented 7 months ago

Yes, I was working through it. Thanks for the test data fixes, I gave up last night and was going to dive back in this morning. You know the test suites better than I do clearly.

Looks like the checks have passed.

bcorrie commented 7 months ago

Only outstanding issue on this is for CellReactivity.reactivity_value should be a PhysicalQuantity which is being covered in https://github.com/airr-community/airr-standards/pull/705#issuecomment-1930607761

So I think this can be merged. @javh and @bussec were both reviewers for this previously so I have left you on if you want to have a look.

bcorrie commented 7 months ago

In #646 we discuss whether collection_time_point_relative should be a TimeInterval.

It also raises the question should we have a QuantityInterval or QuantityRange object. We can define it but not use it??? Or should all of our PhyscialQuantity objects be QuantityRange objects?

Not quite done with this yet. 8-)

bcorrie commented 7 months ago

In #689 we talk about using the label field of TimePoint in collection_time_point_relative so that we no longer need collection_time_point_reference. Is this desirable?

schristley commented 7 months ago

In #646 we discuss whether collection_time_point_relative should be a TimeInterval.

It also raises the question should we have a QuantityInterval or QuantityRange object. We can define it but not use it??? Or should all of our PhyscialQuantity objects be QuantityRange objects?

Not quite done with this yet. 8-)

Upscale quantity fields to allow either a range or point, like with age...

I'm hesitant to do that until we actually have a use case that requires it. For some quantities, the idea of measuring a range might not be semantic viable. And it has the implication that downstream analysis will always have to handle the fields as ranges in order to do calculations properly, which introduces complexity.

bcorrie commented 6 months ago

@bussec @javh I think this is OK to merge. It would be nice to have this merged before any Event stuff gets added, as Event extends/relies on some of these changes. Needs your review (or remove your self from the review list if you would prefer not to 8-)

bcorrie commented 6 months ago

@schristley and I both did this in parallel - I think I merged them OK. I added a couple of adc_query_support attributes that were missing.

bcorrie commented 6 months ago

@bussec remaining issue from above is disease_length, I believe it should be a PhysicalQuantity as I don't think a TimeInterval makes sense since it is a length not a time span.

schristley commented 6 months ago

@bussec remaining issue from above is disease_length, I believe it should be a PhysicalQuantity as I don't think a TimeInterval makes sense since it is a length not a time span.

I agree it should be PhysicalQuantity because it is a length or duration. If you go to a TimeInterval then what value would you put for min, why would it be anything other than zero? If min is zero, it's the same result as PhysicalQuantity, while if it's non-zero, then what is it relative to?

With patient data, we don't want to use dates, so what we are seeing with the AKC work (and you see this when getting clinical data), is that there is a defined T=0 event, and then everything after that is a relative time point. But we don't have that in AIRR yet.

In the future redesign there will be time points for specific events, so the description, "Time duration between initial diagnosis and current intervention" would suggest an event at "initial diagnosis" and an event at "current intervention" and this length could be computed from those.

bussec commented 6 months ago

@bussec remaining issue from above is disease_length, I believe it should be a PhysicalQuantity as I don't think a TimeInterval makes sense since it is a length not a time span.

@bcorrie I agree that TimeInterval (or TimePoint) sound odd, but they would enforce the correct units for a measurement of time, which PhysicalQuantity does not. It is probably more a question how we call the Time* objects, but we can worry about that another day as the current solution does not create any limits here. So no objections from my side.

schristley commented 6 months ago

@bussec remaining issue from above is disease_length, I believe it should be a PhysicalQuantity as I don't think a TimeInterval makes sense since it is a length not a time span.

@bcorrie I agree that TimeInterval (or TimePoint) sound odd, but they would enforce the correct units for a measurement of time, which PhysicalQuantity does not. It is probably more a question how we call the Time* objects, but we can worry about that another day as the current solution does not create any limits here. So no objections from my side.

I see. Yes, you are right, I missed that.

So we need to fix this because PhysicalQuantity has the top ontology node as mass unit and thus technically we are not allowed to specify time units with it.

bcorrie commented 6 months ago

Do we want to add a TimeQuantity object - to enforce units. Or just have a general Quantity object with a higher root node? Or just change the current root node and leave it as a PhysicalQuantity.

I would be tempted to change the name to Quantity and change the root node to UO:0000000, unit

@bussec @schristley thoughts?

schristley commented 6 months ago

I think we are pretty much required to have a TimeQuantity object to enforce time units. If we change the root to UO:0000000, unit then that will allow all types of incorrect units to be used.

An interesting aside, LinkML has slot_usage which allows for this. That is, we could define a general Quantity but then indicate when used as disease_length that its range is a time unit, while other usages of the slot might have range as mass units, and so on. I would guess how this gets translated into JSON schema ends up being not much different from what we are doing manually.

bussec commented 6 months ago

+1 for introducing a TimeQuantity object.

bcorrie commented 6 months ago

I will work on this...

lgcowell commented 6 months ago

[heart] Lindsay Cowell reacted to your message:

lgcowell commented 6 months ago

[0] Lindsay Cowell reacted to your message:

bcorrie commented 6 months ago

This has been done - ready for merge, unless we can find anything else...