airr-community / airr-standards

AIRR Community Data Standards
https://docs.airr-community.org
Creative Commons Attribution 4.0 International

Receptor object issues when used in real life... #704

Closed bcorrie closed 4 months ago

bcorrie commented 11 months ago

In trying to load epitope specificity, we have found a few problems. In discussions with @bussec we have come up with a set of suggested changes. Please discuss 8-)

bcorrie commented 11 months ago

Following suggestions for discussion:

bcorrie commented 11 months ago

ReceptorReactivity needs to be associated with both Cells and Receptors. The actual measurement is an observation from an experiment that links a specific Cell to a specific Epitope (at least in the case of a 10X study). That Cell can be associated with a Receptor. In that case, the ReceptorReactivity is also evidence of an association between the more global Receptor and the Epitope, so the ReceptorReactivity should be associated with the Receptor as well.

In our current standard, there is no way to link a ReceptorReactivity observation with a specific Cell. Since a Receptor might have many ReceptorReactivity values, and they may come from many Cells in many experiments, it is currently not possible to determine which Cell a ReceptorReactivity came from.

Hence the addition of Cell.reactivity_measurements
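
A minimal sketch of what this addition buys, using plain Python dicts rather than the actual AIRR schema. Only the `reactivity_measurements` property comes from the proposal; the record IDs, the `reactivity_id` and `epitope` keys, and the helper function are made up for illustration.

```python
# Hypothetical records: two cells from different experiments express the same
# receptor, and each cell contributed one reactivity observation.
receptor = {
    "receptor_id": "rec-1",
    "reactivity_measurements": ["rr-1", "rr-2"],  # all observations for the receptor
}
cells = [
    {"cell_id": "cell-1", "receptors": ["rec-1"], "reactivity_measurements": ["rr-1"]},
    {"cell_id": "cell-2", "receptors": ["rec-1"], "reactivity_measurements": ["rr-2"]},
]


def cells_for_reactivity(reactivity_id, cells):
    """Return the IDs of the cells that list the given reactivity observation."""
    return [
        cell["cell_id"]
        for cell in cells
        if reactivity_id in cell["reactivity_measurements"]
    ]


# The Receptor alone cannot say which cell "rr-2" came from; the Cell records can.
source_cells = cells_for_reactivity("rr-2", cells)
```

Without the list on the Cell, `receptor["reactivity_measurements"]` is the only link, and it says nothing about the cell of origin.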

bcorrie commented 11 months ago

@bussec @kira-neller does that capture our discussion?

kira-neller commented 11 months ago

@bcorrie Yes, this captures the necessary changes as far as I understand. Thank you!

bussec commented 11 months ago

@bcorrie Yes, it captures our discussion.

However, I came across an additional complication that we need to think about: if we refer to ReceptorReactivity records by their ID only (i.e., they are not nested into the Receptor object), then you don't know which receptor the reactivity measurement refers to. From a Cell record you could reconstruct this using both the receptors and the reactivity_measurements properties (which is already a bit of a pain), but for other potential references you would need to search all Receptor records for a matching reactivity measurement ID. Therefore ReceptorReactivity needs to contain the receptor_hash.

Ruminating about this, there is also the situation we discussed in which a cell expresses more than one receptor. If in such a case you have data from a multimer-MHC binding assay, you won't be able to know which receptor mediated the binding. Therefore the ReceptorReactivity record would need to refer to multiple receptor_hash IDs. That is not a problem by itself, but we need to clearly document that in such a case each of the referenced receptors might have been involved, but you cannot be certain about it.
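
To make the ambiguity concrete, here is a rough sketch. It assumes the `receptor_hash` property holds a list of hashes; that shape, the hash values, and the helper are hypothetical and not taken from the schema.

```python
# Hypothetical ReceptorReactivity records; receptor_hash is assumed to be a list.
reactivity_single = {"receptor_hash": ["hash-a"]}
reactivity_dual = {"receptor_hash": ["hash-a", "hash-b"]}  # cell expresses two receptors


def receptor_attribution(reactivity):
    """Report the candidate receptors and whether the attribution is unambiguous."""
    hashes = list(reactivity["receptor_hash"])
    # With more than one hash, any of the receptors may have mediated the
    # binding, so a consumer must treat each one as a candidate only.
    return {"candidates": hashes, "certain": len(hashes) == 1}


single = receptor_attribution(reactivity_single)
dual = receptor_attribution(reactivity_dual)
```

A consumer that ignores the `certain` flag would over-count reactivity for receptors that merely share a cell with the true binder.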

bcorrie commented 9 months ago

However, I came across an additional complication that we need to think about: if we refer to ReceptorReactivity records by their ID only (i.e., they are not nested into the Receptor object), then you don't know which receptor the reactivity measurement refers to

I am not sure this is true, is it? The Receptor object has a list of ReceptorReactivity IDs in its reactivity_measurements array. So you can find all ReceptorReactivity entities for a Receptor by looking at Receptor.reactivity_measurements, and you can find which Receptor a ReceptorReactivity entity refers to by searching all Receptor objects for the receptor_activity_id in Receptor.reactivity_measurements. So you can find the receptor, but only with a relatively expensive query...

This is actually fairly cumbersome, and I think it would be more elegant if the ReceptorReactivity object pointed directly to the Receptor object. We discussed this and for some reason decided that an array of Receptor.reactivity_measurements was better. I can't remember why and I am not sure that was the right choice... 8-)
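
For illustration, the reverse lookup described above would look roughly like this over an in-memory list. The records and the helper name are hypothetical; a real repository would run the equivalent query against its own storage.

```python
# Hypothetical Receptor records that reference reactivity observations by ID.
receptors = [
    {"receptor_id": "rec-1", "reactivity_measurements": ["rr-1", "rr-2"]},
    {"receptor_id": "rec-2", "reactivity_measurements": ["rr-3"]},
    {"receptor_id": "rec-3"},  # no reactivity recorded
]


def receptors_for_reactivity(reactivity_id, receptors):
    """Scan every Receptor to find the ones that list the given observation."""
    return [
        receptor["receptor_id"]
        for receptor in receptors
        if reactivity_id in receptor.get("reactivity_measurements", [])
    ]


owners = receptors_for_reactivity("rr-3", receptors)
```

The scan touches every Receptor record for each lookup, which is the cost being complained about; a direct pointer on the ReceptorReactivity would avoid it.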

bcorrie commented 9 months ago

Isn't it always true that a single ReceptorReactivity instance comes from one, and only one, Cell (the reactivity measurement comes from a single cell, no?), and that in your scenario above it might point to a very small number of Receptors (e.g. the Cell expresses more than one Receptor and you don't know which Receptor is causing the reactivity)?

Maybe ReceptorReactivity should have a cell_id field and an array of receptor_id values (where the array would be small, N = 1-2?). We could then get rid of the reactivity_measurements fields, and if you wanted to find all of the ReceptorReactivity objects associated with a Cell or Receptor, you would search the ReceptorReactivity objects for the cell_id or receptor_id of interest.
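
Roughly what I have in mind, as a sketch only. `cell_id` is the field suggested above; `receptor_ids` is a placeholder name for the array of receptor_id values, and the records and `reactivity_id` key are invented.

```python
# Hypothetical ReceptorReactivity records that point at their Cell and Receptor(s).
reactivities = [
    {"reactivity_id": "rr-1", "cell_id": "cell-1", "receptor_ids": ["rec-1"]},
    # Cell expressing two receptors: either one may be causing the reactivity.
    {"reactivity_id": "rr-2", "cell_id": "cell-2", "receptor_ids": ["rec-1", "rec-2"]},
]


def reactivities_for_cell(cell_id, reactivities):
    """All reactivity observations measured on the given cell."""
    return [r["reactivity_id"] for r in reactivities if r["cell_id"] == cell_id]


def reactivities_for_receptor(receptor_id, reactivities):
    """All reactivity observations that name the receptor as a candidate."""
    return [r["reactivity_id"] for r in reactivities if receptor_id in r["receptor_ids"]]


by_cell = reactivities_for_cell("cell-2", reactivities)
by_receptor = reactivities_for_receptor("rec-1", reactivities)
```

Both lookups become a filter over one object type, and neither Cell nor Receptor has to carry a list that must be kept in sync.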

bcorrie commented 7 months ago

@bussec any comments on this? It would be good to close this off and merge with master.

bcorrie commented 5 months ago

Adding the other use case as per recent discussions in #705 for ReceptorReactivity

In my single-cell study stored in the ADC, I have found the following:

So I get something like this:

If I want to track the known ReceptorReactivity of the Receptor in the ADC, I can create a ReceptorReactivity object with the following fields, pulled from IEDB (https://www.iedb.org/assay/21965299).

This Receptor has 9 assays associated with it that returned activity, so presumably if I wanted to capture all of the reactivity for this receptor, I would have 9 different ReceptorReactivity objects. For example, one of the other assays (https://www.iedb.org/assay/21965295) is an ELISA assay, so I would have the same fields as above, with the exception of: