Receptor object issues when used in real life...

bcorrie commented 11 months ago

In trying to load epitope specificity, we have found a few problems. In discussions with @bussec we have come up with a set of suggested changes. Please discuss 8-)

bcorrie commented 11 months ago

Following suggestions for discussion:

Made receptor_id unique.
Added ReceptorReactivity.receptor_reactivity_id - this is an object, it needs an ID
Added peptide_aa_string - useful to have, very difficult to compute with UNIPROT URI and peptide_start and peptide_end
Added new enum for reactivity_method - MHC-peptide multimer
Added new enum for reactivity_readout - barcode count

bcorrie commented 11 months ago

Made Receptor.reactivity_measurements an array of IDs not an array of objects
Added Cell.reactivity_measurements as an array of IDs so we can track which Cell a measurement was associated with

ReceptorReactivity needs to be associated with both Cells and Receptors. The actual measurement is an observation from an experiment that links a specific Cell to a specific Epitope (at least in the case of a 10X study). That Cell can be associated with a Receptor. In that case, the ReceptorReactivity is also evidence of an association between the more global Receptor and that Epitope, so the ReceptorReactivity should be associated with the Receptor as well.

In our current standard, there is no way to link a ReceptorReactivity observation with a specific Cell. Since a Receptor might have many ReceptorReactivity values, and they may come from many Cells from many experiments, it is not possible to determine which Cell a given ReceptorReactivity came from.

Hence the addition of Cell.reactivity_measurements
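
A minimal sketch of the linkage described above, using plain Python dictionaries. The field names `receptor_id`, `cell_id`, `receptors`, `receptor_reactivity_id` and `reactivity_measurements` come from this thread; every ID value and the helper `cells_for_measurement` are invented for illustration and are not part of the standard.

```python
# Hypothetical records; all ID values are made up for illustration.
receptor = {
    "receptor_id": "rec-A",
    # The Receptor lists every measurement seen for it, across all cells.
    "reactivity_measurements": ["rr-1", "rr-2"],
}

cells = [
    {"cell_id": "cell-1", "receptors": ["rec-A"], "reactivity_measurements": ["rr-1"]},
    {"cell_id": "cell-2", "receptors": ["rec-A"], "reactivity_measurements": ["rr-2"]},
]


def cells_for_measurement(reactivity_id, cells):
    """Return the cell_id of every Cell that lists the given measurement ID."""
    return [c["cell_id"] for c in cells if reactivity_id in c["reactivity_measurements"]]


# Without Cell.reactivity_measurements, "rr-1" and "rr-2" would be
# indistinguishable from the Receptor's point of view.
print(cells_for_measurement("rr-1", cells))
```

With only `Receptor.reactivity_measurements`, both measurements would point at `rec-A` and the originating Cell would be lost.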

bcorrie commented 11 months ago

@bussec @kira-neller does that capture our discussion?

kira-neller commented 11 months ago

@bcorrie Yes, this captures the necessary changes as far as I understand. Thank you!

bussec commented 11 months ago

@bcorrie Yes, it captures our discussion.

However, I came across an additional complication that we need to think about: If we refer to ReceptorReactivity records by their ID only (i.e., they are not nested into the Receptor object), then you don't know which receptor the reactivity measurement refers to. From a Cell record you could reconstruct this using both the receptors and the reactivity_measurements properties (which is already a bit of a pain), but for other potential references you would need to search all Receptor records for a matching reactivity measurement ID. Therefore ReceptorReactivity needs to contain the receptor_hash.

Ruminating about this, there is also the situation that we discussed in which a cell expresses more than one receptor. If in such a case you have data from a multimer-MHC binding assay, you won't be able to know which receptor mediated the binding. Therefore the ReceptorReactivity record would need to refer to multiple receptor_hash IDs. That is not a problem by itself, but we need to clearly document that in such a case each of the listed receptors might have been involved, but you cannot be certain which.

bcorrie commented 9 months ago

However, I came across an additional complication that we need to think about: If we refer to ReceptorReactivity records by their ID only (i.e., they are not nested into the Receptor object), then you don't know which receptor the reactivity measurement refers to

I am not sure this is true, is it? The Receptor object has a list of ReceptorReactivity IDs in its reactivity_measurements array. So you can find all ReceptorReactivity entities for a Receptor by looking at Receptor.reactivity_measurements, and you can find which Receptor a ReceptorReactivity entity refers to by searching all Receptor objects for the receptor_reactivity_id in Receptor.reactivity_measurements. So you can find the receptor, but only with a relatively expensive query...
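
To make the cost concrete, here is a rough sketch of that reverse lookup over in-memory records. The function name `receptors_for_reactivity` and the sample IDs are hypothetical; only the field names are taken from the discussion.

```python
def receptors_for_reactivity(reactivity_id, receptors):
    """Scan every Receptor and return the receptor_id of each one that
    lists reactivity_id in its reactivity_measurements array."""
    return [
        r["receptor_id"]
        for r in receptors
        if reactivity_id in r.get("reactivity_measurements", [])
    ]


# Hypothetical data: three receptors, one measurement shared by two of them.
receptors = [
    {"receptor_id": "rec-A", "reactivity_measurements": ["rr-1", "rr-2"]},
    {"receptor_id": "rec-B", "reactivity_measurements": ["rr-2"]},
    {"receptor_id": "rec-C", "reactivity_measurements": []},
]

print(receptors_for_reactivity("rr-2", receptors))
```

Every lookup touches every Receptor record, which is the expense referred to above; a repository could index the array, but the schema itself offers no direct pointer.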

This is actually fairly cumbersome, and would be more elegant I think if the ReceptorReactivity object pointed directly to the Receptor object. We discussed this and for some reason decided that an array of Receptor.reactivity_measurements was better. I can't remember why and I am not sure that was the right choice... 8-)

bcorrie commented 9 months ago

Isn't it always true that a single ReceptorReactivity instance comes from one, and only one Cell (the measurement for reactivity comes from a single cell, no?) and in your above scenario might point to a very small number of Receptors (e.g. the Cell expresses more than one Receptor and you don't know which Receptor is causing the reactivity)?

Maybe ReceptorReactivity should have a cell_id field and an array of receptor_id values (where the array would be small, N = 1-2?). We could then get rid of the reactivity_measurements fields, and if you wanted to find all of the ReceptorReactivity objects associated with a Cell or Receptor, you would search ReceptorReactivity objects for the cell_id or receptor_id of interest.
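
A sketch of what this alternative could look like, assuming the array field is called `receptor_ids` (a hypothetical name; the thread only says "an array of receptor_id"). The helper functions and sample IDs are likewise illustrative, not a proposed API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReceptorReactivity:
    receptor_reactivity_id: str
    cell_id: str
    # Usually one entry; more than one means the assay could not tell
    # which of the cell's receptors mediated the binding.
    receptor_ids: List[str] = field(default_factory=list)


def reactivities_for_cell(records, cell_id):
    """All measurements observed on the given cell."""
    return [r for r in records if r.cell_id == cell_id]


def reactivities_for_receptor(records, receptor_id):
    """All measurements that name the receptor, certain or not."""
    return [r for r in records if receptor_id in r.receptor_ids]


def is_ambiguous(record):
    """True when more than one receptor is a candidate for the reactivity."""
    return len(record.receptor_ids) > 1


records = [
    ReceptorReactivity("rr-1", "cell-1", ["rec-A"]),
    ReceptorReactivity("rr-2", "cell-2", ["rec-A", "rec-B"]),
]
```

Both directions of the query become a filter on a single object type, and the multi-receptor case is visible directly from the length of the array.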

bcorrie commented 7 months ago

@bussec any comments on this? It would be good to close this off and merge with master.

bcorrie commented 5 months ago

Adding the other use case as per recent discussions in #705 for ReceptorReactivity

In my single-cell study stored in the ADC, I have found the following:

I have a Cell with IGHV1-46*01, STVVGAL, IGHJ4*02 and IGKV3-20*01, QQYGSSPLT, IGKJ4*01
I find that this Receptor has a known epitope binding in IEDB: https://www.iedb.org/receptor/193713
I want to capture this specificity information associated with my study in the ADC, so I create a Receptor object to this effect.

So I get something like this:

receptor_id: internal ID
receptor_hash: hash string
receptor_type: Ig
receptor_variable_domain_1_locus: IGH
receptor_variable_domain_1_aa: QVQLVQSGAEAKKPGASVKVSCKASGYSFTSYHMHWVRQAPGQGLEWMGIINPNGGTTTYAPKFQGRVTMTSDTSTSTVYMELTSLRSEDTAVYYCSTVVGALWGQGTLVIVSS
receptor_variable_domain_2_locus: IGK
receptor_variable_domain_2_aa: EIVLTQSPGTLSLSPGERATLSCRASQSVRSNNYLAWYQQKPGQAPRLLIYGASTRATGIPDRFSGSGSGTDFTLTISRLEPEDFAVYYCQQYGSSPLTFGGGTRVEIK
receptor_ref: ["IEDB_RECEPTOR:193713"]

Since the ReceptorReactivity information is already in IEDB, it may not be necessary to store it in the ADC with a ReceptorReactivity record; one can get this information from IEDB via the above link. Presumably this is what the AKC project will address and make easy for the user. Currently the user has to jump back and forth between the ADC and IEDB to link this information. The iReceptor Gateway already does this automagically for CDR3 searches if the CDR3 is known in IEDB.

If I want to track the known ReceptorReactivity of the Receptor in the ADC, I can create a ReceptorReactivity object with the following fields, pulled from IEDB (https://www.iedb.org/assay/21965299).

receptor_reactivity_id : internal ID
receptor_hash: hash string
study_id: null (receptor reactivity not measured in a study in the ADC)
ligand_type: peptide
antigen_type: peptide
antigen_source_species: Severe acute respiratory syndrome coronavirus 2 Wuhan/Hu-1/2019 (Note: No NCBITaxon entry for this strain).
antigen: GenPept:YP_009724390.1
peptide_sequence_aa: QPELDSFKEELDKYFKNHTSP
peptide_sequence_start: 1142
peptide_sequence_end: 1162
reactivity_method: biological_activity
reactivity_readout: neutralization
reactivity_value: 1
reactivity_unit: boolean
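
As a sanity check on the peptide fields above, the following sketch verifies that the stored sequence length agrees with the start and end positions, and shows how the string would be sliced from a full protein sequence. It assumes 1-based, inclusive coordinates, which is consistent with the example (21 residues for 1142 to 1162) but is not stated in this thread; the function names and the toy protein are hypothetical.

```python
def coordinates_match(peptide_aa, start, end):
    """True if the peptide length agrees with 1-based inclusive coordinates."""
    return len(peptide_aa) == end - start + 1


def peptide_from_protein(protein_aa, start, end):
    """Slice a peptide out of a protein using 1-based inclusive coordinates."""
    if not (1 <= start <= end <= len(protein_aa)):
        raise ValueError("coordinates fall outside the protein sequence")
    return protein_aa[start - 1:end]


# Values taken from the record above.
peptide = "QPELDSFKEELDKYFKNHTSP"
print(coordinates_match(peptide, 1142, 1162))

# Toy protein for illustration only, not a real antigen sequence.
print(peptide_from_protein("ABCDEFGH", 3, 5))
```

Slicing is trivial once the protein sequence is in hand; the difficulty noted earlier in the thread is resolving the antigen reference to that sequence in the first place, which is why storing the peptide string directly is convenient.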

This Receptor has 9 assays associated with it that reported reactivity, so presumably if I wanted to capture all of the reactivity for this receptor, I would have 9 different ReceptorReactivity objects. For example, one of the other assays (https://www.iedb.org/assay/21965295) is an ELISA assay, so I would have the same as above with the exception of:

reactivity_method: ELISA
reactivity_readout: qualitative binding
reactivity_value: Positive-Intermediate
reactivity_unit:
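
One way to picture the nine records is as a shared base with per-assay overrides. This is only an illustrative sketch: the helper `derive_record` and the ID values are hypothetical, only a subset of the fields is shown, and the empty `reactivity_unit` is represented as `None` because no unit is given above.

```python
# Subset of the neutralization record above; ID value is made up.
base = {
    "receptor_reactivity_id": "rr-1",
    "receptor_hash": "hash string",
    "study_id": None,
    "peptide_sequence_aa": "QPELDSFKEELDKYFKNHTSP",
    "peptide_sequence_start": 1142,
    "peptide_sequence_end": 1162,
    "reactivity_method": "biological_activity",
    "reactivity_readout": "neutralization",
    "reactivity_value": 1,
    "reactivity_unit": "boolean",
}


def derive_record(base, new_id, **overrides):
    """Copy the base record, apply assay-specific fields, and assign a new ID."""
    record = dict(base)
    record.update(overrides)
    record["receptor_reactivity_id"] = new_id
    return record


elisa = derive_record(
    base,
    "rr-2",
    reactivity_method="ELISA",
    reactivity_readout="qualitative binding",
    reactivity_value="Positive-Intermediate",
    reactivity_unit=None,  # no unit given for this assay
)
```

Each assay still gets its own `receptor_reactivity_id`, while the receptor and epitope fields are repeated unchanged across the records.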

airr-community / airr-standards

Receptor object issues when used in real life... #704