Open bcorrie opened 2 weeks ago
This is how I extended the Assay class to a TCellReceptorEpitopeBindingAssay class with additional slots: https://github.com/airr-knowledge/ak-schema/pull/10/files#diff-29889c07d8b30b52c576167bc97eb45e26fd0f41a4e31b474fcf882bbe66953aR33
Maybe Assay isn't the correct terminology to use (and therefore isn't the correct class), but what we need to describe is a protocol that takes a specimen and a sequencing process and, rather than capturing how a single number for a TCR/Epitope interaction is derived, describes how N sequences were generated (where N can be on the order of millions).
So I think this is a very different thing than what we have now with Assay. I am not clear if the right thing to do is generalize Assay or create a different entity entirely. 8-)
Ok. Then we might need @bpeters42 to chime in. I think he's on vacation this week.
Conceptually, I think of an Assay as the specific step that translates a biological material into an information entity (or set of entities). For AIRR, this is the sequencing step, or SequencingRun object. I think James' approach of extending the Assay class to (say) AIRRSequencingAssay is the right approach, and we add additional slots specific to it. I expect this class to be very similar to the AIRR SequencingRun object.
A primary difference is that the values aren't directly stored. Instead, they are stored by reference, most preferably as references to SRA.
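A minimal Python sketch of the subclassing idea, using plain dataclasses as a stand-in for the schema classes. The slot names (`akc_id`, `specimen`, `sequencing_platform`, `sequence_data_refs`) are hypothetical and not taken from the schema, and the accession string is a placeholder rather than a real record.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Assay:
    """Stand-in for the existing Assay class: one value/unit measurement."""
    akc_id: str
    specimen: str                   # ID of the input specimen
    value: Optional[float] = None   # single measured value (IEDB-style)
    unit: Optional[str] = None


@dataclass
class AIRRSequencingAssay(Assay):
    """Hypothetical subclass: results are stored by reference, not by value."""
    sequencing_platform: Optional[str] = None
    # References to where the sequence data lives (for example, SRA accessions).
    sequence_data_refs: List[str] = field(default_factory=list)

    def stores_values_directly(self) -> bool:
        # True only when an inline value is present on the assay itself.
        return self.value is not None


run = AIRRSequencingAssay(
    akc_id="assay-001",
    specimen="specimen-001",
    sequencing_platform="example platform",   # placeholder
    sequence_data_refs=["SRA:PLACEHOLDER"],    # placeholder, not a real accession
)
```

Here `run` is still an `Assay`, but it carries no inline value or unit; consumers would follow `sequence_data_refs` to reach the data.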
To be fully complete, similar to SpecimenProcessing, we need to add data processing steps that start from the "output" of the AIRRSequencingAssay and eventually generate the data that we recognize as the Chain objects.
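One way to picture such steps is as a chain of processing records, each consuming the output of the previous one. The sketch below is only an illustration under assumptions: the `DataProcessing` class, its fields, the step names, and the IDs are all hypothetical and do not come from the schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DataProcessing:
    """Hypothetical record of one data processing step."""
    name: str
    input_id: str    # ID of the data item consumed
    output_id: str   # ID of the data item produced


def trace(steps: List[DataProcessing], start_id: str) -> List[str]:
    """Follow steps from start_id, returning the data item IDs visited in order."""
    by_input = {s.input_id: s for s in steps}
    path = [start_id]
    while path[-1] in by_input:
        nxt = by_input[path[-1]].output_id
        if nxt in path:  # guard against cycles in malformed input
            raise ValueError(f"cycle detected at {nxt}")
        path.append(nxt)
    return path


# Hypothetical steps leading from the assay output to Chain-level data.
steps = [
    DataProcessing("annotation", "raw-reads-001", "rearrangements-001"),
    DataProcessing("chain extraction", "rearrangements-001", "chains-001"),
]
lineage = trace(steps, "raw-reads-001")
```

With records like these, the provenance of a set of Chain objects can be walked back to the sequencing assay output that produced them.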
'OBI:assay' is exactly meant for this; the output of an assay is a data item, which in this case would be a list of receptor sequences. That is essentially what 'OBI:T cell receptor repertoire sequencing assay' is. In what James had been working on, we had focused on IEDB data where we have specific epitopes with each assay. But obviously for AKC we want to generalize this for non-epitope-specific assays.
Doing this on the fly, but essentially the only thing changing is that the output of a sequencing assay is not a single value / unit pair, but rather a data item that includes sequences...
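To make that change concrete, here is a rough Python sketch in which the assay output is either a value/unit measurement or a sequence data item. All class and field names (`Measurement`, `SequenceDataItem`, `assay_type`, `output`) and all values are assumptions for illustration, not the schema's actual definitions.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Measurement:
    """IEDB-style output: a single value with a unit."""
    value: float
    unit: str


@dataclass(frozen=True)
class SequenceDataItem:
    """Sequencing-style output: a data item that includes sequences."""
    sequences: Tuple[str, ...]


# The only thing that differs between assay kinds is the output type.
AssayOutput = Union[Measurement, SequenceDataItem]


@dataclass(frozen=True)
class Assay:
    assay_type: str
    output: AssayOutput


def summarize(assay: Assay) -> str:
    """Describe an assay's output regardless of which kind it is."""
    out = assay.output
    if isinstance(out, Measurement):
        return f"{assay.assay_type}: {out.value} {out.unit}"
    return f"{assay.assay_type}: {len(out.sequences)} sequence(s)"


# Placeholder values only.
binding = Assay("binding assay", Measurement(50.0, "nM"))
sequencing = Assay(
    "TCR repertoire sequencing assay",
    SequenceDataItem(("sequence-1", "sequence-2")),
)
```

Under this reading, the rest of the `Assay` description (specimen, protocol, and so on) could stay shared while only the output slot varies.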
Created branch assay-refactor to work on this...
https://github.com/airr-knowledge/ak-schema/tree/assay-refactor
Related OBI ontology IDs:
T-cell: https://ontobee.org/ontology/OBI?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FOBI_0002990
B-cell: https://ontobee.org/ontology/OBI?iri=http://purl.obolibrary.org/obo/OBI_0002991
Currently the Assay class has a single measurement, a value and a unit. This makes sense for IEDB's style of Assay, but it is very unclear to me what an Assay is in the AIRR-seq world. Definitely not my area of expertise, but I don't really know when to create an Assay object when transforming ADC data into AKC data.