airr-knowledge / issues

Issues and project management for the AKC

How do we extend Assay to incorporate AIRR-seq #64

Open bcorrie opened 2 weeks ago

bcorrie commented 2 weeks ago

Currently the Assay class holds a single measurement: a value and a unit.

This makes sense for IEDB's style of Assay, but it is very unclear to me what an Assay is in the AIRR-seq world. This is definitely not my area of expertise, and I don't really know when to create an Assay object when transforming ADC data into AKC data.

jamesaoverton commented 2 weeks ago

This is how I extended the Assay class to a TCellReceptorEpitopeBindingAssay class with additional slots: https://github.com/airr-knowledge/ak-schema/pull/10/files#diff-29889c07d8b30b52c576167bc97eb45e26fd0f41a4e31b474fcf882bbe66953aR33

bcorrie commented 2 weeks ago

Maybe Assay isn't the correct terminology to use (and therefore isn't the correct class), but what we need to describe is a protocol that takes a specimen through a sequencing process. Rather than capturing how a single number for a TCR/Epitope interaction is derived, it describes how N sequences were generated (where N can be on the order of millions).

So I think this is a very different thing from what we have now with Assay. It is not clear to me whether the right thing to do is to generalize Assay or to create a different entity entirely. 8-)

jamesaoverton commented 2 weeks ago

Ok. Then we might need @bpeters42 to chime in. I think he's on vacation this week.

schristley commented 2 weeks ago

> Maybe Assay isn't the correct terminology to use (and therefore isn't the correct class), but what we need to describe is a protocol that takes a specimen through a sequencing process. Rather than capturing how a single number for a TCR/Epitope interaction is derived, it describes how N sequences were generated (where N can be on the order of millions).

Conceptually, I think of an Assay as the specific step that translates a biological material into an information entity (or set of entities). For AIRR, this is the sequencing step, or SequencingRun object. I think James' approach of extending the Assay class to (say) AIRRSequencingAssay is the right approach, and we add additional slots specific to it. I expect this class to be very similar to the AIRR SequencingRun object.

A primary difference is that the values aren't stored directly. Instead, they are stored by reference, most preferably as references to SRA.
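A minimal sketch of that subclassing idea, written as Python dataclasses rather than the LinkML used in ak-schema. Only `Assay` and `AIRRSequencingAssay` are names from this thread; every slot name (`specimen_id`, `sequencing_run_id`, `data_references`) and the accession string are assumptions made for illustration, not the actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Assay:
    """Base assay as described in this thread: one value plus a unit."""
    specimen_id: str  # hypothetical slot name
    value: Optional[float] = None
    unit: Optional[str] = None


@dataclass
class AIRRSequencingAssay(Assay):
    """Sequencing assay whose results are stored by reference, not inline."""
    # Hypothetical slots, loosely inspired by the idea of a SequencingRun.
    sequencing_run_id: Optional[str] = None
    # External references to the data, e.g. SRA accessions (assumed format).
    data_references: List[str] = field(default_factory=list)

    def has_inline_value(self) -> bool:
        """True only if a value is stored directly on the assay."""
        return self.value is not None


assay = AIRRSequencingAssay(
    specimen_id="specimen-1",
    sequencing_run_id="run-1",
    data_references=["SRR0000001"],  # placeholder accession, not a real record
)
```

Here the inherited `value` and `unit` slots simply stay empty for a sequencing assay, which is one possible way to keep a single class hierarchy; whether those slots should instead move down into an IEDB-style subclass is a separate design question.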

schristley commented 2 weeks ago

To be fully complete, similar to SpecimenProcessing, we need to add data processing steps that start from the "output" of the AIRRSequencingAssay and eventually generate the data that we recognize as the Chain objects.
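One way to picture such a chain of data processing steps, as a rough Python sketch. The `DataProcessingStep` class, its fields, the step names, and the reference strings are all hypothetical; nothing here is taken from ak-schema. The sketch only shows the idea that each step consumes the previous step's output, starting from the assay output and ending at the chain data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataProcessingStep:
    """Hypothetical record of one processing step (illustrative names only)."""
    name: str
    input_ref: str   # identifier of the data this step consumes
    output_ref: str  # identifier of the data this step produces


def is_connected(start_ref: str, steps: List[DataProcessingStep]) -> bool:
    """Check that the steps form an unbroken chain beginning at start_ref."""
    current = start_ref
    for step in steps:
        if step.input_ref != current:
            return False
        current = step.output_ref
    return True


# Illustrative pipeline from the sequencing assay output to chain data.
steps = [
    DataProcessingStep("quality_filtering", "raw-reads", "filtered-reads"),
    DataProcessingStep("vdj_annotation", "filtered-reads", "annotated-rearrangements"),
    DataProcessingStep("chain_extraction", "annotated-rearrangements", "chains"),
]

print(is_connected("raw-reads", steps))
```

Recording explicit input and output references on every step keeps the provenance from assay output to Chain objects checkable, at the cost of more objects to create during transformation.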

bpeters42 commented 2 weeks ago

'OBI:assay' is exactly meant for this; the output of an assay is a data item, which in this case would be a list of receptor sequences. That is essentially what 'OBI:T cell receptor repertoire sequencing assay' is. In what James had been working on, we had focused on IEDB data where we have specific epitopes with each assay. But obviously for AKC we want to generalize this for non-epitope-specific assays.

Doing this on the fly, but essentially the only thing changing is that the output of a sequencing assay is not a single value / unit pair, but rather a data item that includes sequences...
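A rough Python sketch of that generalization, assuming the assay keeps one `output` slot whose type varies. The class names `ValueUnit` and `SequenceDataItem`, the field names, and the example values are hypothetical and are not taken from OBI or ak-schema.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class ValueUnit:
    """Output of an IEDB-style assay: a single value / unit pair."""
    value: float
    unit: str


@dataclass
class SequenceDataItem:
    """Output of a sequencing assay: a data item that includes sequences."""
    sequence_refs: List[str]  # hypothetical references to sequence records
    sequence_count: int


# An assay output is one kind of data item or the other.
AssayOutput = Union[ValueUnit, SequenceDataItem]


@dataclass
class Assay:
    assay_type: str
    output: AssayOutput


def describe(assay: Assay) -> str:
    """Summarize an assay according to the kind of output it carries."""
    out = assay.output
    if isinstance(out, ValueUnit):
        return f"{assay.assay_type}: {out.value} {out.unit}"
    return f"{assay.assay_type}: {out.sequence_count} sequences by reference"


# Illustrative instances; the numbers and references are made up.
binding = Assay("TCellReceptorEpitopeBindingAssay", ValueUnit(0.5, "nM"))
repertoire = Assay(
    "T cell receptor repertoire sequencing assay",
    SequenceDataItem(sequence_refs=["seq-1", "seq-2"], sequence_count=2),
)
```

Under this reading the Assay class itself stays the same and only the output type is widened, in contrast to adding assay-specific slots on a subclass.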

bcorrie commented 1 week ago

Created branch assay-refactor to work on this...

https://github.com/airr-knowledge/ak-schema/tree/assay-refactor

bcorrie commented 1 week ago

Related OBI ontology IDs:

T-cell: https://ontobee.org/ontology/OBI?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FOBI_0002990

B-cell: https://ontobee.org/ontology/OBI?iri=http://purl.obolibrary.org/obo/OBI_0002991