Open bcorrie opened 3 months ago
I should note that a few weeks ago I added the above to the Consolidated Mapping sheet. With the YAML files I have generated this should be pretty easy to add, no?
Currently no sample and cell processing is represented in the AKC and we have no field mappings from the ADC to any objects in the AKC. For example, there is currently no way to confirm which BCR/TCR loci are associated with a set of sequences, nor whether a specific set of sequences is from a specific set of cell types (through cell sorting).
At the same time, I don't believe any objects in the AKC (as based on the IEDB model) such as `Assay` have appropriate fields for this information.
@jamesaoverton is there anything in the ImmuneSpace data model that captures this? I have to assume that there is a mechanism in ImmuneSpace to capture how samples and cells are processed in preparation for sequencing???
You're right that this is missing from the current schema. We discussed it a few times, and made some notes on the Miro board, but didn't implement it.
Here's a first pass for discussion: https://github.com/airr-knowledge/ak-schema/pull/12
The AIRR Standard has a very rich set of metadata about how samples are processed for sequencing, including sample processing, cell processing, nucleic acid processing, and sequence data processing.
In the AKC we currently have the following objects that are related to this:
- `Dataset` - maybe a representation of the final set of annotated sequences
- `Assay` - I think this might fit in the spectrum somewhere, but its current definition is very narrow. It essentially has a single value, clearly designed to measure reactivity/specificity between a receptor and an antigen/epitope. As it is, I don't think it makes sense for BCR/TCR sequencing???
- `PlannedProcess` - an abstract class that seems to capture the AIRR concepts above
- `SpecimenCollection` - a subclass of `PlannedProcess` that describes how specimens are collected, but currently it doesn't have any process slots that describe how the specimen was collected; it only has a pointer to a specimen.

In my conversion of ADC data to AKC, there is nothing in the AKC data model for me to map the 40-odd ADC fields from the above objects.
The simple and straightforward approach would be to create the following AKC LinkML classes:
- `CellProcessing` - subclass of `PlannedProcess` with all of the fields in the AIRR CellProcessing object: https://github.com/airr-knowledge/ak-schema/blob/airr-export/src/ak_schema/schema/airr/ak_slot_CellProcessing.yaml
- `NucleicAcidProcessing` - subclass of `PlannedProcess` with all of the fields in the AIRR NucleicAcidProcessing object: https://github.com/airr-knowledge/ak-schema/blob/airr-export/src/ak_schema/schema/airr/ak_slot_NucleicAcidProcessing.yaml
- `SequencingRun` - subclass of `PlannedProcess` with all of the fields in the AIRR SequencingRun object: https://github.com/airr-knowledge/ak-schema/blob/airr-export/src/ak_schema/schema/airr/ak_slot_SequencingRun.yaml
- `SequencingData` - subclass of `PlannedProcess` with all of the fields in the AIRR SequencingData object: https://github.com/airr-knowledge/ak-schema/blob/airr-export/src/ak_schema/schema/airr/ak_slot_SequencingData.yaml
- `DataProcessing` - subclass of `PlannedProcess` with all of the fields in the AIRR DataProcessing object: https://github.com/airr-knowledge/ak-schema/blob/airr-export/src/ak_schema/schema/airr/ak_slot_DataProcessing.yaml

A `DataSet` in the AKC currently consists of a list of `assessments` or `assays` as slots. If we added a `processes` slot, we could then use the AKC to represent an ADC `DataSet` that has had a set of `processes` applied (as above).

This still leaves me not really understanding how `Assay` fits into describing an ADC data set.

@schristley thoughts? This would result in us having a mapping of a significant percentage of the ADC fields in the AKC. The relationships between the classes still need to be resolved, but at least all of the data would be mapped.
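
To make the `processes` slot idea a bit more concrete, here is a rough Python sketch (not the actual ak-schema code, and not LinkML output). It assumes a `DataSet` with a `processes` list holding `PlannedProcess` subclasses. The `akc_id` attribute, the `processes_of` helper, and the example values are all hypothetical. The `cell_subset` and `pcr_target_locus` names are borrowed from the AIRR Standard, with `pcr_target_locus` flattened out of its nested `pcr_target` object for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Type, TypeVar


@dataclass
class PlannedProcess:
    """Stand-in for the abstract AKC PlannedProcess class (hypothetical shape)."""
    akc_id: str


@dataclass
class CellProcessing(PlannedProcess):
    # Field name borrowed from the AIRR CellProcessing object.
    cell_subset: Optional[str] = None


@dataclass
class NucleicAcidProcessing(PlannedProcess):
    # Flattened for this sketch; in AIRR this sits under pcr_target.
    pcr_target_locus: Optional[str] = None


@dataclass
class DataSet:
    akc_id: str
    assays: List[str] = field(default_factory=list)
    # Proposed new slot; an assumption until agreed in the schema.
    processes: List[PlannedProcess] = field(default_factory=list)


P = TypeVar("P", bound=PlannedProcess)


def processes_of(dataset: DataSet, kind: Type[P]) -> List[P]:
    """Return the processes of one kind that were applied to a data set."""
    return [p for p in dataset.processes if isinstance(p, kind)]


# Made-up example data set with two processes attached.
example = DataSet(
    akc_id="dataset-1",
    processes=[
        CellProcessing(akc_id="proc-1", cell_subset="naive B cell"),
        NucleicAcidProcessing(akc_id="proc-2", pcr_target_locus="IGH"),
    ],
)

loci = [p.pcr_target_locus for p in processes_of(example, NucleicAcidProcessing)]
subsets = [p.cell_subset for p in processes_of(example, CellProcessing)]
print(loci, subsets)
```

With something like this in place, the two questions raised earlier in the thread (which loci, which cell types) become simple lookups over `processes`. How `Assay` relates to these classes is left open here, as it is in the discussion.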
I suppose the main question is, do we want/need all of the ADC `Repertoire` fields mapped to the AKC?

If we do create the above LinkML classes, then field -> field translation of the ADC to AKC is probably 80% - 90% complete. What would remain is the transformation of any fields that don't have a direct mapping.
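
For discussion, a minimal sketch of what the field -> field step could look like, assuming a flat ADC record and a lookup table from ADC field name to AKC class and slot. The `FIELD_MAP` table, the `translate` function, and the sample record are all hypothetical; the real mapping would come from the Consolidated Mapping sheet and the generated YAML files. Fields with no direct mapping are returned separately so they can be handled by a later transformation.

```python
from typing import Any, Dict, Tuple

# Hypothetical mapping: ADC field name -> (AKC class, AKC slot).
FIELD_MAP: Dict[str, Tuple[str, str]] = {
    "cell_subset": ("CellProcessing", "cell_subset"),
    "pcr_target_locus": ("NucleicAcidProcessing", "pcr_target_locus"),
    "sequencing_platform": ("SequencingRun", "sequencing_platform"),
}


def translate(
    adc_record: Dict[str, Any],
) -> Tuple[Dict[str, Dict[str, Any]], Dict[str, Any]]:
    """Split a flat ADC record into per-class AKC slots and leftover fields."""
    mapped: Dict[str, Dict[str, Any]] = {}
    unmapped: Dict[str, Any] = {}
    for name, value in adc_record.items():
        target = FIELD_MAP.get(name)
        if target is None:
            # No direct mapping; needs a separate transformation.
            unmapped[name] = value
        else:
            akc_class, slot = target
            mapped.setdefault(akc_class, {})[slot] = value
    return mapped, unmapped


# Made-up record; "some_unmapped_field" is a placeholder, not a real ADC field.
record = {
    "cell_subset": "naive B cell",
    "pcr_target_locus": "IGH",
    "sequencing_platform": "Illumina MiSeq",
    "some_unmapped_field": "value",
}

mapped, unmapped = translate(record)
print(mapped)
print(unmapped)
```

The `unmapped` dictionary is effectively the to-do list for the remaining fields that need more than a direct copy.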