The AIRR Standard has a very rich set of metadata about how samples are processed for sequencing, including sample processing, cell processing, nucleic acid processing, and sequence data processing.
In the AKC we currently have the following objects that are related to this:
- `Dataset` - maybe a representation of the final set of annotated sequences
- `Assay` - I think this might fit in the spectrum somewhere, but its current definition is very narrow. It essentially has a single value, clearly designed to measure reactivity/specificity between a receptor and an antigen/epitope. As it is, I don't think it makes sense for BCR/TCR sequencing.
- `PlannedProcess` - an abstract class that seems to capture the AIRR concepts above
- `SpecimenCollection` - a subclass of `PlannedProcess` that describes how specimens are collected. Currently it has no process slots describing how the specimen was collected; it only has a pointer to a specimen.
In my conversion of ADC data to AKC, there is nothing in the AKC data model to which I can map the 40-odd ADC fields from the AIRR objects above.
The simple and straightforward approach would be to create the following AKC LinkML classes:
- `CellProcessing` - subclass of `PlannedProcess` with all of the fields in the AIRR CellProcessing object: https://github.com/airr-knowledge/ak-schema/blob/airr-export/src/ak_schema/schema/airr/ak_slot_CellProcessing.yaml
- `NucleicAcidProcessing` - subclass of `PlannedProcess` with all of the fields in the AIRR NucleicAcidProcessing object: https://github.com/airr-knowledge/ak-schema/blob/airr-export/src/ak_schema/schema/airr/ak_slot_NucleicAcidProcessing.yaml
- `SequencingRun` - subclass of `PlannedProcess` with all of the fields in the AIRR SequencingRun object: https://github.com/airr-knowledge/ak-schema/blob/airr-export/src/ak_schema/schema/airr/ak_slot_SequencingRun.yaml
- `SequencingData` - subclass of `PlannedProcess` with all of the fields in the AIRR SequencingData object: https://github.com/airr-knowledge/ak-schema/blob/airr-export/src/ak_schema/schema/airr/ak_slot_SequencingData.yaml
- `DataProcessing` - subclass of `PlannedProcess` with all of the fields in the AIRR DataProcessing object: https://github.com/airr-knowledge/ak-schema/blob/airr-export/src/ak_schema/schema/airr/ak_slot_DataProcessing.yaml
A `DataSet` in the AKC currently consists of a list of `assessments` or `assays` as slots. If we added a `processes` slot, we could then use the AKC to represent an ADC `DataSet` that has had a set of `processes` applied (as above).
This still leaves me not really understanding how `Assay` fits into describing an ADC data set.
@schristley thoughts? This would result in us having a mapping of a significant percentage of the ADC fields in the AKC. The relationships between the classes still need to be resolved, but at least all of the data would be mapped.
I suppose the main question is, do we want/need all of the ADC `Repertoire` fields mapped to the AKC?
If we do create the above LinkML classes, then field -> field translation of the ADC to AKC is probably 80% - 90% complete. What would remain is the transformation of any fields that don't have a direct mapping.
I should note that a few weeks ago I added the above to the Consolidated Mapping sheet. With the YAML files I have generated this should be pretty easy to add, no?
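To make the proposal a bit more concrete, here is a minimal Python sketch of how the field -> field translation could group ADC fields into the proposed `PlannedProcess` subclasses and attach them to a `DataSet` through a `processes` slot. It is not the AKC schema or the actual converter: the `slots` dict, `FIELD_TO_CLASS`, `map_record`, and the handful of field names are assumptions for illustration only, and the real slot lists are in the `ak_slot_*.yaml` files linked above.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class PlannedProcess:
    """Stand-in for the AKC abstract class; slots are kept as a plain dict here."""
    slots: dict[str, Any] = field(default_factory=dict)


# Proposed subclasses, one per AIRR processing object.
class CellProcessing(PlannedProcess): pass
class NucleicAcidProcessing(PlannedProcess): pass
class SequencingRun(PlannedProcess): pass
class SequencingData(PlannedProcess): pass
class DataProcessing(PlannedProcess): pass


@dataclass
class DataSet:
    """Only the proposed `processes` slot is modelled in this sketch."""
    processes: list[PlannedProcess] = field(default_factory=list)


# Illustrative subset of field names (an assumption); check against the YAML files.
FIELD_TO_CLASS = {
    "cell_subset": CellProcessing,
    "cell_isolation": CellProcessing,
    "template_class": NucleicAcidProcessing,
    "library_generation_method": NucleicAcidProcessing,
    "sequencing_platform": SequencingRun,
    "file_type": SequencingData,
    "germline_database": DataProcessing,
}


def map_record(record: dict[str, Any]) -> tuple[DataSet, list[str]]:
    """Group directly mappable fields by target class and report the rest."""
    grouped: dict[type, dict[str, Any]] = {}
    unmapped: list[str] = []
    for name, value in record.items():
        target = FIELD_TO_CLASS.get(name)
        if target is None:
            unmapped.append(name)  # no direct mapping in this table
        else:
            grouped.setdefault(target, {})[name] = value
    dataset = DataSet(processes=[cls(slots=vals) for cls, vals in grouped.items()])
    return dataset, unmapped


# Hypothetical flattened ADC record, for illustration only.
record = {
    "cell_subset": "Naive B cell",
    "template_class": "RNA",
    "sequencing_platform": "Illumina MiSeq",
    "germline_database": "IMGT",
    "some_unmapped_field": "value",
}
dataset, unmapped = map_record(record)
print([type(p).__name__ for p in dataset.processes])
print(unmapped)
```

The `unmapped` list is the part that would still need a transformation rather than a direct copy, which is roughly the remaining 10% - 20% mentioned above.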