airr-knowledge / issues

Issues and project management for the AKC

How are we going to represent AIRR sample processing in the AKC #58

Open bcorrie opened 3 months ago

bcorrie commented 3 months ago

The AIRR Standard has a very rich set of metadata about how samples are processed for sequencing, including sample processing, cell processing, nucleic acid processing, and sequence data processing.

In the AKC we currently have the following objects that are related to this:

In my conversion of ADC data to AKC, there is nothing in the AKC data model to which I can map the 40-odd ADC fields from the above objects.

The simple and straightforward approach would be to create the following AKC LinkML classes:

A DataSet in the AKC currently consists of a list of assessments or assays as slots. If we added a processes slot, we could then use the AKC to represent an ADC DataSet that has had a set of processes applied (as above).
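To make the `processes` slot idea concrete, here is a minimal Python sketch of the shape it might take. Everything here is an assumption for discussion: the class names (`Process`, `SampleProcessing`, `CellProcessing`), the identifier fields, and the example values are hypothetical and not taken from the actual AKC schema. Only `tissue` and `cell_subset` are borrowed from the AIRR Standard as example fields.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical classes: none of these names come from the real AKC schema.
@dataclass
class Process:
    """One processing step applied on the way from sample to sequence data."""
    process_id: str


@dataclass
class SampleProcessing(Process):
    tissue: str = ""  # AIRR field name, used here only as an example


@dataclass
class CellProcessing(Process):
    cell_subset: str = ""  # AIRR field name, used here only as an example


@dataclass
class DataSet:
    dataset_id: str
    assessments: List[str] = field(default_factory=list)
    assays: List[str] = field(default_factory=list)
    # Proposed new slot: the processes applied to produce this data set.
    processes: List[Process] = field(default_factory=list)


# An ADC-style data set described by the processes applied to it.
ds = DataSet(
    dataset_id="example-dataset",
    processes=[
        SampleProcessing(process_id="p1", tissue="blood"),
        CellProcessing(process_id="p2", cell_subset="naive B cell"),
    ],
)
```

In this sketch a data set can carry processes without having any assays at all, which is why the relationship to Assay is left open.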

This still leaves me not really understanding how Assay fits into describing an ADC data set.

@schristley thoughts? This would result in us having a mapping of a significant percentage of the ADC fields in the AKC. The relationships between the classes still need to be resolved, but at least all of the data would be mapped.

I suppose the main question is, do we want/need all of the ADC Repertoire fields mapped to the AKC?

If we do create the above LinkML classes, then field -> field translation of the ADC to AKC is probably 80% - 90% complete. What would remain is the transformation of any fields that don't have a direct mapping.
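As a rough illustration of what field -> field translation could look like, here is a minimal Python sketch. The mapping table is hypothetical: the ADC field names are examples from the AIRR Standard, but the target class and slot names are assumptions, and the flat input record ignores the nesting of real ADC Repertoire objects. Fields without a direct mapping are collected separately so they can be handled by a later transformation step.

```python
from typing import Any, Dict, Tuple

# Hypothetical mapping: ADC field name -> (AKC class, AKC slot).
# The target names are placeholders, not the real AKC schema.
FIELD_MAP: Dict[str, Tuple[str, str]] = {
    "tissue": ("SampleProcessing", "tissue"),
    "cell_subset": ("CellProcessing", "cell_subset"),
    "pcr_target_locus": ("NucleicAcidProcessing", "pcr_target_locus"),
}


def translate(adc_record: Dict[str, Any]):
    """Split a flat ADC record into directly mapped and unmapped fields."""
    mapped: Dict[str, Dict[str, Any]] = {}
    unmapped: Dict[str, Any] = {}
    for name, value in adc_record.items():
        target = FIELD_MAP.get(name)
        if target is None:
            # No direct mapping: leave for a separate transformation.
            unmapped[name] = value
        else:
            akc_class, akc_slot = target
            mapped.setdefault(akc_class, {})[akc_slot] = value
    return mapped, unmapped


mapped, unmapped = translate(
    {"tissue": "blood", "pcr_target_locus": "IGH", "some_other_field": "x"}
)
```

Here `mapped` groups values by target class, and `unmapped` holds whatever still needs a hand-written transformation.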

bcorrie commented 3 months ago

I should note that a few weeks ago I added the above to the Consolidated Mapping sheet. With the YAML files I have generated this should be pretty easy to add, no?

bcorrie commented 3 weeks ago

Currently no sample and cell processing is represented in the AKC, and we have no field mappings from the ADC to any objects in the AKC. For example, there is currently no way to confirm which BCR/TCR loci are associated with a set of sequences, nor whether a specific set of sequences is from a specific set of cell types (through cell sorting).

At the same time, I don't believe any objects in the AKC (as based on the IEDB model) such as Assay have appropriate fields for this information.

@jamesaoverton is there anything in the ImmuneSpace data model that captures this? I have to assume that there is a mechanism in ImmuneSpace to capture how samples and cells are processed in preparation for sequencing?

jamesaoverton commented 2 weeks ago

You're right that this is missing from the current schema. We discussed it a few times, and made some notes on the Miro board, but didn't implement it.

Here's a first pass for discussion: https://github.com/airr-knowledge/ak-schema/pull/12