airr-knowledge / issues

Issues and project management for the AKC

How are we going to represent AIRR sample processing in the AKC #58

Open bcorrie opened 1 month ago

bcorrie commented 1 month ago

The AIRR Standard has a very rich set of metadata about how samples are processed for sequencing, including sample processing, cell processing, nucleic acid processing, and sequence data processing.

In the AKC we currently have the following objects that are related to this:

In my conversion of ADC data to AKC, there is nothing in the AKC data model to which I can map the 40-odd ADC fields from the above objects.

The simple and straightforward approach would be to create the following AKC LinkML classes:

A DataSet in the AKC currently has a list of assessments or assays as slots. If we added a processes slot, we could then use the AKC to represent an ADC DataSet that has had a set of processes applied (as above).
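As a rough illustration of the processes slot idea, here is a minimal Python sketch. It is not the AKC LinkML schema: the `Process` class, its `process_type` and `fields` attributes, and the placeholder values are all assumptions made for illustration. The four process types simply reuse the four AIRR processing categories named earlier in this comment.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Process:
    """Hypothetical stand-in for one processing step (not an actual AKC class)."""
    process_type: str  # one of the four AIRR processing categories
    fields: Dict[str, str] = field(default_factory=dict)  # ADC field name -> value


@dataclass
class DataSet:
    """Sketch of a DataSet with its existing slots plus the proposed one."""
    assessments: List[str] = field(default_factory=list)
    assays: List[str] = field(default_factory=list)
    processes: List[Process] = field(default_factory=list)  # proposed new slot


# Placeholder values only; real ADC records carry richer, structured values.
dataset = DataSet(
    processes=[
        Process("sample_processing", {"tissue": "blood"}),
        Process("cell_processing", {"cell_subset": "B cell"}),
        Process("nucleic_acid_processing", {"template_class": "RNA"}),
        Process("sequence_data_processing", {"sequencing_platform": "example platform"}),
    ]
)

# The order of the list records the order in which the processes were applied.
process_types = [p.process_type for p in dataset.processes]
```

In this sketch the processes hang directly off the DataSet, which leaves the relationship to Assay open, as discussed below.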

This still leaves me not really understanding how Assay fits into describing an ADC data set.

@schristley thoughts? This would result in us having a mapping of a significant percentage of the ADC fields in the AKC. The relationships between the classes still need to be resolved, but at least all of the data would be mapped.

I suppose the main question is, do we want/need all of the ADC Repertoire fields mapped to the AKC?

If we do create the above LinkML classes, then the field -> field translation of ADC to AKC is probably 80% - 90% complete. What would remain is the transformation of any fields that don't have a direct mapping.
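A field -> field translation of this kind can be sketched as a lookup table plus a bucket for whatever is left over. This is only a sketch under stated assumptions: the AKC target names in `FIELD_MAP` are hypothetical, the `translate` helper is not part of any existing conversion code, and `custom_note` is a made-up field standing in for one without a direct mapping.

```python
from typing import Dict, Tuple

# ADC field name -> hypothetical AKC target (class.slot); targets are assumptions.
FIELD_MAP: Dict[str, str] = {
    "tissue": "sample_processing.tissue",
    "cell_subset": "cell_processing.cell_subset",
    "template_class": "nucleic_acid_processing.template_class",
    "sequencing_platform": "sequence_data_processing.sequencing_platform",
}


def translate(
    adc_record: Dict[str, str], field_map: Dict[str, str]
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split a flat ADC record into directly mapped fields and the remainder."""
    mapped: Dict[str, str] = {}
    unmapped: Dict[str, str] = {}
    for name, value in adc_record.items():
        target = field_map.get(name)
        if target is None:
            # No direct mapping: left for a separate transformation step.
            unmapped[name] = value
        else:
            mapped[target] = value
    return mapped, unmapped


# Placeholder record; "custom_note" is invented to show the unmapped path.
record = {"template_class": "RNA", "tissue": "blood", "custom_note": "n/a"}
mapped, unmapped = translate(record, FIELD_MAP)
```

Keeping the unmapped fields in their own dictionary makes it easy to see how much of a record still needs a real transformation rather than a rename.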

bcorrie commented 1 month ago

I should note that a few weeks ago I added the above to the Consolidated Mapping sheet. With the YAML files I have generated this should be pretty easy to add, no?