airr-knowledge / issues

Issues and project management for the AKC

Add chain/receptor to LinkML schema #59

Closed bcorrie closed 1 month ago

bcorrie commented 1 month ago

Work proceeding on https://github.com/airr-knowledge/ak-schema/tree/chain-receptor

bcorrie commented 1 month ago

@schristley is this the correct file for these LinkML objects?

https://github.com/airr-knowledge/ak-schema/blob/chain-receptor/src/ak_schema/schema/ak_immune_system.yaml

bcorrie commented 1 month ago

@schristley is Chain.isotype the same as Chain.c_call? I think we only need one of these, no?

bcorrie commented 1 month ago

@schristley I have added the definition to ak_immune_system.yaml if you want to review.

Note that it was trivial because I had already auto-translated everything from the AIRR Spec here: https://github.com/airr-knowledge/ak-schema/blob/airr-export/src/ak_schema/schema/airr/ak_slot_Rearrangement.yaml

schristley commented 1 month ago

> @schristley is Chain.isotype the same as Chain.c_call? I think we only need one of these, no?

yes just one, let's go with c_call as it is more precise. Sometimes the isotype is the "family" level of c_call, so biologists talk about IgA, IgM, etc., though there is actually a gene number and allele for them.
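To illustrate why keeping only c_call loses nothing, here is a minimal sketch of how the "family"-level isotype could be derived from a c_call value on demand. The function name `isotype_family` and the regular expression are assumptions made for this example; they are not part of ak-schema or of any AIRR library. The sketch also assumes a single, IMGT-style immunoglobulin heavy-chain call such as `IGHA1*01`.

```python
import re
from typing import Optional

# Hypothetical helper for illustration only; not part of ak-schema.
# Matches heavy-chain constant genes: IGHM, IGHD, IGHE exactly, or
# IGHG/IGHA followed by a gene number (e.g. IGHG2, IGHA1).
_HEAVY_CONSTANT = re.compile(r"IGH([MDE]|[GA]\w*)")


def isotype_family(c_call: str) -> Optional[str]:
    """Collapse a constant-region call to its isotype family.

    Returns None when the call is not a recognised IG heavy-chain
    constant gene (for example a TR constant gene, or an IGHD
    diversity gene such as IGHD3-10).
    """
    gene = c_call.strip().split("*", 1)[0]  # drop the allele suffix
    match = _HEAVY_CONSTANT.fullmatch(gene)
    if match is None:
        return None
    # The first captured letter identifies the family (M, D, G, A, E).
    return "Ig" + match.group(1)[0]
```

Under these assumptions, `isotype_family("IGHA1*01")` gives `"IgA"`, while the gene number and allele remain available in c_call itself.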

schristley commented 1 month ago

@bcorrie need reference to germline database, which I guess cannot really be done until the other PR is resolved.

bcorrie commented 1 month ago

> yes just one, let's go with c_call as it is more precise.

Done...

bcorrie commented 1 month ago

> @bcorrie need reference to germline database, which I guess cannot really be done until the other PR is resolved.

@schristley do we want that at the chain or receptor level? We don't have this in the ADC, since this is consistent across all chains that will have been processed the same way? This should be attached to a SampleProcessing or DataProcessing type of AKC class, no?

schristley commented 1 month ago

> > @bcorrie need reference to germline database, which I guess cannot really be done until the other PR is resolved.
>
> @schristley do we want that at the chain or receptor level? We don't have this in the ADC, since this is consistent across all chains that will have been processed the same way? This should be attached to a SampleProcessing or DataProcessing type of AKC class, no?

Yeah, in AIRR it is assumed that all the rearrangements in a sample are processed with the same germline database, so the DB link was moved to a higher level, in DataProcessing. It's unclear whether we can do that with AKC. For now, from a data modeling perspective, wherever there are VDJ calls, we need the DB link to interpret those calls. The complication is a Chain that comes from multiple ADC rearrangements that have been processed with different DBs: what then? I think our plan for avoiding that complication is that, in the re-processing, we will standardize on a DB (i.e. AIRR germline DBs).
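The multiple-DB complication can be made concrete with a small sketch. It assumes each source rearrangement carries the identifier of the germline database used to annotate it; the class `SourceRearrangement`, its field names, and the function `shared_germline_database` are all hypothetical and exist only for this example. The values `db-A` and `db-B` are placeholders, not real database names.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class SourceRearrangement:
    """Hypothetical stand-in for an ADC rearrangement feeding a Chain."""
    sequence_id: str
    v_call: str
    germline_database: str  # assumed to be copied down from DataProcessing


def shared_germline_database(
    rearrangements: Iterable[SourceRearrangement],
) -> Optional[str]:
    """Return the one germline DB shared by all source rearrangements.

    Returns None when there are no sources or when the sources disagree,
    which signals that the calls cannot be interpreted against a single
    DB and the Chain would need re-processing against a standard one.
    """
    databases = {r.germline_database for r in rearrangements}
    if len(databases) == 1:
        return next(iter(databases))
    return None


# Usage: two sources annotated with the same placeholder DB agree.
same = [
    SourceRearrangement("r1", "IGHV1-2*02", "db-A"),
    SourceRearrangement("r2", "IGHV1-2*02", "db-A"),
]
mixed = same + [SourceRearrangement("r3", "IGHV1-2*04", "db-B")]
```

Here `shared_germline_database(same)` returns `"db-A"`, while `shared_germline_database(mixed)` returns `None`. If re-processing standardizes on one DB, the `None` case should not arise, and the link could sit at whatever level the schema settles on.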