Closed by theferrit32 1 year ago
Based on discussion today, we will wrap the copy number variants in canonical variations. We will need to either change our calls to the normalization service so that it generates those canonical variation ids for us, or use the Clojure implementation of the VRS hash algorithm to generate the canonical variation ids locally.
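For reference, the digest step of the VRS hash algorithm (sha512t24u: SHA-512, truncated to 24 bytes, base64url-encoded) can be sketched as below. This is only a minimal Python sketch, not the Clojure implementation mentioned above. The helper name `ga4gh_identifier` is made up for illustration, and it assumes the object has already been serialized to bytes according to the VRS spec, which is not shown here.

```python
import base64
import hashlib


def sha512t24u(blob: bytes) -> str:
    """Truncated digest: SHA-512, first 24 bytes, base64url-encoded."""
    digest = hashlib.sha512(blob).digest()
    return base64.urlsafe_b64encode(digest[:24]).decode("ascii")


def ga4gh_identifier(type_prefix: str, serialized: bytes) -> str:
    """Hypothetical helper: build an id shaped like 'ga4gh:<prefix>.<digest>'.

    Assumes `serialized` is already the canonical serialization of the
    object; producing that serialization is outside this sketch.
    """
    return f"ga4gh:{type_prefix}.{sha512t24u(serialized)}"


if __name__ == "__main__":
    # 24 bytes always encode to 32 base64url characters, with no padding.
    print(sha512t24u(b""))
    print(ga4gh_identifier("CX", b""))
```

The 32-character digest has the same shape as the ids in the example further down (e.g. the part after `ga4gh:CX.`).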
@theferrit32 can you start a thread on Slack with Alex W., Kori K., and me to see if they'd be willing to modify the "CanonicalizeVariant" API in the normalizer REST service to accommodate CNVs as CanonicalVariations? If they won't, or if they delay, then we should leave the ClinVar CNV interps as you have them, with no "CanonicalVariation" subjects and only "Absolute/RelativeCopyNumber" contextual variation subjects (and their associated VariationDescriptors). We can change this later; I don't want to delay the production of the snapshot files for this.
@theferrit32 you can archive this and we can start a new ticket around "CanonicalVariation" wrappers with the forthcoming 2.0 work. If you feel differently, then please keep it and update it so that it is useful and current.
Based on discussion today with @larrybabb, I will just use ClinVar-id-based identifiers for canonical variations at the moment, because those objects will not need VRS digest-based identifiers.
I will still implement the CanonicalVariation wrapper for CopyNumberCount and CopyNumberChange variants, and we can decide about when to implement changes around merging in descriptor fields at a later point.
Just making up a placeholder id here. Example:
{
  "id": "CanonicalVariation:clinvar:1003317",
  "type": "CanonicalVariation",
  "canonical_context": {
    "id": "ga4gh:CX.dWWOhMANazJvqI0IOJT8sTyqwHSOLSxp",
    "type": "CopyNumberChange",
    "subject": {
      "id": "ga4gh:SL.1oZDW5Wiy_DRAmEHVlCgBQ7ywbhOioLi",
      "type": "SequenceLocation",
      "sequence_id": "ga4gh:SQ.IW78mgV5Cqf6M24hy52hPjyyo5tCCd86",
      "start": {
        "type": "IndefiniteRange",
        "value": 116380899,
        "comparator": "<="
      },
      "end": {
        "type": "IndefiniteRange",
        "value": 116381085,
        "comparator": ">="
      }
    },
    "copy_change": "efo:0030067"
  }
}
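A rough sketch of how that wrapper could be produced, in Python for illustration only (the real pipeline code is not shown in this thread). The function name `wrap_in_canonical_variation` is hypothetical, and the id format simply follows the placeholder above.

```python
from typing import Any, Dict

# Copy number types that get wrapped, per the comment above.
COPY_NUMBER_TYPES = {"CopyNumberCount", "CopyNumberChange"}


def wrap_in_canonical_variation(
    clinvar_variation_id: str, copy_number_variation: Dict[str, Any]
) -> Dict[str, Any]:
    """Wrap a copy number variation in a CanonicalVariation.

    Uses the placeholder 'CanonicalVariation:clinvar:<id>' identifier
    instead of a VRS digest-based one.
    """
    variation_type = copy_number_variation.get("type")
    if variation_type not in COPY_NUMBER_TYPES:
        raise ValueError(f"not a copy number variation: {variation_type!r}")
    return {
        "id": f"CanonicalVariation:clinvar:{clinvar_variation_id}",
        "type": "CanonicalVariation",
        "canonical_context": copy_number_variation,
    }


if __name__ == "__main__":
    cnv = {
        "id": "ga4gh:CX.dWWOhMANazJvqI0IOJT8sTyqwHSOLSxp",
        "type": "CopyNumberChange",
        "copy_change": "efo:0030067",
    }
    print(wrap_in_canonical_variation("1003317", cnv)["id"])
```

The inner copy number object is passed through unchanged, so its own digest-based id is untouched by the wrapping.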
Right now the structure for RelativeCopyNumber and AbsoluteCopyNumber is:
(replacing AbsoluteCopyNumber with RelativeCopyNumber when no copy counts are provided)
This is not exactly compliant with the vrsatile schema, because canonical_variation can be a CURIE but cannot embed a variation object that is not a CanonicalVariation. See https://github.com/ga4gh/vrsatile/blob/657b21e54fe422aa0d3b6301f5ddd7c42319c9d0/schema/vod-source.yaml#L282-L286
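To make the constraint concrete, here is a small Python sketch of the rule as described in this comment (not derived from the schema file itself): canonical_variation must be either a CURIE string or an embedded object whose type is CanonicalVariation. The function name and the loose CURIE pattern are assumptions for illustration.

```python
import re
from typing import Any

# Loose CURIE shape: a prefix, a colon, then a non-empty reference.
# This is an approximation, not the pattern used by the schema.
CURIE_PATTERN = re.compile(r"^\w[^:\s]*:\S+$")


def canonical_variation_is_compliant(value: Any) -> bool:
    """Return True if `value` is a CURIE or an embedded CanonicalVariation."""
    if isinstance(value, str):
        return bool(CURIE_PATTERN.match(value))
    if isinstance(value, dict):
        return value.get("type") == "CanonicalVariation"
    return False


if __name__ == "__main__":
    # A bare copy number object embedded directly is the non-compliant case.
    print(canonical_variation_is_compliant({"type": "CopyNumberChange"}))
    print(canonical_variation_is_compliant("clinvar:1003317"))
```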
I'm not clear on what was decided: keep this as is, change the wrapper type to CategoricalVariationDescriptor (okay, but it complicates our code), change it to plain VariationDescriptor (which would need a schema change in vrsatile), or do something else.
We could also insert a CanonicalVariation in the descriptor that wraps the copy number variant, which would match other variant types in CanonicalVariationDescriptors better. Example:
Extended example: