clingen-data-model / genegraph

Presents an RDF triplestore of gene information using GraphQL APIs

Implement Absolute/Relative copy number CanonicalVariation wrapper #754

Closed theferrit32 closed 1 year ago

theferrit32 commented 1 year ago

Right now the structure for RelativeCopyNumber and AbsoluteCopyNumber is:

{"type": "CanonicalVariationDescriptor",
  "canonical_variation": {"type": "AbsoluteCopyNumber", …}}

(replacing AbsoluteCopyNumber with RelativeCopyNumber when no copy counts are provided)

This is not strictly compliant with the vrsatile schema: canonical_variation may be a CURIE or an embedded CanonicalVariation, but it may not embed a variation object that is not a CanonicalVariation. https://github.com/ga4gh/vrsatile/blob/657b21e54fe422aa0d3b6301f5ddd7c42319c9d0/schema/vod-source.yaml#L282-L286
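
As a rough illustration of the constraint described above, the check below accepts a CURIE string or an embedded CanonicalVariation and rejects anything else. It is a sketch only: the function name is hypothetical, the CURIE pattern is an assumed loose form (prefix, colon, reference), and the rule encodes this issue's reading of the schema rather than the schema file itself.

```python
import re

# Assumption: a loose CURIE pattern; the pattern in the actual schema may differ.
CURIE_RE = re.compile(r"^\w[^:]*:.+$")


def canonical_variation_is_compliant(descriptor: dict) -> bool:
    """Return True if canonical_variation is a CURIE or an embedded CanonicalVariation."""
    value = descriptor.get("canonical_variation")
    if isinstance(value, str):
        # A bare reference to a variation by identifier.
        return bool(CURIE_RE.match(value))
    if isinstance(value, dict):
        # An embedded object must itself be a CanonicalVariation.
        return value.get("type") == "CanonicalVariation"
    return False


# The structure currently emitted for copy number variants (fails the check).
current = {
    "type": "CanonicalVariationDescriptor",
    "canonical_variation": {"type": "AbsoluteCopyNumber"},
}

# The same variant wrapped in a CanonicalVariation (passes the check).
wrapped = {
    "type": "CanonicalVariationDescriptor",
    "canonical_variation": {
        "type": "CanonicalVariation",
        "variation": {"type": "AbsoluteCopyNumber"},
    },
}

print(canonical_variation_is_compliant(current))
print(canonical_variation_is_compliant(wrapped))
```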

I'm not clear on what was decided: whether we keep this as is, change the wrapper type to CategoricalVariationDescriptor (okay, but it complicates our code), change it to plain VariationDescriptor (which would need a schema change in vrsatile), or do something else.

We could also insert a CanonicalVariation in the descriptor that wraps the copy number variant, which would match other variant types in CanonicalVariationDescriptors better. Example:

{"type": "CanonicalVariationDescriptor",
  "canonical_variation": {
    "type": "CanonicalVariation",
    "variation": {"type": "AbsoluteCopyNumber", …}}}

Extended example:

{
    "description": "GRCh38/hg38 Yq11.221-11.223(chrY:13908860-22358529)x1",
    "subject_variation_descriptor": [],
    "type": "CanonicalVariationDescriptor",
    "xrefs": [
        "https://www.ncbi.nlm.nih.gov/clinvar/147317",
        "https://identifiers.org/clinvar:147317"
    ],
    "alternate_labels": [],
    "canonical_variation": {
        "id": "ga4gh:VAC.LlAI1aCpvi_0GjAVTHsnRvvgzQKwvBD2",
        "type": "AbsoluteCopyNumber",
        "subject": {
            "id": "ga4gh:VSL.Cb-1d2cLoGMySF0HLw-jmcPjFnb9GZI0",
            "type": "SequenceLocation",
            "sequence_id": "ga4gh:SQ.8_liLu1aycC0tPQPFmUaGXJLDs5SbPZ5",
            "interval": {
                "type": "SequenceInterval",
                "start": {
                    "type": "IndefiniteRange",
                    "value": 13908859,
                    "comparator": "<="
                },
                "end": {
                    "type": "IndefiniteRange",
                    "value": 22358529,
                    "comparator": ">="
                }
            }
        },
        "copies": {
            "type": "Number",
            "value": 1
        }
    },
    "record_metadata": {
        "type": "RecordMetadata",
        "is_version_of": "http://dataexchange.clinicalgenome.org/terms/VariationDescriptor_147317",
        "version": "2019-07-01"
    },
    "extensions": [
        {
            "type": "Extension",
            "name": "variation_type",
            "value": "copy number gain"
        },
        {
            "type": "Extension",
            "name": "entity_type",
            "value": "variation"
        },
        {
            "type": "Extension",
            "name": "protein_change",
            "value": []
        },
        {
            "type": "Extension",
            "name": "clingen_version",
            "value": 0
        },
        {
            "type": "Extension",
            "name": "child_ids",
            "value": []
        },
        {
            "type": "Extension",
            "name": "allele_id",
            "value": "157068"
        },
        {
            "type": "Extension",
            "name": "subclass_type",
            "value": "SimpleAllele"
        },
        {
            "type": "Extension",
            "name": "clinvar_variation",
            "value": "https://identifiers.org/clinvar:147317"
        },
        {
            "type": "Extension",
            "name": "descendant_ids",
            "value": []
        },
        {
            "type": "Extension",
            "name": "canonical_expression",
            "value": {
                "genegraph.annotate.cnv/string": "GRCh38/hg38 Yq11.221-11.223(chrY:13908860-22358529)x1",
                "assembly": "GRCh38",
                "genegraph.annotate.cnv/reference": "hg38",
                "genegraph.annotate.cnv/cytogenetic-location": "Yq11.221-11.223",
                "chr": "Y",
                "start": 13908860,
                "end": 22358529,
                "total_copies": 1
            }
        },
        {
            "type": "Extension",
            "name": "candidate_expressions",
            "value": []
        }
    ],
    "label": "GRCh38/hg38 Yq11.221-11.223(chrY:13908860-22358529)x1",
    "id": "http://dataexchange.clinicalgenome.org/terms/VariationDescriptor_147317.2019-07-01",
    "members": [
        {
            "type": "VariationMember",
            "expressions": [
                {
                    "type": "Expression",
                    "syntax": "hgvs.g",
                    "value": "NC_000024.8:g.(?_14530134)_(22914064_?)dup"
                }
            ]
        },
        {
            "type": "VariationMember",
            "expressions": [
                {
                    "type": "Expression",
                    "syntax": "hgvs.g",
                    "value": "NC_000024.10:g.(?_13908860)_(22358529_?)dup"
                }
            ]
        },
        {
            "type": "VariationMember",
            "expressions": [
                {
                    "type": "Expression",
                    "syntax": "hgvs.g",
                    "value": "NC_000024.9:g.(?_16020740)_(24504676_?)dup"
                }
            ]
        }
    ]
}

theferrit32 commented 1 year ago

Based on discussion today, we will wrap the copy number variants in canonical variations. We will need to either change our calls to the normalization service so that it generates those canonical variation ids for us, or use the Clojure implementation of the VRS hash algorithm to generate the canonical variation ids locally.
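
For reference, the truncated digest that VRS computed identifiers are built on (sha512t24u: SHA-512, first 24 bytes, base64url) is small enough to sketch. This is Python rather than the Clojure implementation mentioned above, and it covers only the digest step. The canonical serialization of the object that precedes hashing is not shown, and `ga4gh_style_id` with its `type_prefix` argument is a hypothetical helper; the correct prefix for a CanonicalVariation is not stated in this thread.

```python
import base64
import hashlib


def sha512t24u(blob: bytes) -> str:
    """SHA-512 digest, truncated to 24 bytes, base64url-encoded (32 characters)."""
    digest = hashlib.sha512(blob).digest()
    # 24 bytes encode to exactly 32 base64 characters, so no '=' padding appears.
    return base64.urlsafe_b64encode(digest[:24]).decode("ascii")


def ga4gh_style_id(type_prefix: str, serialized: bytes) -> str:
    """Hypothetical helper: format a digest like the 'ga4gh:VAC.<digest>' ids above.

    `serialized` is assumed to already be the canonical serialization of the
    object; producing that serialization is out of scope for this sketch.
    """
    return f"ga4gh:{type_prefix}.{sha512t24u(serialized)}"


print(sha512t24u(b""))
```

Generating ids locally this way would avoid a round trip to the normalization service, at the cost of having to keep the serialization rules in sync with it.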

larrybabb commented 1 year ago

@theferrit32 can you start a thread on Slack with Alex W., Kori K., and me to see if they'd be willing to modify the "CanonicalizeVariant" API in the normalizer REST service to accommodate CNVs as CanonicalVariations? If they won't, or if they delay, then we should leave the ClinVar CNV interps as you have them, with no "CanonicalVariation" subjects and only "Absolute/RelativeCopyNumber" contextual variation subjects (and their associated VariationDescriptors). We can change this later; I don't want to delay the production of the snapshot files for this.

larrybabb commented 1 year ago

@theferrit32 you can archive this and we can restart a new ticket around "CanonicalVariation" wrappers with the new 2.0 work that is forthcoming. If you feel differently then please keep it and update it so that it is useful and current.

theferrit32 commented 1 year ago

Based on discussion today with @larrybabb, I will just use ClinVar-id-based identifiers for canonical variations for now, because those objects will not need VRS digest-based identifiers.

I will still implement the CanonicalVariation wrapper for CopyNumberCount and CopyNumberChange variants, and we can decide about when to implement changes around merging in descriptor fields at a later point.

theferrit32 commented 1 year ago

Just making up a placeholder id here. Example:

{
    "id": "CanonicalVariation:clinvar:1003317",
    "type": "CanonicalVariation",
    "canonical_context": {
        "id": "ga4gh:CX.dWWOhMANazJvqI0IOJT8sTyqwHSOLSxp",
        "type": "CopyNumberChange",
        "subject": {
            "id": "ga4gh:SL.1oZDW5Wiy_DRAmEHVlCgBQ7ywbhOioLi",
            "type": "SequenceLocation",
            "sequence_id": "ga4gh:SQ.IW78mgV5Cqf6M24hy52hPjyyo5tCCd86",
            "start": {
                "type": "IndefiniteRange",
                "value": 116380899,
                "comparator": "<="
            },
            "end": {
                "type": "IndefiniteRange",
                "value": 116381085,
                "comparator": ">="
            }
        },
        "copy_change": "efo:0030067"
    }
}
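
A minimal sketch of how the placeholder id and wrapper shown above could be assembled. The function name is hypothetical, and the id format simply mirrors the made-up placeholder in this comment; it is not a VRS-defined identifier.

```python
def wrap_in_canonical_variation(clinvar_id: str, copy_number: dict) -> dict:
    """Wrap a copy number variation in a CanonicalVariation with a ClinVar-based id."""
    # Only the two copy number types discussed in this thread are accepted.
    allowed_types = {"CopyNumberCount", "CopyNumberChange"}
    if copy_number.get("type") not in allowed_types:
        raise ValueError(f"unsupported variation type: {copy_number.get('type')!r}")
    return {
        # Placeholder id scheme from this comment, not a digest-based id.
        "id": f"CanonicalVariation:clinvar:{clinvar_id}",
        "type": "CanonicalVariation",
        "canonical_context": copy_number,
    }


# Abbreviated version of the CopyNumberChange above (subject omitted).
copy_change = {
    "id": "ga4gh:CX.dWWOhMANazJvqI0IOJT8sTyqwHSOLSxp",
    "type": "CopyNumberChange",
    "copy_change": "efo:0030067",
}

canonical = wrap_in_canonical_variation("1003317", copy_change)
print(canonical["id"])
```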