ClinGen / gene-and-variant-curation-tools

ClinGen's gene and variant curation interfaces (GCI & VCI). Developed by Stanford ClinGen team.
https://curation.clinicalgenome.org/
MIT License

Support Allele Registry CACN IDs in gene and variant curation #32

Open wrightmw opened 2 years ago

wrightmw commented 2 years ago

CAR has implemented new CACN IDs that represent CNV variants. We need to update the GCI/VCI so that it can accept these new IDs. Example of new CACN ID in the CAR: https://reg.clinicalgenome.org/allele/CACN1537613926?overlappingGenes=true

https://broadinstitute.atlassian.net/browse/CSP-70

In the VCI:

1) When adding a new variant record: currently, the choice is ClinVar ID and/or CA ID. This will need to be expanded to include CACN ID as an option.

Note: if a CACN ID is added, then a disclaimer should appear that says "Please note: The VCI is not optimized for the curation of CNVs"

2) Variant title - needs to be updated so that title can be based on new CACN title (for variant title use "communityStandardTitle" from CAR).

3) Need to consider knock-on effect to how data is retrieved or displayed in other parts of the VCI. e.g. what to show for functionalities that require a genomic location.
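A minimal sketch of how the variant ID input could tell the three identifier types apart and decide when to show the CNV disclaimer. The identifier patterns are assumptions inferred only from the examples in this ticket (e.g. CACN1537613926), and `classify_variant_id` and `disclaimer_for` are hypothetical helper names, not functions from the GCI/VCI codebase.

```python
import re
from typing import Optional

# Assumed identifier shapes (inferred from examples in this ticket, not from a spec):
#   CACN ID              -> "CACN" followed by digits
#   CA ID                -> "CA" followed by digits
#   ClinVar Variation ID -> digits only
_PATTERNS = [
    ("cacn", re.compile(r"^CACN\d+$")),
    ("ca", re.compile(r"^CA\d+$")),
    ("clinvar", re.compile(r"^\d+$")),
]

# Disclaimer text quoted from the ticket.
CNV_DISCLAIMER = "Please note: The VCI is not optimized for the curation of CNVs"


def classify_variant_id(raw: str) -> Optional[str]:
    """Return 'cacn', 'ca', 'clinvar', or None if the input matches no pattern."""
    value = raw.strip().upper()
    for kind, pattern in _PATTERNS:
        if pattern.match(value):
            return kind
    return None


def disclaimer_for(raw: str) -> Optional[str]:
    """Return the CNV disclaimer only when the input looks like a CACN ID."""
    return CNV_DISCLAIMER if classify_variant_id(raw) == "cacn" else None
```

Checking the CACN pattern before the CA pattern is not strictly required here, since "CA" must be followed directly by a digit, but it keeps the intent obvious.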

In the GCI:

3) Everywhere that a variant is currently added, CACN ID needs to be an additional option.

For example, places where variant is saved:

[Six screenshots dated 2019-11-18 showing the places where a variant is saved]

4) Variant title - for displays. Same as the VCI, e.g. "communityStandardTitle" from CAR

For example, places where the variant title is currently displayed:

[Four screenshots dated 2019-11-18 showing the places where the variant title is displayed]

wrightmw commented 2 years ago

Add warning message when someone selects to add a CACN ID in the GCI: Note: in general you should be scoring copy number variants involving only one gene, and should not be scoring multi-gene copy number variants due to the fact that the contributions of the other genes cannot be ruled out. If you have any questions please discuss with your expert panel.

When a CACN ID is entered, if the variant is already in ClinVar, then add the ClinVar Variation ID instead.

wrightmw commented 2 years ago

This ticket replaced https://github.com/ClinGen/clincoded/issues/2078

wrightmw commented 2 years ago

Dependency on Allele Registry to provide:

wrightmw commented 1 year ago

Example: https://reg.clinicalgenome.org/allele/CACN1537613926?overlappingGenes=true

    [
      {
        "@id": "https://reg.genome.network/allele/CACN1537613926",
        "CACNid": "CACN1537613926",
        "communityStandardTitle": "GRCh38 1p36.31(chr1:5554106-5969826)x4",
        "genomicAlleles": [
          {
            "chromosome": "1",
            "coordinates": {
              "copies": 4,
              "cytoband": "1p36.31",
              "end": 5969826,
              "start": 5554106
            },
            "descriptor": "GRCh38 (chr1:5554106-5969826)x4",
            "referenceGenome": "GRCh38"
          }
        ],
        "overlappingGenes": [
          {
            "@id": "https://rest.genenames.org/fetch/symbol/MIR4689",
            "symbol": "MIR4689"
          },
          {
            "@id": "https://rest.genenames.org/fetch/symbol/NPHP4",
            "symbol": "NPHP4"
          }
        ]
      }
    ]
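As a rough illustration of what the GCI/VCI would need to pull out of this response, the sketch below parses the example payload and extracts the title, CACN ID, cytoband, and overlapping gene symbols. The field names come from the example above; `CacnSummary` and `summarize_cacn` are hypothetical names, and the sketch assumes the response is a one-element list with at least one genomic allele, as in the example.

```python
import json
from dataclasses import dataclass
from typing import List

# Example response from this ticket, trimmed to the fields used below.
SAMPLE = """
[{"CACNid": "CACN1537613926",
  "communityStandardTitle": "GRCh38 1p36.31(chr1:5554106-5969826)x4",
  "genomicAlleles": [{"chromosome": "1",
                      "coordinates": {"copies": 4, "cytoband": "1p36.31",
                                      "end": 5969826, "start": 5554106},
                      "referenceGenome": "GRCh38"}],
  "overlappingGenes": [{"symbol": "MIR4689"}, {"symbol": "NPHP4"}]}]
"""


@dataclass
class CacnSummary:
    cacn_id: str
    title: str
    cytoband: str
    overlapping_genes: List[str]


def summarize_cacn(payload: str) -> CacnSummary:
    """Extract display fields from a CAR CACN response (assumed shape)."""
    record = json.loads(payload)[0]  # assumption: one record per CACN ID
    first_allele = record["genomicAlleles"][0]
    return CacnSummary(
        cacn_id=record["CACNid"],
        title=record["communityStandardTitle"],
        cytoband=first_allele["coordinates"]["cytoband"],
        # overlappingGenes may be absent if the query flag is not set
        overlapping_genes=[g["symbol"] for g in record.get("overlappingGenes", [])],
    )
```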

wrightmw commented 1 year ago

Structure now has name, and overlapping genes... just missing ClinVar ID.

wrightmw commented 1 year ago

When someone goes to add a variant, in all the locations described above, the pop-up modal "Select Variant" should say the following underneath when data is retrieved for a CACN ID:

Below are the data from the ClinGen Allele Registry for the CACN ID you submitted. Select "Save" below if it is the correct variant, otherwise revise your search above:

    • communityStandardTitle
    • CACN ID (links to CAR)
    • ClinVar Variation ID (blank if not present)
    • Cytoband
    • Overlapping Genes

For reference, a CA ID output is shown below. Note that for CACN IDs:

  1. the text is changed in each location from CA ID to CACN ID
  2. the title comes from the same source and is still included (e.g. communityStandardTitle)
  3. the HGVS terms are replaced by the overlapping genes list

[Screenshot dated 2023-02-08 showing the CA ID output in the "Select Variant" modal]

So, for the example used previously: https://reg.clinicalgenome.org/allele/CACN1537613926?overlappingGenes=true

It would be: Below are the data from the ClinGen Allele Registry for the CACN ID you submitted. Select "Save" below if it is the correct variant, otherwise revise your search above:

GRCh38 1p36.31(chr1:5554106-5969826)x4
CACN ID: CACN1537613926
Overlapping Genes: MIR4689, NPHP4
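The modal text above could be assembled along these lines. This is only a sketch: `modal_lines` is a hypothetical helper, and the exact labels and ordering are assumptions based on the field list in this comment (the ClinVar Variation ID is left blank when not present, as requested).

```python
from typing import List, Optional


def modal_lines(
    title: str,
    cacn_id: str,
    cytoband: str,
    genes: List[str],
    clinvar_id: Optional[str] = None,
) -> List[str]:
    """Build the lines shown under the 'Select Variant' search for a CACN ID."""
    return [
        title,
        f"CACN ID: {cacn_id}",
        # Blank value when the CNV has no ClinVar Variation ID
        f"ClinVar Variation ID: {clinvar_id or ''}".rstrip(),
        f"Cytoband: {cytoband}",
        f"Overlapping Genes: {', '.join(genes)}",
    ]


lines = modal_lines(
    "GRCh38 1p36.31(chr1:5554106-5969826)x4",
    "CACN1537613926",
    "1p36.31",
    ["MIR4689", "NPHP4"],
)
```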

wrightmw commented 1 year ago

Note: in general you should be scoring copy number variants involving only one gene, and should not be scoring multi-gene copy number variants due to the fact that the contributions of the other genes cannot be ruled out. If you have any questions please discuss with your expert panel.

wrightmw commented 1 year ago

1) Replace the "View variant evidence in Variant Curation Interface" link with: Note: in general you should be scoring copy number variants involving only one gene, and should not be scoring multi-gene copy number variants due to the fact that the contributions of the other genes cannot be ruled out. If you have any questions please discuss with your expert panel.

2) Add the same text above to the modal, under the "Overlapping Genes" section.

3) Count the number of overlapping genes returned, create a new field called "Number of Overlapping Genes", and situate it above where the "View variant evidence in Variant Curation Interface" link was (above the new text, below the title).

4) In the Evidence summaries, put the number of overlapping genes in parentheses after the title.

@markmandell
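Items 3 and 4 amount to a count plus a formatted title. A minimal sketch, assuming the gene symbols have already been extracted from the CAR response; `overlapping_gene_fields` and the `summaryTitle` key are hypothetical names, and the exact spacing of the parenthesised count is an assumption.

```python
from typing import Dict, List, Union


def overlapping_gene_fields(title: str, gene_symbols: List[str]) -> Dict[str, Union[str, int]]:
    """Derive the new count field and the evidence-summary title for a CNV."""
    count = len(gene_symbols)
    return {
        # New field shown below the title in the variant display
        "Number of Overlapping Genes": count,
        # Evidence summary title with the count appended in parentheses
        "summaryTitle": f"{title} ({count})",
    }


fields = overlapping_gene_fields(
    "GRCh38 1p36.31(chr1:5554106-5969826)x4", ["MIR4689", "NPHP4"]
)
```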

markmandell commented 1 year ago

Pasting email sent to Tristan, Phil, and Terry regarding publish messages sent to website (continuing conversation in this ticket):

We are currently working on supporting the Allele Registry CACN IDs in the GCI. I wanted to reach out to you all to confirm how the CACN IDs/Copy Number Variants should be included in the GCI Publish messages. Attached is a full publish message from my dev instance where some CACN IDs are added as Associated Variants to a Gene Disease Record (also added a variant via ClinVar ID to compare the variants).

The additional CNV fields are cacnId and overlappingGenes, and the value for variantType will be “copy number gain”. The overlappingGenes array can be quite large, in this example there are over 100 genes listed for a given variant. My understanding is that the users only want to know how many overlapping genes there are for CNVs and would not need the full list of genes as it is being sent over now. I wanted to propose sending a field with just the count named numberOverlappingGenes or overlappingGenesCount, or appending the count to the variant title to minimize the amount of work needed on your end, but let me know what you think and we can go from there.

Resource ID for dev message: 439fbd7a-0762-4b2e-b8f2-7524c616ee1a Offset: 154
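To make the proposal concrete, here is a sketch of swapping the full overlappingGenes array for a count before publishing. It uses numberOverlappingGenes, one of the two names floated above; neither name is a confirmed schema field, `compact_cnv_variant` is a hypothetical helper, and the variant dictionary shown is a made-up minimal shape rather than the real publish message.

```python
from typing import Any, Dict


def compact_cnv_variant(variant: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the variant with the gene list replaced by its length."""
    compact = {k: v for k, v in variant.items() if k != "overlappingGenes"}
    compact["numberOverlappingGenes"] = len(variant.get("overlappingGenes", []))
    return compact


# Hypothetical minimal variant record, using field names mentioned in the email
variant = {
    "cacnId": "CACN1537613926",
    "variantType": "copy number gain",
    "overlappingGenes": ["MIR4689", "NPHP4"],
}
published = compact_cnv_variant(variant)
```

The input record is left untouched, so the full gene list stays available for the UI while the published message carries only the count.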

tnavatar commented 1 year ago

Few questions:

  1. I see 'copy number gain' as a variation type. Would 'copy number loss' also be valid? There is a parallel discussion on this going on in GA4GH VRS about how to describe copy number variation: (https://github.com/ga4gh/vrs/issues/404#issuecomment-1419363538). As long as the valid options are in the range of the EFO or SO terms listed there, we're good.
  2. Overlapping genes are represented by their symbols; good for readability, less good for computability (though I see that's the form they're coming over in from the Allele Registry). I'm assuming it would be a bother to use HGNC, Ensembl, or NCBI Entrez IDs?
  3. There's no structured description of the CNV, I think (affected coordinates, variable breakpoints, copy number). I am assuming we would have to go to either the Allele Registry or ClinVar for this info (or parse the provided expression)?
markmandell commented 1 year ago

Hi @tnavatar,

  1. Going to review the discussion you linked and check for variationType of "copy number loss" as you mentioned. Will also check in with Matt to be sure.

  2. For the overlapping genes list, there is a reference link to XML for each gene provided by the Allele Registry, which I have omitted from being saved to our DB when a user adds the CNV. Right now, we are only saving the list of symbols, for display of the symbols and gene count in the UI. However, I can see if parsing out the hgnc_id from the XML for each gene would be feasible, if that is something you are interested in. It could be a fairly big list for some variants (some have over 100 genes, but I'm not sure if that's a concern on your end). So it could be:

    "overlappingGenes": [
    {
    "geneSymbol": "ACTG1P12",
    "hgnc_id": "HGNC:44496"
    },
    ....
    ]
  3. The data in genomicAlleles can also be parsed and sent over as well, similar to what we do for CA IDs for hgvs names. Let me know if what I included below is what you are mentioning. Can either parse out the relevant info or include as is.

    "genomicAlleles": [
      {
        "chromosome": "3",
        "coordinates": {
          "copies": 3,
          "cytoband": "3p26.3-3p24.3",
          "end": 19510600,
          "start": 63843
        },
        "descriptor": "GRCh38 (chr3:63843-19510600)x3",
        "referenceGenome": "GRCh38"
      }
    ],

Here is a link to the example I am using in this comment for reference https://reg.clinicalgenome.org/allele/CACN1538018649?overlappingGenes=true
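If the structured genomicAlleles block is not sent over, the alternative raised above is to parse the expression itself. The sketch below does that with a regular expression written against the two string shapes seen in this ticket (the descriptor and the communityStandardTitle); it is an assumption that all CACN expressions follow this shape, and `parse_cnv_expression` is a hypothetical helper.

```python
import re
from typing import Any, Dict, Optional

# Matches e.g. "GRCh38 (chr3:63843-19510600)x3" and
# "GRCh38 1p36.31(chr1:5554106-5969826)x4" (shapes seen in this ticket only).
_CNV_EXPR = re.compile(
    r"^(?P<build>\S+)\s+(?P<cytoband>\S*?)"
    r"\(chr(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)\)x(?P<copies>\d+)$"
)


def parse_cnv_expression(expr: str) -> Optional[Dict[str, Any]]:
    """Parse a CNV expression into structured fields, or None if it does not match."""
    match = _CNV_EXPR.match(expr.strip())
    if match is None:
        return None
    return {
        "referenceGenome": match["build"],
        "cytoband": match["cytoband"] or None,  # absent in the descriptor form
        "chromosome": match["chrom"],
        "start": int(match["start"]),
        "end": int(match["end"]),
        "copies": int(match["copies"]),
    }
```

Sending the genomicAlleles object as is would avoid the parsing step entirely, which is probably the more robust option if the receiving side can accept it.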

wrightmw commented 1 year ago

@tnavatar For us, CACN IDs are like CA IDs, in that we primarily need them to provide a unique identifier for the variant object within the VCI/GCI. We also need a title for displaying the variant, and the genomic coordinates/HGVS are also used for bringing in associated evidence in the VCI. When a curator goes to add a variant, we also bring in some associated data on the fly that allows the user to make sure they have added the correct variant (e.g. alternate HGVS titles in the VCI). For CACN IDs in the GCI, this added curator help includes how many genes overlap the region. This is important because curators should only use a CACN ID if there is only ONE gene in the repeat region. If there is more than one, then they should not be adding it. So I think it is unlikely you will get a long string of genes published to the website with a CACN ID.

HGNC IDs: We get the list of overlapping genes from the CACN ID JSON output of the CAR API on the fly, and this does not currently include the HGNC IDs (just the gene symbols). If you need them, there are probably two routes: 1) we ask Baylor to include them in the CACN JSON output, then we re-code our GCI outputs, and you work with us to receive these new fields, or 2) you use the CACN IDs to get this data directly from the CAR. Or perhaps on the website you only need to list the number of overlapping genes? For reference, in the evidence summary in the GCI, users said they only wanted to see the number of overlapping genes, not the whole list of genes. Maybe the request is the same for the website? We have the number of overlapping genes and so could just provide that to you; it could be added as a separate field, or just appended in parentheses to the end of the variant title. So you may not need to list the genes at all.

CACN type: With respect to whether the CACN is a "copy number loss" or "copy number gain", I think the only way to know this is if someone looks at the copy number and compares it to the reference genome. It is listed in ClinVar entries but not generally in the CAR for CACN IDs, so this is not information we would have for every CACN ID.

@ErinRiggs and @courtneythaxton - it would be good to know which of the suggested fields are required (e.g. the HGNC ID for the overlapping gene, and whether it is a loss/gain type CACN), i.e. it would be good to get clarification on whether these would be useful/necessary for the users to see on the website and whether you feel we should hold off on the release of this feature until we have these fields.

tnavatar commented 1 year ago

@wrightmw Gotcha--so the list of genes should be less relevant for the purpose of receiving data from the GCI; there should only be one and it should match the gene being curated.

Regarding copy number type, I figure you can infer that for autosomal variants, but when talking about variants on sex chromosomes it's nice to have an explicit descriptor. I don't know that it would matter for showing on the website in its current form, but it would be good to think about getting that for future uses. Seems like something that should be added to the CAR.
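The inference for autosomal variants could look roughly like this. It is a sketch under a stated assumption: a baseline of two copies on chromosomes 1 to 22, with anything else (X, Y, or other contigs) left undetermined because the expected copy number depends on information the CAR response does not carry. `infer_copy_number_type` is a hypothetical helper; the returned strings reuse the "copy number gain" and "copy number loss" terms from this thread.

```python
from typing import Optional

# Chromosome labels as they appear in the CAR "chromosome" field, e.g. "1", "3"
AUTOSOMES = {str(n) for n in range(1, 23)}


def infer_copy_number_type(chromosome: str, copies: int) -> Optional[str]:
    """Infer gain/loss for autosomes; return None when it cannot be inferred."""
    if chromosome not in AUTOSOMES:
        return None  # sex chromosomes need an explicit descriptor
    if copies > 2:
        return "copy number gain"
    if copies < 2:
        return "copy number loss"
    return None  # two copies on an autosome: neither gain nor loss
```

For the two examples in this thread, chromosome 1 with 4 copies and chromosome 3 with 3 copies, this returns "copy number gain".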

markmandell commented 1 year ago

Hey @tnavatar, given the above conversation, I was wondering if the current output we are sending is sufficient for now until we discuss adding scope for future use. So would be 2 additional fields being added for CNVs in GCI: overlappingGenes and cacnId. Let me know what you think and we can coordinate from there. Thanks!

courtneythaxton commented 1 year ago

Hi All,

Thanks for starting this discussion thread Matt.

I agree with what has been mentioned thus far and just wanted to add a few comments.

  1. It is true that a CACN with multiple genes reflects a multigene disorder which does not fit the current monogenic framework, however it may be useful to publish the case with a score of 0, especially if the paper suggested it was a single gene, only for the mapping to indicate it was not.
    • In these cases I think having the # of overlapping genes would suffice for the report, but I would especially like if the genes are listed in the CAR, which I believe they will be. I think the website and GCI/VCI can then use linking to the CAR page for the region as a way a user can see those genes.
  2. The copy number might be good to see, I agree, but I think we may be able to put this aside for a future iteration to the website and/or GCI/VCI.

Thanks for all the hard work on this!

Courtney

-- Courtney Thaxton, Ph.D. (she/her/hers) Director, ClinGen-affiliated UNC Biocuration and Coordination Core Research Assistant Professor, Dept. of Genetics University of North Carolina at Chapel Hill 120 Mason Farm road 5100B Genetic Medicine Building CB#7264 Chapel Hill, NC 27599


tnavatar commented 1 year ago

It seems like the current output should be OK for display purposes; it sounds like desired changes in the future might be more related to the CAR anyway.


markmandell commented 1 year ago

Hi @tnavatar, wanted to let you know that we released the support of CACN IDs in the GCI today. Let me know if you have any questions or need anything from us. Thanks!!