clingen-data-model / clinvar-ingest-reports

ClinGen generates several Google Sheets-based reports from ingested ClinVar data that originates from the Broad BigQuery data.

Some VCEP gene symbols are invalid #20

Closed: larrybabb closed this 1 year ago

larrybabb commented 1 year ago

@dazzariti I was running some tests and discovered that a handful of gene symbols in the Report Gene List table are invalid or outdated.

report_id  gene_symbol  report_name                      active
RPT008     F13          Coagulation Factor Deficiency    false
RPT009     SEPN1        Congenital Myopathies            false
RPT021     HBE          Hemoglobinopathy                 false
RPT059     C1NH         Hereditary Angioedema
RPT060     H3F3A        Histone H3

Keying on symbols may be problematic in the long run because gene symbols change from time to time. Ideally, each VCEP would define the HGNC ID or Entrez Gene ID of the gene it is associated with. Doing so would leave less room for error and create far less maintenance debt in managing these associations.
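A minimal Python sketch of why an ID-keyed gene list needs less maintenance: the report row stores only a stable ID, and the symbol is looked up at read time, so a rename touches one lookup table instead of every report. The `ReportGene` type, `display_symbol` function, and the ID string `"HGNC:EXAMPLE"` are hypothetical placeholders for illustration, not real HGNC data or code from this repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportGene:
    """One row of a report gene list, keyed on a stable gene ID, not a symbol."""
    report_id: str
    hgnc_id: str  # stable identifier; unaffected by symbol renames


def display_symbol(gene: ReportGene, symbols_by_id: dict[str, str]) -> str:
    """Look up the current symbol at read time from an ID -> symbol table."""
    return symbols_by_id[gene.hgnc_id]


# "HGNC:EXAMPLE" is a placeholder, not the real HGNC ID for this gene.
row = ReportGene(report_id="RPT009", hgnc_id="HGNC:EXAMPLE")

# Before and after a symbol rename only the lookup table differs;
# the stored report row is identical in both cases.
symbols_before = {"HGNC:EXAMPLE": "SEPN1"}
symbols_after = {"HGNC:EXAMPLE": "SELENON"}

print(display_symbol(row, symbols_before))  # SEPN1
print(display_symbol(row, symbols_after))   # SELENON
```

In practice the ID-to-symbol table would be refreshed from an HGNC download rather than hand-maintained; that refresh step is an assumption and is not shown here.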

Here's what I think the correct symbols are for the above, but it would make sense to have the VCEPs approve them. If a VCEP disagrees, it should offer a new valid symbol or retract the errant one.

- F13 should be F13A1
- SEPN1 should be SELENON
- HBE should be HBE1
- C1NH should be SERPING1
- H3F3A should be H3-3A
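A minimal Python sketch of applying these proposed renames to the rows in the table above. The mapping is just the list from this comment (still pending VCEP approval); the `apply_corrections` function and the `(report_id, gene_symbol)` tuple layout are assumptions for illustration, not code from this repository. The function also returns a change log, which could be sent to the VCEPs for review before anything is written back.

```python
# Symbol corrections proposed in this thread, pending VCEP approval.
PROPOSED_CORRECTIONS = {
    "F13": "F13A1",
    "SEPN1": "SELENON",
    "HBE": "HBE1",
    "C1NH": "SERPING1",
    "H3F3A": "H3-3A",
}

# (report_id, gene_symbol) pairs from the Report Gene List rows above.
ROWS = [
    ("RPT008", "F13"),
    ("RPT009", "SEPN1"),
    ("RPT021", "HBE"),
    ("RPT059", "C1NH"),
    ("RPT060", "H3F3A"),
]


def apply_corrections(rows, corrections):
    """Return (corrected_rows, changes).

    Symbols missing from `corrections` pass through unchanged; `changes`
    lists (report_id, old_symbol, new_symbol) for every renamed row.
    """
    corrected = []
    changes = []
    for report_id, symbol in rows:
        new_symbol = corrections.get(symbol, symbol)
        if new_symbol != symbol:
            changes.append((report_id, symbol, new_symbol))
        corrected.append((report_id, new_symbol))
    return corrected, changes


corrected, changes = apply_corrections(ROWS, PROPOSED_CORRECTIONS)
for report_id, old, new in changes:
    print(f"{report_id}: {old} -> {new}")
```

Keeping the corrections in one reviewable mapping, rather than editing rows by hand, makes it easy to drop or change an entry if a VCEP rejects a proposed symbol.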

dazzariti commented 1 year ago

Thank you. I updated both documents that maintain the gene list and confirmed the symbols against the VCEP application and gene curations where available.

https://docs.google.com/spreadsheets/d/1bADskBcobHTmmXungY09beWPDEa1nqM-PP__86yGVj0/edit?usp=sharing

https://docs.google.com/spreadsheets/d/1M98y9H5CfD2Mmlc3O0Hvwa-buUBcHyz3Nr8UUt5-LPI/edit?usp=sharing

As discussed at the B/G stand-up, GPM does require HGNC IDs, but those are being collected prospectively and still need to be backfilled for older VCEPs (in progress).