arq5x / gemini

a lightweight db framework for exploring genetic variation.
http://gemini.readthedocs.org
MIT License
318 stars 120 forks source link

Feature request: Add ACMG info #755

Open davemcg opened 8 years ago

davemcg commented 8 years ago

Clinical centers are recommended to check ACMG's incidental finding list (http://www.ncbi.nlm.nih.gov/clinvar/docs/acmg/). It would be useful if Gemini facilitated this kind of query by either building a "Built-in analysis tool" for this or by adding a column to variants table indicating whether a gene was in the list (though only 56 currently have this designation).

I'm having some trouble figuring out what people are actually doing to parse their variants for this purpose, but I think doing the following query makes sense to filter for the genetic counselor.

Pseudo command:

SELECT * FROM variants WHERE gene=one_of_HGMD 
AND clinvar_sig LIKE '%pathogenic%' 
OR impact_severity='HIGH' 
AND maf_aaf_all < 0.05

Here's an actual functioning command which is trio-based, since that's how most of our exomes are setup:

family_id='the_family_id'
gemini query --header -q "SELECT chrom, start, gene, clinvar_sig, impact, impact_severity, max_aaf_all, (gts).(family_id=='$family_id') FROM variants WHERE \
    ( \
        gene='ACTA2' \
        OR gene='ACTC1'\
        OR gene='APC'\
        OR gene='APOB'\
        OR gene='BRCA1'\
        OR gene='BRCA2'\
        OR gene='CACNA1S'\
        OR gene='COL3A1'\
        OR gene='DSC2'\
        OR gene='DSG2'\
        OR gene='DSP'\
        OR gene='FBN1'\
        OR gene='GLA'\
        OR gene='KCNH2'\
        OR gene='KCNQ1'\
        OR gene='LDLR'\
        OR gene='LMNA'\
        OR gene='MEN1'\
        OR gene='MLH1'\
        OR gene='MSH2'\
        OR gene='MSH6'\
        OR gene='MUTYH'\
        OR gene='MYBPC3'\
        OR gene='MYH11'\
        OR gene='MYH7'\
        OR gene='MYL2'\
        OR gene='MYL3'\
        OR gene='MYLK'\
        OR gene='NF2'\
        OR gene='PCSK9'\
        OR gene='PKP2'\
        OR gene='PMS2'\
        OR gene='PRKAG2'\
        OR gene='PTEN'\
        OR gene='RB1'\
        OR gene='RET'\
        OR gene='RYR1'\
        OR gene='RYR2'\
        OR gene='SCN5A'\
        OR gene='SDHAF2'\
        OR gene='SDHB'\
        OR gene='SDHC'\
        OR gene='SDHD'\
        OR gene='SMAD3'\
        OR gene='STK11'\
        OR gene='TGFBR1'\
        OR gene='TGFBR2'\
        OR gene='TMEM43'\
        OR gene='TNNI3'\
        OR gene='TNNT2'\
        OR gene='TP53'\
        OR gene='TPM1'\
        OR gene='TSC1'\
        OR gene='TSC2'\
        OR gene='VHL'\
        OR gene='WT1'\
    ) \
    AND \
    (clinvar_sig LIKE '%pathogenic%' OR impact_severity='HIGH') \
    AND \
    max_aaf_all < 0.005" \
    --gt-filter "(gt_types).(family_id=='$family_id').(!=HOM_REF).(count>=1)" \
    gemini.db | | less -S

(In my database, I did do quick check with sort and uniq and my database (0.18) does recognize all the 56 genes.)

If you just want the ACMG list (parsed from http://www.ncbi.nlm.nih.gov/clinvar/docs/acmg/), here it is:

ACTA2
ACTC1
APC
APOB
BRCA1
BRCA2
CACNA1S
COL3A1
DSC2
DSG2
DSP
FBN1
GLA
KCNH2
KCNQ1
LDLR
LMNA
MEN1
MLH1
MSH2
MSH6
MUTYH
MYBPC3
MYH11
MYH7
MYL2
MYL3
MYLK
NF2
PCSK9
PKP2
PMS2
PRKAG2
PTEN
RB1
RET
RYR1
RYR2
SCN5A
SDHAF2
SDHB
SDHC
SDHD
SMAD3
STK11
TGFBR1
TGFBR2
TMEM43
TNNI3
TNNT2
TP53
TPM1
TSC1
TSC2
VHL
WT1
brentp commented 8 years ago

ok. I'll have a look. You can always use sql:

AND gene in ($genes)

where $genes can be created in python with ",".join("'%s'" % g for g in genes)

davemcg commented 8 years ago

Ah, that's cleaner

acmg_genes=\'ACTA2\',\'ACTC1\',\'APC\',\'APOB\',\'BRCA1\',\'BRCA2\',\'CACNA1S\',\'COL3A1\',\'DSC2\',\'DSG2\',\'DSP\',\'FBN1\',\'GLA\',\'KCNH2\',\'KCNQ1\',\'LDLR\',\'LMNA\',\'MEN1\',\'MLH1\',\'MSH2\',\'MSH6\',\'MUTYH\',\'MYBPC3\',\'MYH11\',\'MYH7\',\'MYL2\',\'MYL3\',\'MYLK\',\'NF2\',\'PCSK9\',\'PKP2\',\'PMS2\',\'PRKAG2\',\'PTEN\',\'RB1\',\'RET\',\'RYR1\',\'RYR2\',\'SCN5A\',\'SDHAF2\',\'SDHB\',\'SDHC\',\'SDHD\',\'SMAD3\',\'STK11\',\'TGFBR1\',\'TGFBR2\',\'TMEM43\',\'TNNI3\',\'TNNT2\',\'TP53\',\'TPM1\',\'TSC1\',\'TSC2\',\'VHL\',\'WT1\'
family_id='your_fam_id'
gemini query --header -q "SELECT chrom, start, gene, clinvar_sig, impact, impact_severity, max_aaf_all, (gts).(family_id=='$family_id') FROM variants WHERE \
    (gene IN ($acmg_genes) ) \
    AND \
    (clinvar_sig LIKE '%pathogenic%' OR impact_severity='HIGH') \
    AND \
    max_aaf_all < 0.005" \
    --gt-filter "(gt_types).(family_id=='$family_id').(!=HOM_REF).(count>=1)" \
    gemini.db |  less -S