ding-lab / CharGer

Characterization of Germline variants
https://ding-lab.github.io/CharGer/
GNU General Public License v3.0

CharGer - v0.5.2: Error running with MacArthur lab clinvar TSV #4

Closed: rhshah closed this issue 6 years ago

rhshah commented 6 years ago

Hello,

We are encountering an error while running CharGer with the MacArthur lab ClinVar TSV.

Here are the details:

charger -m ./maf/APC_vcf_maf.maf -o APC_charger.tsv -H APC_inp.maf.3D_Proximity.pairwise.site.l0.ad10.r10.clusters -D --inheritanceGeneList inheritanceGeneList.txt --exac-vcf ~pyang/.vep/ExAC_nonTCGA.r0.3.1.sites.vep.vcf.gz --mac-clinvar-tsv ./clinvar_alleles.single.b37.tsv.gz

Traceback (most recent call last):
 File "/sonas-hs/nwhgenomics/hpc/home/pyang/software/miniconda2/envs/py27/bin/charger", line 743, in <module>
   main( sys.argv[1:] )
 File "/sonas-hs/nwhgenomics/hpc/home/pyang/software/miniconda2/envs/py27/bin/charger", line 662, in main
   mutationTypes = mutationTypes , \
 File "/sonas-hs/nwhgenomics/hpc/home/pyang/software/miniconda2/envs/py27/lib/python2.7/site-packages/charger/charger.py", line 821, in getExternalData
   self.getClinVar( **kwargs )
 File "/sonas-hs/nwhgenomics/hpc/home/pyang/software/miniconda2/envs/py27/lib/python2.7/site-packages/charger/charger.py", line 842, in getClinVar
   clinvarSet = self.getMacClinVarTSV( macClinVarTSV )
 File "/sonas-hs/nwhgenomics/hpc/home/pyang/software/miniconda2/envs/py27/lib/python2.7/site-packages/charger/charger.py", line 887, in getMacClinVarTSV
   [ description , status ] = self.parseMacPathogenicity( fields[12:17] )
 File "/sonas-hs/nwhgenomics/hpc/home/pyang/software/miniconda2/envs/py27/lib/python2.7/site-packages/charger/charger.py", line 914, in parseMacPathogenicity
   isPathogenic = int( isPathogenic )
ValueError: invalid literal for int() with base 10: 'NM_005101.3:c.62G>A'

After digging a little into the code, we see that the header expected by the script differs from the header of our MacArthur lab TSV.

The header listed in the script is at this line: https://github.com/ding-lab/CharGer/blob/ba34f1dd62a129fd5a44936421edb724c6ad0e4f/charger/charger.py#L876

It differs from the header of our file, which looks like this:

zcat /sonas-hs/nwhgenomics/hpc/home/pyang/projects/charger/clinvar_alleles.single.b37.tsv.gz | head -n1
chrom   pos ref alt start   stop    strand  variation_type  variation_id    rcv scv allele_id   symbol  hgvs_c  hgvs_p  molecular_consequence   clinical_significance   clinical_significance_ordered   pathogenic  likely_pathogenic   uncertain_significance  likely_benign   benign  review_status   review_status_ordered   last_evaluated  all_submitters  submitters_ordered  all_traits  all_pmids   inheritance_modes   age_of_onset    prevalence  disease_mechanism   origin  xrefs   dates_ordered   gold_stars  conflicted
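
To illustrate the mismatch, here is a minimal Python sketch (not CharGer's actual code or its eventual fix). It assumes only the first 24 column names of the header printed above and shows that, with this layout, the fixed slice `fields[12:17]` from the traceback lands on text columns rather than on the integer flag columns. The helper names `column_index` and `pathogenicity_flags`, and the choice of required columns, are hypothetical and used only for illustration.

```python
# First 24 column names of the newer TSV header shown above
# (the real file is tab-separated; the list is truncated for brevity).
NEW_HEADER = (
    "chrom pos ref alt start stop strand variation_type variation_id rcv scv "
    "allele_id symbol hgvs_c hgvs_p molecular_consequence clinical_significance "
    "clinical_significance_ordered pathogenic likely_pathogenic "
    "uncertain_significance likely_benign benign review_status"
).split()

# Columns this sketch looks up; chosen for illustration, not taken from CharGer.
REQUIRED = (
    "pathogenic",
    "likely_pathogenic",
    "uncertain_significance",
    "likely_benign",
    "benign",
)


def column_index(header_fields, required=REQUIRED):
    """Map each required column name to its position; fail early if one is missing."""
    index = {name: i for i, name in enumerate(header_fields)}
    missing = [name for name in required if name not in index]
    if missing:
        raise ValueError("unexpected TSV header, missing columns: " + ", ".join(missing))
    return {name: index[name] for name in required}


def pathogenicity_flags(row_fields, positions):
    """Read the integer flag columns by name instead of by a fixed slice."""
    return {name: int(row_fields[pos]) for name, pos in positions.items()}


positions = column_index(NEW_HEADER)

# With this header, the fixed slice covers text columns such as hgvs_c,
# which is consistent with int() receiving 'NM_005101.3:c.62G>A'.
print(NEW_HEADER[12:17])
print(positions["pathogenic"])
```

Looking columns up by name would also turn a future format change into an explicit "missing columns" message instead of a `ValueError` deep inside the parser.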

If we are using the wrong ClinVar file, could you please point us to the right one?

Thank you in advance for your guidance and support.

rmashl commented 6 years ago

Hi Ronak, it appears their lab updated the file format. The single-alleles file under commit 5b04ade (https://github.com/macarthur-lab/clinvar/tree/5b04ade4fb4d2f13ffd39e4a8d9ade9af28fdaf9) appears to be the final one that should run with the current CharGer. Thanks for bringing this to our attention.

rhshah commented 6 years ago

Thank you @rmashl. Do you have plans to support the latest version of the file?

rmashl commented 6 years ago

Yes.

rhshah commented 6 years ago

Thank you, looking forward to the new update.

rmashl commented 6 years ago

The update now appears in the master branch.