ding-lab / CharGer

Characterization of Germline variants
https://ding-lab.github.io/CharGer/
GNU General Public License v3.0

IndexError: list index out of range while running with Mac-Clinvar #43

Closed NTNguyen13 closed 4 years ago

NTNguyen13 commented 4 years ago

Hi, I'm trying out CharGer to predict and annotate variants in my VCF file (annotated by VEP). My command is as follows:


charger \
    -f test_charger_vep.vcf \
    -o test_charger_vep2.tsv \
    -l -D \
    --mac-clinvar-tsv ~/clinvar/output/b37/single/clinvar_alleles.single.b37.vcf.gz

But it resulted in an error:

charger::getClinVar
Traceback (most recent call last):
  File "~/anaconda3/envs/charger/bin/charger", line 743, in <module>
    main( sys.argv[1:] )
  File "~/anaconda3/envs/charger/bin/charger", line 662, in main
    mutationTypes = mutationTypes , \
  File "~/anaconda3/envs/charger/lib/python2.7/site-packages/charger/charger.py", line 821, in getExternalData
    self.getClinVar( **kwargs )
  File "~/anaconda3/envs/charger/lib/python2.7/site-packages/charger/charger.py", line 842, in getClinVar
    clinvarSet = self.getMacClinVarTSV( macClinVarTSV )
  File "~/anaconda3/envs/charger/lib/python2.7/site-packages/charger/charger.py", line 887, in getMacClinVarTSV
    [ description , status ] = self.parseMacPathogenicity( fields[12:17] )
  File "~/anaconda3/envs/charger/lib/python2.7/site-packages/charger/charger.py", line 909, in parseMacPathogenicity
    named = fields[0]
IndexError: list index out of range

When I remove the --mac-clinvar-tsv option, it runs fine. I used the single file from the latest Mac ClinVar repository.

Could you please help me with this problem?

P.S. I also want to add HotSpot3D results to CharGer. How would I use https://github.com/ding-lab/hotspot3d to generate the cluster file for this task? What input file do I need to get the clusters?

Thank you very much

ccwang002 commented 4 years ago

--mac-clinvar-tsv accepts a TSV file, not a VCF. Please use clinvar_alleles.single.b37.tsv.gz from their repo (hg19).
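
A quick way to catch this mix-up before running CharGer is to look at the first line of the file. The sketch below is not part of CharGer; `sniff_variant_file` is a hypothetical helper written for Python 3, and it relies only on the fact that a VCF begins with a `##fileformat=VCF` line while a TSV begins with a tab-separated header. It also shows why a VCF line would lead to the IndexError in the first traceback: a slice such as `fields[12:17]` taken from a line with fewer than 13 fields is simply empty, so indexing its first element fails.

```python
import gzip


def sniff_variant_file(path):
    """Return 'vcf', 'tsv', or 'unknown' based on the first line of a file.

    Hypothetical helper for illustration; CharGer does not provide it.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as handle:
        first_line = handle.readline().decode("utf-8", "replace")
    if first_line.startswith("##fileformat=VCF"):
        return "vcf"
    if "\t" in first_line:
        return "tsv"
    return "unknown"


# A made-up VCF data line with the eight fixed VCF columns.
vcf_fields = "1\t100\t.\tA\tG\t.\t.\tKEY=value".split("\t")

# Slicing past the end of a list does not raise; it returns an empty list.
window = vcf_fields[12:17]

# Indexing the empty slice is what raises "list index out of range".
try:
    window[0]
except IndexError as error:
    message = str(error)
```

Calling `sniff_variant_file` on the path given to --mac-clinvar-tsv and stopping when it returns `'vcf'` would surface the problem with a clearer message than the traceback.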

As for the HotSpot3D cluster file, please use this file (hg19) as an example using TCGA pan-cancer mutations (TCGA MC3). I have created a separate issue #44 to add instructions for generating this HotSpot3D cluster file (or at least point to the right doc).

NTNguyen13 commented 4 years ago

Hi, I tried again with the tsv.gz file, and now I get this error:

Traceback (most recent call last):
  File "~/anaconda3/envs/charger/bin/charger", line 743, in <module>
    main( sys.argv[1:] )
  File "~/anaconda3/envs/charger/bin/charger", line 662, in main
    mutationTypes = mutationTypes , \
  File "~/anaconda3/envs/charger/lib/python2.7/site-packages/charger/charger.py", line 821, in getExternalData
    self.getClinVar( **kwargs )
  File "~/anaconda3/envs/charger/lib/python2.7/site-packages/charger/charger.py", line 842, in getClinVar
    clinvarSet = self.getMacClinVarTSV( macClinVarTSV )
  File "~/anaconda3/envs/charger/lib/python2.7/site-packages/charger/charger.py", line 887, in getMacClinVarTSV
    [ description , status ] = self.parseMacPathogenicity( fields[12:17] )
  File "~/anaconda3/envs/charger/lib/python2.7/site-packages/charger/charger.py", line 914, in parseMacPathogenicity
    isPathogenic = int( isPathogenic )
ValueError: invalid literal for int() with base 10: 'NM_005101.3:c.62G>A'

I saw a similar issue here: https://github.com/ding-lab/CharGer/issues/4. I thought that had been fixed in the latest version. Or do I need to switch to an older version of mac-clinvar?

NTNguyen13 commented 4 years ago

I tried the older version of mac-clinvar. It produces a lot of warnings like these, but charger still runs:

biomine warning: del not found in conversion tables
biomine::variant::mafvariant Warning: could not find amino acid change or intronic change
  Hint: Is the input amino acid change column correct?
    Problem variant:  RAB39B:X:154490187-154490187A>G::NM_171998.3:c.543A>G::NP_741995.1:p.  --  p.Thr181=
biomine::variant::mafvariant Warning: could not find amino acid change or intronic change
  Hint: Is the input amino acid change column correct?
    Problem variant:  RAB39B:X:154490238-154490238C>T::NM_171998.3:c.492C>T::NP_741995.1:p.  --  p.Phe164=
biomine::variant::mafvariant Warning: could not find amino acid change or intronic change
  Hint: Is the input amino acid change column correct?
    Problem variant:  RAB39B:X:154490457-154490457T>C::NM_171998.3:c.273T>C::NP_741995.1:p.  --  p.Ile91=
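
All three problem variants above end in the HGVS notation for a synonymous change (p.Thr181=, p.Phe164=, p.Ile91=). Purely as an illustration, and without claiming this is how biomine parses amino acid changes, here is a minimal Python 3 sketch of a pattern that accepts both a simple substitution and the `=` form. The names `PROTEIN_CHANGE` and `classify_protein_change` are hypothetical.

```python
import re

# Three-letter reference residue, position, then either "=" (synonymous)
# or a three-letter replacement. Deliberately narrow: deletions, insertions,
# frameshifts and other HGVS forms are not handled.
PROTEIN_CHANGE = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)(=|[A-Z][a-z]{2})$")


def classify_protein_change(hgvs_p):
    """Return a small dict describing the change, or None if it does not match."""
    match = PROTEIN_CHANGE.match(hgvs_p)
    if match is None:
        return None
    reference, position, alternate = match.groups()
    kind = "synonymous" if alternate == "=" else "substitution"
    return {"ref": reference, "pos": int(position), "kind": kind}


synonymous = classify_protein_change("p.Thr181=")
truncated = classify_protein_change("p.")
```

Here `synonymous` describes residue Thr at position 181 with kind `"synonymous"`, while the truncated string `"p."` seen in the warning text yields `None`.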

ccwang002 commented 4 years ago

It looks like the new version of mac-clinvar is not compatible because its column order changed. Please ignore the warnings for now. We are fixing this in version 0.6.

Please re-open this issue if there is additional follow-up. Thanks!
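
For readers hitting the same ValueError: it arises because a fixed slice (`fields[12:17]` in the traceback) lands on different columns once the upstream column order changes, so `int()` receives an HGVS string instead of a 0/1 flag. One way to be robust to reordering is to look columns up by header name. The following is a minimal Python 3 sketch under stated assumptions, not the fix planned for CharGer 0.6; the function name and the column names `pathogenic`, `benign`, `symbol`, and `hgvs_c` are illustrative and should be checked against the actual TSV header.

```python
import csv
import io


def read_pathogenicity_flags(handle, wanted=("pathogenic", "benign")):
    """Yield one dict per row with the requested columns converted to int.

    Columns are found by header name, so their position does not matter.
    The default column names are assumptions made for this example.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [name for name in wanted if name not in header]
    if missing:
        raise KeyError("missing expected columns: %s" % ", ".join(missing))
    for row in reader:
        yield {name: int(row[name]) for name in wanted}


# Made-up one-row table in which an HGVS column precedes the flag columns.
sample = io.StringIO(
    "symbol\thgvs_c\tpathogenic\tbenign\n"
    "GENE1\tNM_000000.0:c.1A>G\t1\t0\n"
)
rows = list(read_pathogenicity_flags(sample))
```

With this approach a missing column fails early with a message naming the column, instead of a conversion error deep inside the parser.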