ding-lab / CharGer

Characterization of Germline variants
https://ding-lab.github.io/CharGer/
GNU General Public License v3.0

Problems running CharGer v0.5.4 #42

Closed yubau1112 closed 4 years ago

yubau1112 commented 4 years ago

Referring to commit 7d7d291:

I removed the entire anaconda2 directory:

rm -rf ~/anaconda2

mkdir ~/CharGer
cd ~/CharGer
wget https://repo.anaconda.com/archive/Anaconda2-5.2.0-Linux-x86_64.sh
bash Anaconda2-5.2.0-Linux-x86_64.sh

Do you accept the license terms? [yes|no] [no] >>> yes

Anaconda2 will now be installed into this location: /home/yubau/anaconda2

Press ENTER to confirm the location
Press CTRL-C to abort the installation
Or specify a different location below

[/home/yubau/anaconda2] >>>

Do you wish the installer to prepend the Anaconda2 install location to PATH in your /home/yubau/.bashrc ? [yes|no] [no] >>>yes

Do you wish to proceed with the installation of Microsoft VSCode? [yes|no] >>> no

(Anaconda installation finished)

conda create --name CharGer python=2.7
conda activate CharGer

(CharGer) [yubau@cmuh-i2 ~]$ which pip
~/anaconda2/envs/CharGer/bin/pip
(CharGer) [yubau@cmuh-i2 ~]$ pip --version
pip 19.3.1 from /home/yubau/anaconda2/envs/CharGer/lib/python2.7/site-packages/pip (python 2.7)
(CharGer) [yubau@cmuh-i2 ~]$ conda --version
conda 4.5.4
(CharGer) [yubau@cmuh-i2 ~]$ which conda
~/anaconda2/bin/conda

conda install pysam
pip install pysam

(CharGer) [yubau@cmuh-i2 ~]$ conda list
# packages in environment at /home/yubau/anaconda2/envs/CharGer:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main
AdvancedHTMLParser        9.0.1
BioMine                   0.9.5
ca-certificates           2020.1.1                      0
certifi                   2019.11.28                py27_0
chardet                   3.0.4
CharGer                   0.5.4
idna                      2.9
libedit                   3.1.20181209          hc058e9b_0
libffi                    3.2.1                 hd88cf55_4
libgcc-ng                 9.1.0                 hdf63c60_0
libstdcxx-ng              9.1.0                 hdf63c60_0
ncurses                   6.2                   he6710b0_0
numpy                     1.16.6
openssl                   1.1.1d                h7b6447c_4
pip                       19.3.1                    py27_0
pysam                     0.6                       py27_0
pysam                     0.15.4
python                    2.7.17                h9bab390_0
PyVCF                     0.6.8
QueryableList             3.1.0
readline                  7.0                   h7b6447c_5
requests                  2.23.0
scipy                     1.2.3
setuptools                44.0.0                    py27_0
sqlite                    3.31.1                h7b6447c_0
tk                        8.6.8                 hbc83047_0
urllib3                   1.25.8
wheel                     0.33.6                    py27_0
zlib                      1.2.11                h7b6447c_3

(CharGer) [yubau@cmuh-i2 ~]$ pip list
DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7. More details about Python 2 support in pip, can be found at https://pip.pypa.io/en/latest/development/release-process/#python-2-support
Package            Version
------------------ -------------------
AdvancedHTMLParser 9.0.1
BioMine            0.9.5
bz2file            0.98
certifi            2019.11.28
chardet            3.0.4
CharGer            0.5.4
idna               2.9
numpy              1.16.6
pip                19.3.1
pysam              0.15.4
PyVCF              0.6.8
QueryableList      3.1.0
requests           2.23.0
scipy              1.2.3
setuptools         44.0.0.post20200106
urllib3            1.25.8
virtualenv         16.4.3
wheel              0.33.6
xopen              0.5.0

cd ~/CharGer
wget -O CharGer.zip https://github.com/ding-lab/CharGer/archive/master.zip
unzip CharGer.zip
mv CharGer-master/ CharGer
cd CharGer
pip install .

(CharGer) [yubau@cmuh-i2 ~]$ which charger
~/anaconda2/envs/CharGer/bin/charger
(CharGer) [yubau@cmuh-i2 ~]$ charger
CharGer ERROR: Command not recognized

CharGer - v0.5.4 ... .. .

Logged in to a Linux (CentOS 7) terminal:

conda activate CharGer
cd ~/CharGer/CharGer/Demo
charger -f demo.vcf -o demo.tsv


and I got an output file.


I also ran the other access-data flags:

charger -f demo.vcf -o demo.t.tsv -t
charger -f demo.vcf -o demo.E.tsv -E
charger -f demo.vcf -o demo.x.tsv -x
charger -f demo.vcf -o demo.tEx.tsv -t -E -x

None of these produced a fatal error.

BUT!! When I ran:

charger -f demo.vcf -o demo.tsv -l

I got this message:


Unsupported VEP version or no gnomAD AF annotation in input file; will search for ExAC frequencies...
Unsupported VEP version or no ExAC AF annotation in input file; will search for 1000 Genomes frequencies...
Unsupported VEP version or no gnomAD AF annotation in input file; will search for ExAC frequencies...
Unsupported VEP version or no ExAC AF annotation in input file; will search for 1000 Genomes frequencies...
Skipping: 0 for filters and 0 for AF and 0 for mutation types out of 550
No gene list file uploaded. CharGer will not make PVS1 calls.
No PP2 gene list file uploaded. CharGer will not make PP2 calls.
No BP1 gene list file uploaded. CharGer will not make BP1 calls.
No expression file uploaded. CharGer will allow all passed truncations without expression data in PVS1.
charger::getVEP Warning: skipping VEP
Running VEP took 2.09808349609e-05seconds
charger::getClinVar warning: ClinVar ReST search batch size given is greater than max allowed (50). Overriding to max search batch size.
Traceback (most recent call last):
  File "/home/yubau/anaconda2/envs/CharGer/bin/charger", line 744, in <module>
    main( sys.argv[1:] )
  File "/home/yubau/anaconda2/envs/CharGer/bin/charger", line 663, in main
    mutationTypes = mutationTypes , \
  File "/home/yubau/anaconda2/envs/CharGer/lib/python2.7/site-packages/charger/charger.py", line 879, in getExternalData
    self.getClinVar( **kwargs )
  File "/home/yubau/anaconda2/envs/CharGer/lib/python2.7/site-packages/charger/charger.py", line 905, in getClinVar
    self.getClinVarviaREST( **kwargs )
  File "/home/yubau/anaconda2/envs/CharGer/lib/python2.7/site-packages/charger/charger.py", line 918, in getClinVarviaREST
    ent = entrezapi()
  File "/home/yubau/anaconda2/envs/CharGer/lib/python2.7/site-packages/biomine/webapi/entrez/entrezapi.py", line 74, in __init__
    self.setRequestLimits()
  File "/home/yubau/anaconda2/envs/CharGer/lib/python2.7/site-packages/biomine/webapi/entrez/entrezapi.py", line 332, in setRequestLimits
    self.setSummaryBatchSize( entrezaip.summaryBatchSize )
NameError: global name 'entrezaip' is not defined

And I got an empty output file.

Then I ran:

charger -f demo.vcf -o demo.ltEx.tsv -l -t -E -x


BUT!! When I ran:

charger -f demo.vcf -o demo.l.tsv -l --exac-vcf ~/CharGer3/CharGer/Demo/ExAC.r1.sites.vep.vcf --mac-clinvar-tsv ~/CharGer3/CharGer/Demo/clinvar_alleles.multi.b37.tsv.gz


It works. Why do I need to add "--exac-vcf ~/CharGer3/CharGer/Demo/ExAC.r1.sites.vep.vcf --mac-clinvar-tsv ~/CharGer3/CharGer/Demo/clinvar_alleles.multi.b37.tsv.gz" to access ClinVar data?

Then I ran:

charger -f demo.vcf -o demo.ltEx.tsv -l -t -E -x --exac-vcf ~/CharGer3/CharGer/Demo/ExAC.r1.sites.vep.vcf --mac-clinvar-tsv ~/CharGer3/CharGer/Demo/clinvar_alleles.multi.b37.tsv.gz


Why?

Someone found 3 bugs (https://www.jianshu.com/p/544caf92b24c).

For one of the 3 bugs, they say the following file needs to be modified: /home/yubau/anaconda2/envs/CharGer/lib/python2.7/site-packages/biomine/webapi/entrez/entrezapi.py

On lines 332 and 333, change entrezaip to entrezapi.

Is that really the fix?
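The NameError at the end of the traceback above is ordinary Python name lookup failing: setRequestLimits refers to a global called entrezaip, while the class is named entrezapi, so the proposed rename should address that specific error. The sketch below reproduces the failure with a hypothetical, heavily simplified stand-in class. It is not the real BioMine source, and the batch size of 50 is only borrowed from the ClinVar warning in the log.

```python
# Hypothetical, simplified stand-in for BioMine's entrezapi class; it only
# mimics the shape of the bug and is not the real BioMine source.
class entrezapi(object):
    summaryBatchSize = 50  # assumed default, borrowed from the ClinVar log line

    def __init__(self):
        self.summary_batch_size = None

    def setSummaryBatchSize(self, size):
        self.summary_batch_size = size


def set_request_limits_buggy(api):
    # 'entrezaip' is looked up as a global when this line runs; no such name
    # exists, so Python raises NameError (the misspelling from the traceback).
    api.setSummaryBatchSize(entrezaip.summaryBatchSize)  # noqa: F821


def set_request_limits_fixed(api):
    # Referring to the class by its real name resolves the lookup.
    api.setSummaryBatchSize(entrezapi.summaryBatchSize)


if __name__ == "__main__":
    api = entrezapi()
    try:
        set_request_limits_buggy(api)
    except NameError as err:
        print("buggy version:", err)
    set_request_limits_fixed(api)
    print("fixed version:", api.summary_batch_size)
```

Fixing the name only removes this one exception; whether the online Entrez lookup then succeeds depends on the remote service.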

(CharGer) [yubau@cmuh-i2 Demo]$ which python
~/anaconda2/envs/CharGer/bin/python
(CharGer) [yubau@cmuh-i2 Demo]$ python -V
Python 2.7.17 :: Anaconda, Inc.

less ~/.bashrc

# added by Anaconda2 installer
export PATH="/home/yubau/anaconda2/bin:$PATH"
export PATH="/home/yubau/anaconda2/envs/CharGer/bin:$PATH"

source "/home/yubau/anaconda2/etc/profile.d/conda.sh"

And where is the "diseases file", i.e. the "gene\tdisease\tmode_of_inheritance" .tsv file?

Access data
-l ClinVar (flag)
-x ExAC (flag)
-E VEP (flag)
-t TCGA cancer types (flag)
Using these flags turns on accession features built in. For the ClinVar, ExAC, and VEP flags, if no local VEP or database is provided, then BioMine will be used to access the ReST interface. CharGer is currently capable of handling all VEP releases up until release 97. [[[[[The TCGA flag allows disease determination from sample barcodes in a .maf when using a diseases file (see below).]]]]]

Cross-reference data files
-z pathogenic variants, .vcf
-e expression matrix file, .tsv
--inheritanceGeneList inheritance gene list file, (format: gene\tdisease\tmode_of_inheritance) .txt
--PP2GeneList PP2 gene list file, (format: column of genes) .txt
--BP1GeneList BP1 gene list file, (format: column of genes) .txt
[[[[[ -d diseases file, (format: gene\tdisease\tmode_of_inheritance) .tsv]]]]]
-n de novo file, standard .maf
-a assumed de novo file, standard .maf
-c co-segregation file, standard .maf
-H HotSpot3D clusters file, .clusters
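The --inheritanceGeneList and -d format listed above is a plain tab-separated file with three columns, and the PP2/BP1 lists are a single column of genes. The sketch below parses both layouts. It assumes no header row, skips blank lines and lines starting with #, and uses placeholder gene and disease names; CharGer's own reader and the exact mode-of-inheritance labels it expects may differ.

```python
import csv
import io
from collections import defaultdict


def read_inheritance_gene_list(handle):
    """Parse gene<TAB>disease<TAB>mode_of_inheritance lines.

    Assumes no header row; blank lines and '#' lines are skipped.
    Returns {gene: [(disease, mode_of_inheritance), ...]}.
    """
    table = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        if len(row) != 3:
            raise ValueError("expected 3 tab-separated columns, got %r" % (row,))
        gene, disease, mode = (field.strip() for field in row)
        table[gene].append((disease, mode))
    return dict(table)


def read_gene_column(handle):
    """Parse a one-column gene list such as the PP2/BP1 files."""
    genes = []
    for line in handle:
        line = line.strip()
        if line and not line.startswith("#"):
            genes.append(line)
    return genes


if __name__ == "__main__":
    # Placeholder content; a real file would be opened with open(path, newline="").
    example = io.StringIO("GENE1\tdisease_x\tdominant\nGENE2\tdisease_y\trecessive\n")
    print(read_inheritance_gene_list(example))
```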

I cannot find the diseases file. Could you upload a "diseases file" (the "gene\tdisease\tmode_of_inheritance" .tsv), or tell me where the file is? Thanks.

I need to get CharGer running because my whole team is waiting on it. We have two thousand whole-genome sequencing VCF files and hundreds of cancer panel VCF files waiting to be run through CharGer, especially to access ClinVar data.

Please reply to me. Thank you so, so much.

yubau, from Taiwan

ccwang002 commented 4 years ago

hey @yubau1112, you have created quite a lot of GitHub issues here, and to be honest I still don't know what you are trying to achieve with CharGer. Here is my understanding of what you want to do: you want to run CharGer on your cancer WGS germline VCF and annotate your variants with ClinVar and ExAC.

If that's the case, I would recommend following the steps below:

  1. Determine the human genome reference version you are using (hg19/GRCh37 or hg38/GRCh38)
  2. Annotate your VCF using VEP
  3. Run CharGer

Step 1: Determine the human genome reference version

You need to be sure which genome reference you used to generate your germline VCF. If you are using hg19, you should run VEP with the GRCh37 cache and use the hg19 version of ClinVar. You cannot mix genome references.

Since you are using Demo/demo.vcf here, the reference is GRCh37, and below I will assume your actual data uses the same genome reference.
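If you are unsure which reference a VCF was called against, its header often records it. One common heuristic, sketched below, reads the ##contig line for chromosome 1 and compares its length with the known assembly lengths (249,250,621 bp in GRCh37/hg19 and 248,956,422 bp in GRCh38/hg38). It assumes the VCF carries ##contig lines with a length= key, which not every variant caller writes; when they are missing the function returns "unknown" and you need to check how the file was produced. The function names are hypothetical.

```python
import gzip
from itertools import takewhile

# Length of chromosome 1 in each assembly.
CHR1_LENGTHS = {
    249250621: "GRCh37/hg19",
    248956422: "GRCh38/hg38",
}


def read_header(path):
    """Return the '##' meta-information lines of a plain or gzipped VCF."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return list(takewhile(lambda line: line.startswith("##"), handle))


def guess_reference(header_lines):
    """Guess the assembly from the chr1 ##contig length; 'unknown' if absent."""
    for line in header_lines:
        if not line.startswith("##contig=<"):
            continue
        body = line.strip()[len("##contig=<"):].rstrip(">")
        fields = dict(item.split("=", 1) for item in body.split(",") if "=" in item)
        # Naming ("1" vs "chr1") is not decisive on its own, so use the length.
        if fields.get("ID") in ("1", "chr1") and "length" in fields:
            return CHR1_LENGTHS.get(int(fields["length"]), "unknown")
    return "unknown"
```

For example, guess_reference(read_header("your_wgs_germline.vcf.gz")) checks the header of the file you plan to annotate in Step 2.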

Step 2: Annotate your VCF using VEP

You should have your VEP installed and preferably set up VEP's cache. Here I use VEP v95 as an example, but any version later than that should also work. You should annotate your VCF with the following command:

vep --format vcf --vcf \
    --assembly GRCh37 \
    --everything --af_exac \
    --offline --cache --dir_cache /path/to/vep_cache/ --fasta /path/to/vep_cache/homo_sapiens/95_GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa \
    --input_file your_wgs_germline.vcf.gz \
    --output_file your_wgs_germline.vep95.vcf 
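VEP writes its annotations into the CSQ INFO field, whose pipe-separated layout is declared in the header, so it is easy to check what Step 2 actually produced. The sketch below (Python 3, standard library only) reads that layout and picks an allele frequency in the same gnomAD, then ExAC, then 1000 Genomes order that the CharGer log in the original post reports. The field names gnomAD_AF, ExAC_AF and AF are assumptions about what VEP v95 emits with --everything --af_exac; check the CSQ header line of your own file before relying on them.

```python
import re

# Assumed priority, mirroring the fallback order in the CharGer log:
# gnomAD first, then ExAC, then 1000 Genomes (VEP's plain "AF" field).
AF_PRIORITY = ("gnomAD_AF", "ExAC_AF", "AF")


def csq_fields(header_lines):
    """Return the CSQ sub-field names declared by VEP, or None if absent."""
    for line in header_lines:
        if line.startswith("##INFO=<ID=CSQ,"):
            match = re.search(r'Format: ([^"]+)"', line)
            if match:
                return match.group(1).strip().split("|")
    return None


def parse_csq(info_column, fields):
    """Turn the CSQ entry of one INFO column into one dict per annotation."""
    for item in info_column.split(";"):
        if item.startswith("CSQ="):
            return [dict(zip(fields, block.split("|")))
                    for block in item[len("CSQ="):].split(",")]
    return []


def pick_af(annotation):
    """Return (field, value) for the first populated AF field, else (None, None)."""
    for key in AF_PRIORITY:
        # VEP joins values for co-located variants with '&'; this sketch
        # simply takes the first non-empty one.
        values = [v for v in annotation.get(key, "").split("&") if v]
        if values:
            return key, float(values[0])
    return None, None
```

If csq_fields returns None for your file, the VCF was not annotated by VEP in VCF mode, and CharGer will fall back as in the log above.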

The VEP command above will give you a VCF your_wgs_germline.vep95.vcf with the following information:

Step 3: Run CharGer

Please start with the following and try to add extra options only after it works.

charger \
    -f your_wgs_germline.vep95.vcf \
    -o your_wgs_germline.charger.tsv \
    -l --mac-clinvar-tsv ~/CharGer3/CharGer/Demo/clinvar_alleles.multi.b37.tsv.gz \
    -D 

You will see that some of the ACMG/CharGer modules are disabled because they require additional annotations, but you should be able to run CharGer successfully.
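With thousands of VCFs, it helps to generate the Step 3 command per file instead of typing it. The sketch below builds one command per VEP-annotated VCF using only the options from the command above (-f, -o, -l, --mac-clinvar-tsv, -D). The helper charger_command, the output naming scheme, and the example paths are hypothetical; confirm that a single run works before looping over everything.

```python
import shlex
from pathlib import Path


def charger_command(vcf, clinvar_tsv, out_dir):
    """Build the Step 3 CharGer command for one VEP-annotated VCF.

    Hypothetical helper: it reuses only the options shown above and names
    the output after the input file.
    """
    vcf = Path(vcf)
    stem = vcf.name[:-len(".vcf")] if vcf.name.endswith(".vcf") else vcf.name
    out = Path(out_dir) / (stem + ".charger.tsv")
    return ["charger", "-f", str(vcf), "-o", str(out),
            "-l", "--mac-clinvar-tsv", str(clinvar_tsv), "-D"]


if __name__ == "__main__":
    # Example paths are placeholders; replace them with your own files.
    for vcf in ["sample1.vep95.vcf", "sample2.vep95.vcf"]:
        cmd = charger_command(vcf, "clinvar_alleles.multi.b37.tsv.gz", "charger_out")
        print(" ".join(shlex.quote(part) for part in cmd))
        # To actually run it: subprocess.run(cmd, check=True)
```

Returning an argument list rather than a single string lets subprocess.run execute it without a shell, so unusual characters in file names are passed through safely.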

If any of the steps above doesn't work, I will need the following information:

Please stop posting your installation logs, stop trying other CharGer options, and stop creating new issues here unless you get the steps above working. I will comment on your other questions in detail later. Thank you.

ccwang002 commented 4 years ago

Adding extra CharGer options

Please only consider these options after you successfully run through the three steps above.

The CharGer command above doesn't run all the modules because it lacks additional annotations. These annotations depend on your disease, so there is no single annotation set that covers everything, and you will very likely need to create your own annotations if you are studying a different disease. If you are studying cancer, we have some example pan-cancer annotations under PanCanAtlasData. The files listed below can be found under that folder.

Please also check out the detailed description of these options (and more) by my colleague at https://github.com/ding-lab/CharGer/issues/18#issuecomment-475979810.

Using hg38 genome reference

If your VCF uses the hg38 genome reference, you need to change all the annotations. This affects at least these parameters:

ccwang002 commented 4 years ago

Finally, to answer the rest of your questions.

Why do the extra flags -l -t -E -x fail to work?
charger -f demo.vcf -o demo.ltEx.tsv -l -t -E -x --exac-vcf ~/CharGer3/CharGer/Demo/ExAC.r1.sites.vep.vcf --mac-clinvar-tsv ~/CharGer3/CharGer/Demo/clinvar_alleles.multi.b37.tsv.gz

These flags enable CharGer to look for the corresponding annotations. CharGer used to try online APIs to fetch those annotations when no local copy was given (for example, searching the ExAC online database when --exac-vcf is not given), but many of those online APIs have changed and are quite buggy to use. So we now recommend using only local copies of the annotations.

If you already annotated the input VCF with VEP, many of the flags here are not necessary. Please just follow the 3 steps described above.

somebody found 3 bugs (https://www.jianshu.com/p/544caf92b24c) ...

These bugs and the proposed solutions concern the online API calls. However, fixing them probably won't fix everything, as I mentioned above. You will likely run into a new set of issues talking to all the online APIs (e.g., rate limits, API changes, etc.), so we no longer recommend this approach. If you are running CharGer on thousands of VCFs, using the online APIs will be much slower than having a local copy of the annotations.
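To make the speed argument concrete: the log in the original post shows CharGer capping ClinVar ReST searches at 50 variants per request. Assuming that cap and one request per batch (the helper names below are hypothetical, not CharGer internals), the number of requests grows linearly with the number of variants, whereas a local ClinVar file is read once per run.

```python
import math

MAX_CLINVAR_BATCH = 50  # cap reported in the CharGer log in the original post


def effective_batch_size(requested, max_allowed=MAX_CLINVAR_BATCH):
    """Mimic the logged behaviour: larger requested sizes are overridden."""
    return min(requested, max_allowed)


def request_count(n_variants, requested_batch):
    """Number of ReST requests needed, assuming one request per batch."""
    return math.ceil(n_variants / effective_batch_size(requested_batch))


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


if __name__ == "__main__":
    print(request_count(550, 100))        # the 550 demo variants; prints 11
    print(request_count(4_000_000, 100))  # hypothetical WGS-sized VCF; prints 80000
```

So the 550 demo variants need 11 requests, but a WGS VCF with a few million variants would need tens of thousands of requests per sample, each exposed to rate limits and API changes.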

The options of calling online APIs will be removed in the next CharGer release. So the bugs will naturally go away in CharGer v0.6 and later.

What is the format of -d diseases?

It's the same format as --inheritanceGeneList. I think what you need here is actually --inheritanceGeneList and not -d. You can find the example file in my comment above.

yubau1112 commented 4 years ago

OK, thank you so much. I will try your recommended 3 steps.

fernanda-rodrigues commented 4 years ago

@yubau1112 we haven't heard back from you so I assume you fixed the issue. Please feel free to reopen the issue if you need further help. I am closing it for now.

Thanks!