Closed yubau1112 closed 4 years ago
hey @yubau1112, you have created quite a lot of GitHub issues here, and to be honest I still don't know what you are trying to achieve with CharGer. Here is my understanding of what you want to do: you want to run CharGer on your cancer WGS germline VCF and annotate your variants with ClinVar and ExAC.
If that's the case, I recommend following the steps below:
You need to be sure which genome reference you used to generate your germline VCF. If you are using hg19, you should run VEP with the GRCh37 cache and use the hg19 version of ClinVar. You cannot mix genome references.
Because you are using Demo/demo.vcf here, it's GRCh37. I will assume below that your actual data uses the same genome reference.
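If you are unsure which reference a VCF was called against, the header often gives a hint. The following is a minimal Python 3 sketch, not part of CharGer: it assumes the VCF header carries `##contig` lines with a `length` attribute, and the helper names `guess_assembly` and `read_header` are made up for illustration. The only hard-coded facts are the chromosome 1 lengths of the two assemblies.

```python
import gzip
import re

# Length of chromosome 1 in each assembly (hg19 and GRCh37 share the same length).
CHR1_LENGTH = {249250621: "GRCh37", 248956422: "GRCh38"}


def guess_assembly(header_lines):
    """Return 'GRCh37', 'GRCh38', or None based on the chr1 ##contig line."""
    for line in header_lines:
        if not line.startswith("##contig="):
            continue
        contig_id = re.search(r"ID=([^,>]+)", line)
        length = re.search(r"length=(\d+)", line)
        if contig_id and length and contig_id.group(1) in ("1", "chr1"):
            return CHR1_LENGTH.get(int(length.group(1)))
    return None  # no usable ##contig line: check how the VCF was produced


def read_header(path):
    """Collect the '##' meta lines from a plain or gzipped VCF (hypothetical helper)."""
    opener = gzip.open if path.endswith(".gz") else open
    header = []
    with opener(path, "rt") as handle:
        for line in handle:
            if not line.startswith("##"):
                break
            header.append(line.rstrip("\n"))
    return header
```

For example, `guess_assembly(read_header("your_wgs_germline.vcf.gz"))` returns `None` when the header has no `##contig` lines, in which case the variant-calling logs are the place to look.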
You should have your VEP installed and preferably set up VEP's cache. Here I use VEP v95 as an example, but any version later than that should also work. You should annotate your VCF with the following command:
vep --format vcf --vcf \
--assembly GRCh37 \
--everything --af_exac \
--offline --cache --dir_cache /path/to/vep_cache/ --fasta /path/to/vep_cache/homo_sapiens/95_GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa \
--input_file your_wgs_germline.vcf.gz \
--output_file your_wgs_germline.vep95.vcf
The VEP command above will give you a VCF, your_wgs_germline.vep95.vcf, with the following information:
VEP annotations from the GRCh37 cache under /path/to/vep_cache/homo_sapiens/95_GRCh37
ExAC allele frequencies, added by the --af_exac flag
Please start with the following and try to add extra options only after it works.
charger \
-f your_wgs_germline.vep95.vcf \
-o your_wgs_germline.charger.tsv \
-l --mac-clinvar-tsv ~/CharGer3/CharGer/Demo/clinvar_alleles.multi.b37.tsv.gz \
-D
You will see that some of the ACMG/CharGer modules are disabled because they require additional annotations, but you should be able to run CharGer successfully.
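If CharGer reports "no ExAC AF annotation in input file", it can help to check what VEP actually wrote into the VCF header. The sketch below is an illustration under assumptions, not CharGer's own check: it assumes the usual VEP `##INFO=<ID=CSQ,...Format: ...>` header line, and it assumes that the ExAC fields added by --af_exac have names starting with `ExAC` (for example `ExAC_AF`). The function names are hypothetical.

```python
import re


def csq_fields(header_lines):
    """Return the CSQ sub-field names declared in a VEP-annotated VCF header."""
    for line in header_lines:
        if line.startswith("##INFO=<ID=CSQ,"):
            match = re.search(r'Format: ([^">]+)', line)
            if match:
                return match.group(1).strip().split("|")
    return []  # no CSQ line: the VCF was probably not annotated by VEP


def has_exac_af(header_lines):
    """True if any CSQ sub-field looks like an ExAC frequency (assumed 'ExAC' prefix)."""
    return any(name.startswith("ExAC") for name in csq_fields(header_lines))
```

Pass it the `##` lines of your_wgs_germline.vep95.vcf. An empty list from `csq_fields` suggests the VEP step did not run as expected.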
If any of the steps above doesn't work, I will need the following information:
Please stop posting your installation logs, stop trying other CharGer options, and stop creating new issues here unless you get the steps above working. I will comment on your other questions in detail later. Thank you.
Please only consider these options after you successfully run through the three steps above.
The CharGer command above doesn't run all the modules because it lacks the additional annotations. Those annotations depend on your disease, so there is no single annotation set for everything, and you will very likely need to create your own if you are studying a different disease. If you are studying cancer, we have some example pan-cancer annotations under PanCanAtlasData. The files listed below can be found in that folder.
-z for all known pathogenic variants in your disease (pan-cancer example: emptyRemoved_20160428_pathogenic_variants_HGVSg_VEP.vcf.gz)
--inheritanceGeneList for the known inheritance mode of genes in your disease (pan-cancer example: 20160301_Rahman_KJ_KH_gene_table_CharGer.txt.gz)
-H HotSpot3D cluster file (pan-cancer example: MC3.noHypers.mericUnspecified.d10.r20.v114.clusters.gz)
Please also check out the detailed description of these options (and more) by my colleague at https://github.com/ding-lab/CharGer/issues/18#issuecomment-475979810.
If your VCF uses the hg38 genome reference, you need to change all the annotations. At a minimum, this affects these parameters:
--mac-clinvar-tsv should point to clinvar_alleles.single.b38.tsv.gz
-z pathogenic variant VCF should be emptyRemoved_20160428_pathogenic_variants_HGVSg_VEP_grch38lifOver.vcf
-H HotSpot3D clusters file should be MC3.noHypers.mericUnspecified.d10.r20.v114.grch38liftOver.clusters
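One way to avoid mixing references is to keep the per-assembly file names in one place and look them up by assembly. This is only a sketch: the file names are copied verbatim from this comment, and the dictionary and function names are made up for illustration.

```python
# File names copied verbatim from this thread; adjust to what you actually have on disk.
ANNOTATIONS = {
    "GRCh37": {
        "--mac-clinvar-tsv": "clinvar_alleles.multi.b37.tsv.gz",
        "-z": "emptyRemoved_20160428_pathogenic_variants_HGVSg_VEP.vcf.gz",
        "-H": "MC3.noHypers.mericUnspecified.d10.r20.v114.clusters.gz",
    },
    "GRCh38": {
        "--mac-clinvar-tsv": "clinvar_alleles.single.b38.tsv.gz",
        "-z": "emptyRemoved_20160428_pathogenic_variants_HGVSg_VEP_grch38lifOver.vcf",
        "-H": "MC3.noHypers.mericUnspecified.d10.r20.v114.grch38liftOver.clusters",
    },
}


def annotation_files(assembly):
    """Return a copy of the flag -> file name mapping for one assembly."""
    try:
        return dict(ANNOTATIONS[assembly])
    except KeyError:
        raise ValueError("unknown assembly: %r" % (assembly,))
```

Looking everything up through `annotation_files("GRCh38")` means a GRCh37 ClinVar file cannot be paired with hg38 HotSpot3D clusters by accident.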
Finally, to answer the rest of your questions.
Why do the extra flags -l -t -E -x fail to work?
charger -f demo.vcf -o demo.ltEx.tsv -l -t -E -x --exac-vcf ~/CharGer3/CharGer/Demo/ExAC.r1.sites.vep.vcf --mac-clinvar-tsv ~/CharGer3/CharGer/Demo/clinvar_alleles.multi.b37.tsv.gz
These flags tell CharGer to look for the corresponding annotations. When no local copy was given, CharGer used to fetch the annotation through an online API (for example, searching the ExAC online database when --exac-vcf is not given), but many of those online APIs have changed and are quite buggy to use. So we now recommend using only local copies of the annotations.
If you already annotated the input VCF with VEP, many of the flags here are not necessary. Please just follow the 3 steps described above.
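When there are many VEP-annotated VCFs to process, the same local-annotation command can be assembled per file. The following is a minimal sketch: the flags are exactly those of the CharGer command in the steps above, while the function name and the output naming scheme are assumptions made for illustration.

```python
import os


def charger_command(vep_vcf, clinvar_tsv, out_dir="."):
    """Build the argument list for one CharGer run using local ClinVar only."""
    base = os.path.basename(vep_vcf)
    # Strip the VCF extension to derive the output name (naming scheme is an assumption).
    for suffix in (".vcf.gz", ".vcf"):
        if base.endswith(suffix):
            base = base[: -len(suffix)]
            break
    return [
        "charger",
        "-f", vep_vcf,
        "-o", os.path.join(out_dir, base + ".charger.tsv"),
        "-l", "--mac-clinvar-tsv", clinvar_tsv,
        "-D",
    ]
```

Each list can be passed to `subprocess.check_call(...)`, one VCF at a time, once a single run is known to work.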
somebody found 3 bugs (https://www.jianshu.com/p/544caf92b24c) ...
These bugs and the proposed solutions concern the online API calls. However, as I mentioned above, the fixes probably don't solve everything. You will likely run into a new set of issues when talking to the online APIs (e.g., rate limits and API changes), so we no longer recommend this approach. If you are running CharGer on thousands of VCFs, using the online APIs will also be much slower than having a local copy of the annotations.
The options of calling online APIs will be removed in the next CharGer release. So the bugs will naturally go away in CharGer v0.6 and later.
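For context on why the crash only shows up with the online lookup: the traceback in the original post ends in a NameError for `entrezaip`, which is what Python raises when a misspelled name is reached at run time. The sketch below reproduces that failure mode with a made-up class. It is not BioMine's code, and the value 500 is arbitrary.

```python
class EntrezApiSketch(object):
    """Hypothetical stand-in used only to illustrate the failure mode."""

    summaryBatchSize = 500  # arbitrary value for the illustration

    def batch_size_buggy(self):
        # Misspelled class name: Python only notices when this line runs.
        return EntrezApiSketh.summaryBatchSize

    def batch_size_fixed(self):
        # Correct spelling resolves to the class attribute.
        return EntrezApiSketch.summaryBatchSize
```

Because the misspelled line is only executed on the online code path, runs that rely on local annotation files never reach it.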
What is the format of -d diseases?
It's the same format as --inheritanceGeneList. I think what you need here is actually --inheritanceGeneList and not -d. You can find the example file in my comment above.
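If you end up writing your own gene list, a quick parser can confirm that the file has the expected shape. This sketch assumes the three tab-separated columns given in the README excerpt quoted in the original post (gene, disease, mode_of_inheritance). The function name is hypothetical and this is not how CharGer itself reads the file.

```python
import csv
import io


def read_inheritance_gene_list(handle):
    """Parse gene<TAB>disease<TAB>mode_of_inheritance rows into a nested dict."""
    genes = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines, comment lines, and rows with fewer than three columns.
        if len(row) < 3 or row[0].startswith("#"):
            continue
        gene, disease, mode = (field.strip() for field in row[:3])
        genes.setdefault(gene, {})[disease] = mode
    return genes


# Made-up rows, only to show the shape of the result.
example = io.StringIO(u"GENE_A\tdisease_x\tdominant\nGENE_B\tdisease_y\trecessive\n")
parsed = read_inheritance_gene_list(example)
```

With a real file, open it in text mode (via `gzip.open(path, "rt")` for the .txt.gz example) and pass the handle in the same way.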
OK, thank you so much. I will try the 3 steps you recommended.
@yubau1112 we haven't heard back from you so I assume you fixed the issue. Please feel free to reopen the issue if you need further help. I am closing it for now.
Thanks!
refer to commit 7d7d291
I removed the entire anaconda2 directory
rm -rf ~/anaconda2
mkdir ~/CharGer
cd ~/CharGer
wget https://repo.anaconda.com/archive/Anaconda2-5.2.0-Linux-x86_64.sh
bash Anaconda2-5.2.0-Linux-x86_64.sh
Do you accept the license terms? [yes|no] [no] >>> yes
Anaconda2 will now be installed into this location: /home/yubau/anaconda2
Press ENTER to confirm the location Press CTRL-C to abort the installation Or specify a different location below [/home/yubau/anaconda2] >>>
Do you wish the installer to prepend the Anaconda2 install location to PATH in your /home/yubau/.bashrc ? [yes|no] [no] >>>yes
Do you wish to proceed with the installation of Microsoft VSCode? [yes|no] >>> no
(finish install anaconda)
conda create --name CharGer python=2.7
conda activate CharGer
(CharGer) [yubau@cmuh-i2 ~]$ which pip
~/anaconda2/envs/CharGer/bin/pip
(CharGer) [yubau@cmuh-i2 ~]$ pip --version
pip 19.3.1 from /home/yubau/anaconda2/envs/CharGer/lib/python2.7/site-packages/pip (python 2.7)
(CharGer) [yubau@cmuh-i2 ~]$ conda --version
conda 4.5.4
(CharGer) [yubau@cmuh-i2 ~]$ which conda
~/anaconda2/bin/conda
conda install pysam
pip install pysam
(CharGer) [yubau@cmuh-i2 ~]$ conda list
# packages in environment at /home/yubau/anaconda2/envs/CharGer:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main
AdvancedHTMLParser 9.0.1
BioMine 0.9.5
ca-certificates 2020.1.1 0
certifi 2019.11.28 py27_0
chardet 3.0.4
CharGer 0.5.4
idna 2.9
libedit 3.1.20181209 hc058e9b_0
libffi 3.2.1 hd88cf55_4
libgcc-ng 9.1.0 hdf63c60_0
libstdcxx-ng 9.1.0 hdf63c60_0
ncurses 6.2 he6710b0_0
numpy 1.16.6
openssl 1.1.1d h7b6447c_4
pip 19.3.1 py27_0
pysam 0.6 py27_0
pysam 0.15.4
python 2.7.17 h9bab390_0
PyVCF 0.6.8
QueryableList 3.1.0
readline 7.0 h7b6447c_5
requests 2.23.0
scipy 1.2.3
setuptools 44.0.0 py27_0
sqlite 3.31.1 h7b6447c_0
tk 8.6.8 hbc83047_0
urllib3 1.25.8
wheel 0.33.6 py27_0
zlib 1.2.11 h7b6447c_3
(CharGer) [yubau@cmuh-i2 ~]$ pip list
DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7. More details about Python 2 support in pip, can be found at https://pip.pypa.io/en/latest/development/release-process/#python-2-support
Package Version
------------------ -------------------
AdvancedHTMLParser 9.0.1
BioMine 0.9.5
bz2file 0.98
certifi 2019.11.28
chardet 3.0.4
CharGer 0.5.4
idna 2.9
numpy 1.16.6
pip 19.3.1
pysam 0.15.4
PyVCF 0.6.8
QueryableList 3.1.0
requests 2.23.0
scipy 1.2.3
setuptools 44.0.0.post20200106
urllib3 1.25.8
virtualenv 16.4.3
wheel 0.33.6
xopen 0.5.0
cd ~/CharGer
wget -O CharGer.zip https://github.com/ding-lab/CharGer/archive/master.zip
unzip CharGer.zip
mv CharGer-master/ CharGer
cd CharGer
pip install .
(CharGer) [yubau@cmuh-i2 ~]$ which charger
~/anaconda2/envs/CharGer/bin/charger
(CharGer) [yubau@cmuh-i2 ~]$ charger
CharGer ERROR: Command not recognized
CharGer - v0.5.4 ... .. .
Log in to the Linux (CentOS 7) terminal
conda activate CharGer
cd ~/CharGer/CharGer/Demo
charger -f demo.vcf -o demo.tsv
and I got an output file.
Then I ran the other data-access flags:
charger -f demo.vcf -o demo.t.tsv -t
charger -f demo.vcf -o demo.E.tsv -E
charger -f demo.vcf -o demo.x.tsv -x
charger -f demo.vcf -o demo.tEx.tsv -t -E -x
I did not get a fatal error.
But when I ran:
charger -f demo.vcf -o demo.tsv -l
I got this message:
Unsupported VEP version or no gnomAD AF annotation in input file; will search for ExAC frequencies...
Unsupported VEP version or no ExAC AF annotation in input file; will search for 1000 Genomes frequencies...
Unsupported VEP version or no gnomAD AF annotation in input file; will search for ExAC frequencies...
Unsupported VEP version or no ExAC AF annotation in input file; will search for 1000 Genomes frequencies...
Skipping: 0 for filters and 0 for AF and 0 for mutation types out of 550
No gene list file uploaded. CharGer will not make PVS1 calls.
No PP2 gene list file uploaded. CharGer will not make PP2 calls.
No BP1 gene list file uploaded. CharGer will not make BP1 calls.
No expression file uploaded. CharGer will allow all passed truncations without expression data in PVS1.
charger::getVEP Warning: skipping VEP
Running VEP took 2.09808349609e-05seconds
charger::getClinVar warning: ClinVar ReST search batch size given is greater than max allowed (50). Overriding to max search batch size.
Traceback (most recent call last):
  File "/home/yubau/anaconda2/envs/CharGer/bin/charger", line 744, in <module>
    main( sys.argv[1:] )
  File "/home/yubau/anaconda2/envs/CharGer/bin/charger", line 663, in main
    mutationTypes = mutationTypes , \
  File "/home/yubau/anaconda2/envs/CharGer/lib/python2.7/site-packages/charger/charger.py", line 879, in getExternalData
    self.getClinVar( **kwargs )
  File "/home/yubau/anaconda2/envs/CharGer/lib/python2.7/site-packages/charger/charger.py", line 905, in getClinVar
    self.getClinVarviaREST( **kwargs )
  File "/home/yubau/anaconda2/envs/CharGer/lib/python2.7/site-packages/charger/charger.py", line 918, in getClinVarviaREST
    ent = entrezapi()
  File "/home/yubau/anaconda2/envs/CharGer/lib/python2.7/site-packages/biomine/webapi/entrez/entrezapi.py", line 74, in __init__
    self.setRequestLimits()
  File "/home/yubau/anaconda2/envs/CharGer/lib/python2.7/site-packages/biomine/webapi/entrez/entrezapi.py", line 332, in setRequestLimits
    self.setSummaryBatchSize( entrezaip.summaryBatchSize )
NameError: global name 'entrezaip' is not defined
and I got an empty file.
Then I ran:
charger -f demo.vcf -o demo.ltEx.tsv -l -t -E -x
But when I ran:
charger -f demo.vcf -o demo.l.tsv -l --exac-vcf ~/CharGer3/CharGer/Demo/ExAC.r1.sites.vep.vcf --mac-clinvar-tsv ~/CharGer3/CharGer/Demo/clinvar_alleles.multi.b37.tsv.gz
it works. Why do I need to add "--exac-vcf ~/CharGer3/CharGer/Demo/ExAC.r1.sites.vep.vcf --mac-clinvar-tsv ~/CharGer3/CharGer/Demo/clinvar_alleles.multi.b37.tsv.gz" to access ClinVar data?
Then I ran:
charger -f demo.vcf -o demo.ltEx.tsv -l -t -E -x --exac-vcf ~/CharGer3/CharGer/Demo/ExAC.r1.sites.vep.vcf --mac-clinvar-tsv ~/CharGer3/CharGer/Demo/clinvar_alleles.multi.b37.tsv.gz
Why?
Somebody found 3 bugs (https://www.jianshu.com/p/544caf92b24c).
For one of the 3 bugs, they say the following file needs to be modified: /home/yubau/anaconda2/envs/CharGer/lib/python2.7/site-packages/biomine/webapi/entrez/entrezapi.py
On lines 332 and 333, entrezaip needs to be changed to entrezapi.
Is that right?
(CharGer) [yubau@cmuh-i2 Demo]$ which python
~/anaconda2/envs/CharGer/bin/python
(CharGer) [yubau@cmuh-i2 Demo]$ python -V
Python 2.7.17 :: Anaconda, Inc.
less ~/.bashrc
# added by Anaconda2 installer
export PATH="/home/yubau/anaconda2/bin:$PATH"
export PATH="/home/yubau/anaconda2/envs/CharGer/bin:$PATH"
source "/home/yubau/anaconda2/etc/profile.d/conda.sh"
And where is the "diseases file" or "gene\tdisease\tmode_of_inheritance.tsv"?
Access data
-l ClinVar (flag)
-x ExAC (flag)
-E VEP (flag)
-t TCGA cancer types (flag)
Using these flags turns on accession features built in. For the ClinVar, ExAC, and VEP flags, if no local VEP or database is provided, then BioMine will be used to access the ReST interface. CharGer is currently capable of handling all VEP releases up until release 97. [[[[[The TCGA flag allows disease determination from sample barcodes in a .maf when using a diseases file (see below).]]]]]
Cross-reference data files
-z pathogenic variants, .vcf
-e expression matrix file, .tsv
--inheritanceGeneList inheritance gene list file, (format: gene\tdisease\tmode_of_inheritance) .txt
--PP2GeneList PP2 gene list file, (format: column of genes) .txt
--BP1GeneList BP1 gene list file, (format: column of genes) .txt
[[[[[ -d diseases file, (format: gene\tdisease\tmode_of_inheritance) .tsv]]]]]
-n de novo file, standard .maf
-a assumed de novo file, standard .maf
-c co-segregation file, standard .maf
-H HotSpot3D clusters file, .clusters
I could not find the diseases file. Can you upload a "diseases file" or "gene\tdisease\tmode_of_inheritance.tsv", or tell me where the file is? Thanks.
I need to run CharGer because my whole team is waiting for it. We have two thousand whole genome sequencing VCF files and hundreds of cancer panel VCF files waiting to be run through CharGer, especially with access to ClinVar data.
Please reply to me. Thank you so much.
yubau, from Taiwan