WGLab / InterVar

A bioinformatics software tool for clinical interpretation of genetic variants by the 2015 ACMG-AMP guideline

Taking too long to run on exome #26

Open gpcr opened 6 years ago

gpcr commented 6 years ago

It is taking around 4-5 hours to run on one exome VCF of around 90K variants (running on a 32-core Xeon workstation with 120 GB of RAM). Is this the run time you would normally expect?

kaichop commented 6 years ago

It should take just a couple of minutes.


quanliustc commented 6 years ago

I tested on an Intel(R) Xeon(R) CPU E5-2609 v2 @ 2.50GHz with 64 GB of memory, using about 2.5M variants and a single core. If the ANNOVAR annotation results are already available, InterVar needs only ~10 minutes to produce its results. If ANNOVAR has to be run first to generate the annotation files, it takes ~40 minutes to get the final results. So 4-5 hours seems a little slow.
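One way to see which stage dominates is to time the ANNOVAR annotation and the InterVar interpretation as separate commands. The sketch below is a minimal, hypothetical helper (the name `time_stage` is not part of InterVar or ANNOVAR); it only wraps an arbitrary command line and reports wall-clock time.

```python
import subprocess
import time


def time_stage(label, argv):
    """Run one pipeline stage and return (return code, elapsed seconds).

    `label` is only used in the printed summary; `argv` is the command as a
    list of strings, exactly as it would be typed in a shell.
    """
    start = time.perf_counter()
    completed = subprocess.run(argv)
    elapsed = time.perf_counter() - start
    print(f"{label}: exit code {completed.returncode}, {elapsed:.1f} s")
    return completed.returncode, elapsed
```

Calling it once with the ANNOVAR command and once with the InterVar command would show how the total run time splits between the two.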

gpcr commented 6 years ago

The dbnsfp filtering step is taking most of the time: "NOTICE: Scanning filter database /annovar/humandb/hg38_dbnsfp33a.txt..."

kaichop commented 6 years ago

You need to show the complete message. It is possible that you did not actually download the index file, or that your dbnsfp33a file is corrupt, but without the complete message we cannot tell. You also never mentioned that you were counting the time to run ANNOVAR, which is an optional step before running InterVar.
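A quick way to rule out a missing index is to check that each filter database has an `.idx` file next to it. The sketch below assumes the naming in the NOTICE message above (`<buildver>_<protocol>.txt`, with the index stored as `<buildver>_<protocol>.txt.idx`). The function name is hypothetical. The check only makes sense for filter-based databases, and some databases use a different file name that this simple pattern will not match.

```python
import os


def missing_annovar_indexes(humandb, buildver, protocols):
    """Return the database files in `humandb` that lack a non-empty .idx file.

    Assumes each database is stored as <buildver>_<protocol>.txt and its
    index as <buildver>_<protocol>.txt.idx in the same directory.
    """
    missing = []
    for protocol in protocols:
        db_path = os.path.join(humandb, f"{buildver}_{protocol}.txt")
        idx_path = db_path + ".idx"
        # A missing or zero-byte index is treated as unusable
        if not os.path.isfile(idx_path) or os.path.getsize(idx_path) == 0:
            missing.append(db_path)
    return missing
```

For example, `missing_annovar_indexes("/annovar/humandb", "hg38", ["dbnsfp33a"])` returns an empty list when the index is present and non-empty. It does not detect a corrupt database file.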


gpcr commented 6 years ago

here it is...


$ time ./Intervar.py -i SN_pt123456.vcf -o SN_pt123456 --input_type=VCF
=============================================================================
InterVar
Interpretation of Pathogenic/Benign for variants using python scripts of InterVar.
=============================================================================

%prog 0.1.7 20180118
Written by Quan LI,leequan@gmail.com.
InterVar is free for non-commercial use without warranty.
Please contact the authors for commercial use.
Copyright (C) 2016 Wang Genomic Lab
============================================================================

Notice: Your command of InterVar is ['./Intervar.py', '-i', 'SN_pt123456.vcf', '-o', 'SN_pt123456', '--input_type=VCF']
INFO: The options are {'pp2_genes': '/home/user1/git/InterVar/intervardb/PP2.genes.hg38', 'inputfile': 'SN_pt123456.vcf', 'exclude_snps': '/home/user1/git/InterVar/intervardb/ext.variants.hg38', 'annotate_variation': '/home/user1/git/annovar/annotate_variation.pl', 'skip_annovar': False, 'ps4_snps': '/home/user1/git/InterVar/intervardb/PS4.variants.hg38', 'mim_domin': '/home/user1/git/InterVar/intervardb/mim_domin.txt', 'current_version': 'Intervar_20180118', 'bs2_snps': '/home/user1/git/InterVar/intervardb/BS2_hom_het.hg38', 'evidence_file': 'None', 'public_dev': 'https://github.com/WGLab/InterVar/releases', 'otherinfo': 'TRUE', 'database_names': 'refGene esp6500siv2_all 1000g2015aug avsnp147 dbnsfp33a clinvar_20170905 exac03 dbscsnv11 dbnsfp31a_interpro rmsk ensGene knownGene', 'mim_pheno': '/home/user1/git/InterVar/intervardb/mim_pheno.txt', 'table_annovar': '/home/user1/git/annovar/table_annovar.pl', 'buildver': 'hg38', 'inputfile_type': 'VCF', 'onetranscript': 'FALSE', 'mim2gene': '/home/user1/git/InterVar/intervardb/mim2gene.txt', 'orpha': '/home/user1/git/InterVar/intervardb/orpha.txt', 'ps1_aa': '/home/user1/git/InterVar/intervardb/PS1.AA.change.patho.hg38', 'mim_adultonset': '/home/user1/git/InterVar/intervardb/mim_adultonset.txt', 'knowngenecanonical': '/home/user1/git/InterVar/intervardb/knownGeneCanonical.txt.hg38', 'outfile': 'SN_pt123456', 'convert2annovar': '/home/user1/git/annovar/convert2annovar.pl', 'database_locat': '/home/user1/git/annovar/humandb', 'database_intervar': '/home/user1/git/InterVar/intervardb', 'lof_genes': '/home/user1/git/InterVar/intervardb/PVS1.LOF.genes.hg38', 'disorder_cutoff': '0.01', 'mim_recessive': '/home/user1/git/InterVar/intervardb/mim_recessive.txt', 'pm1_domain': '/home/user1/git/InterVar/intervardb/PM1_domains_with_benigns.hg38', 'mim_orpha': '/home/user1/git/InterVar/intervardb/mim_orpha.txt', 'bp1_genes': '/home/user1/git/InterVar/intervardb/BP1.genes.hg38'}
Warning: the folder of /home/user1/git/annovar/humandb is already created!
Warning: Begin to convert your vcf file of SN_pt123456.vcf to AVinput of Annovar ...
perl /home/user1/git/annovar/convert2annovar.pl -format vcf4 SN_pt123456.vcf> SN_pt123456.vcf.avinput
NOTICE: Finished reading 82958 lines from VCF file
NOTICE: A total of 80269 locus in VCF file passed QC threshold, representing 65312 SNPs (45587 transitions and 19725 transversions) and 9807 indels/substitutions
NOTICE: Finished writing 65312 SNP genotypes (45587 transitions and 19725 transversions) and 9807 indels/substitutions for 1 sample
WARNING: 5596 invalid alternative alleles found in input file
perl /home/user1/git/annovar/table_annovar.pl SN_pt123456.vcf.avinput /home/user1/git/annovar/humandb -buildver hg38 -remove -out SN_pt123456 -protocol refGene,esp6500siv2_all,1000g2015aug_all,avsnp147,dbnsfp33a,clinvar_20170905,exac03,dbscsnv11,dbnsfp31a_interpro,rmsk,ensGene,knownGene   -operation  g,f,f,f,f,f,f,f,f,r,g,g   -nastring . --otherinfo
-----------------------------------------------------------------
NOTICE: Processing operation=g protocol=refGene

NOTICE: Running with system command <annotate_variation.pl -geneanno -buildver hg38 -dbtype refGene -outfile SN_pt123456.refGene -exonsort SN_pt123456.vcf.avinput /home/user1/git/annovar/humandb>
NOTICE: Output files were written to SN_pt123456.refGene.variant_function, SN_pt123456.refGene.exonic_variant_function
NOTICE: Reading gene annotation from /home/user1/git/annovar/humandb/hg38_refGene.txt ... Done with 71041 transcripts (including 17412 without coding sequence annotation) for 27813 unique genes
NOTICE: Processing next batch with 80715 unique variants in 80715 input lines
NOTICE: Reading FASTA sequences from /home/user1/git/annovar/humandb/hg38_refGeneMrna.fa ... Done with 21447 sequences
WARNING: A total of 515 sequences will be ignored due to lack of correct ORF annotation
-----------------------------------------------------------------
NOTICE: Processing operation=f protocol=esp6500siv2_all

NOTICE: Running system command <annotate_variation.pl -filter -dbtype esp6500siv2_all -buildver hg38 -outfile SN_pt123456 SN_pt123456.vcf.avinput /home/user1/git/annovar/humandb>
NOTICE: the --dbtype esp6500siv2_all is assumed to be in generic ANNOVAR database format
NOTICE: Variants matching filtering criteria are written to SN_pt123456.hg38_esp6500siv2_all_dropped, other variants are written to SN_pt123456.hg38_esp6500siv2_all_filtered
NOTICE: Processing next batch with 80715 unique variants in 80715 input lines
NOTICE: Database index loaded. Total number of bins is 683825 and the number of bins to be scanned is 51610
NOTICE: Scanning filter database /home/user1/git/annovar/humandb/hg38_esp6500siv2_all.txt...Done
-----------------------------------------------------------------
NOTICE: Processing operation=f protocol=1000g2015aug_all

NOTICE: Running system command <annotate_variation.pl -filter -dbtype 1000g2015aug_all -buildver hg38 -outfile SN_pt123456 SN_pt123456.vcf.avinput /home/user1/git/annovar/humandb>
NOTICE: Variants matching filtering criteria are written to SN_pt123456.hg38_ALL.sites.2015_08_dropped, other variants are written to SN_pt123456.hg38_ALL.sites.2015_08_filtered
NOTICE: Processing next batch with 80715 unique variants in 80715 input lines
NOTICE: Database index loaded. Total number of bins is 2821635 and the number of bins to be scanned is 50667
NOTICE: Scanning filter database /home/user1/git/annovar/humandb/hg38_ALL.sites.2015_08.txt...Done
-----------------------------------------------------------------
NOTICE: Processing operation=f protocol=avsnp147

NOTICE: Running system command <annotate_variation.pl -filter -dbtype avsnp147 -buildver hg38 -outfile SN_pt123456 SN_pt123456.vcf.avinput /home/user1/git/annovar/humandb>
NOTICE: Variants matching filtering criteria are written to SN_pt123456.hg38_avsnp147_dropped, other variants are written to SN_pt123456.hg38_avsnp147_filtered
NOTICE: Processing next batch with 80715 unique variants in 80715 input lines
NOTICE: Database index loaded. Total number of bins is 27843692 and the number of bins to be scanned is 64611
NOTICE: Scanning filter database /home/user1/git/annovar/humandb/hg38_avsnp147.txt...Done
-----------------------------------------------------------------
NOTICE: Processing operation=f protocol=dbnsfp33a
NOTICE: Finished reading 66 column headers for '-dbtype dbnsfp33a'

NOTICE: Running system command <annotate_variation.pl -filter -dbtype dbnsfp33a -buildver hg38 -outfile SN_pt123456 SN_pt123456.vcf.avinput /home/user1/git/annovar/humandb -otherinfo>
NOTICE: the --dbtype dbnsfp33a is assumed to be in generic ANNOVAR database format
NOTICE: Variants matching filtering criteria are written to SN_pt123456.hg38_dbnsfp33a_dropped, other variants are written to SN_pt123456.hg38_dbnsfp33a_filtered
NOTICE: Processing next batch with 80715 unique variants in 80715 input lines
NOTICE: Database index loaded. Total number of bins is 552168 and the number of bins to be scanned is 41710
NOTICE: Scanning filter database /home/user1/git/annovar/humandb/hg38_dbnsfp33a.txt...

kaichop commented 6 years ago

Looks normal to me. Note that this is ANNOVAR, not InterVar. It is possible that disk I/O is the bottleneck; try using an SSD for this type of annotation. With multi-threading, it should take a couple of minutes for a typical exome.
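To get a rough idea of whether the disk is the limiting factor, one could measure how fast a large database file can be read from it. The sketch below is only an approximation under stated assumptions: the function name and default sizes are hypothetical, and it measures sequential reads, whereas an indexed scan also involves seeks. A file already held in the operating system's cache will also report an unrealistically high rate.

```python
import time


def read_throughput_mb_per_s(path, chunk_bytes=8 * 1024 * 1024, max_bytes=1024 ** 3):
    """Read up to `max_bytes` from `path` sequentially and return MB/s."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as handle:
        while total < max_bytes:
            chunk = handle.read(min(chunk_bytes, max_bytes - total))
            if not chunk:
                break  # reached end of file
            total += len(chunk)
    elapsed = time.perf_counter() - start
    # Guard against a zero-length timing interval on tiny files
    return (total / 1e6) / elapsed if elapsed > 0 else float("inf")
```

Running it against the dbnsfp33a file on the current drive and again on a copy placed on an SSD gives a simple before-and-after comparison.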
