WGLab / doc-ANNOVAR

Documentation for the ANNOVAR software
http://annovar.openbioinformatics.org

Dbnsfp33 running time #243

Open tahirasma opened 5 months ago

tahirasma commented 5 months ago

Greetings,

I am trying to annotate a VCF file of 38,000 variants. The other databases run efficiently, but the dbnsfp annotation takes almost 20-30 minutes, even though the index is generated. Please advise if there is a way to make it faster. The command is as follows:

annotate_variation.pl -filter -dbtype dbnsfp33 -buildver hg38 -outfile ./output_dbnsfp

Regards, Asma Tahir

kaichop commented 5 months ago

In the LOG (the "NOTICE" messages printed on screen), what does it say? Also, try using table_annovar and dbnsfp42a.
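For readers following this suggestion, here is a rough sketch of what a table_annovar.pl call with dbnsfp42a might look like. The input file name, output prefix, and database directory are placeholders, not values from this thread, and the sketch assumes the hg38 dbnsfp42a files are already present in the database directory. It only prints the command when table_annovar.pl is not on the PATH.

```shell
# Hypothetical names: adjust to your own files and database directory.
input_vcf="input.vcf"
db_dir="./dbs/"
out_prefix="output_dbnsfp"

# Assumes hg38_dbnsfp42a has already been downloaded into $db_dir, e.g. with:
#   annotate_variation.pl -buildver hg38 -downdb -webfrom annovar dbnsfp42a ./dbs/

# Build the argument list: one filter-based protocol (dbnsfp42a), VCF in and out.
set -- table_annovar.pl "$input_vcf" "$db_dir" \
    -buildver hg38 -out "$out_prefix" -remove \
    -protocol dbnsfp42a -operation f -nastring . -vcfinput
planned_cmd="$*"

# Run the command only if ANNOVAR is installed; otherwise just show it.
if command -v table_annovar.pl >/dev/null 2>&1; then
    "$@"
else
    echo "table_annovar.pl not found on PATH; would run: $planned_cmd"
fi
```

Whether this is faster than the direct annotate_variation.pl call depends on the data; the NOTICE output is still the place to check how many bins are scanned.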


tahirasma commented 5 months ago

The output was as follows:

NOTICE: Processing operation=f protocol=dbnsfp33
NOTICE: Finished reading 66 column headers for '-dbtype dbnsfp33'
NOTICE: Running system command <annotate_variation.pl -filter -dbtype dbnsfp33 -buildver hg38 -outfile .input_file.filtered_filtered_dbnsfp .input_file.filtered_filtered_dbnsfp.avinput ./dbs/ -otherinfo>
NOTICE: Output file with variants matching filtering criteria is written to .input_file.filtered_filtered_dbnsfp.hg38_dbnsfp33_dropped, and output file with other variants is written to .input_file.filtered_filtered_dbnsfp.hg38_dbnsfp33_filtered
NOTICE: Processing next batch with 38289 unique variants in 38289 input lines
NOTICE: Database index loaded. Total number of bins is 101730 and the number of bins to be scanned is 21655
NOTICE: Scanning filter database ./dbs/hg38_dbnsfp33.txt...Done

kaichop commented 4 months ago

There is no error in the log output, so I think the running time is just a consequence of how your VCF file is generated. Basically, more than 20% of the genome bins are scanned (21655 of 101730), so indexing provides limited speedup. Also, dbnsfp47a and dbnsfp47c are now available in hg19/hg38 coordinates.
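To check the same ratio on another run, the bin counts can be pulled out of the "Database index loaded" NOTICE line. This is a small sketch that assumes the message wording matches the log in this thread; the variable names are illustrative only.

```shell
# NOTICE line copied from the log in this thread.
notice='NOTICE: Database index loaded. Total number of bins is 101730 and the number of bins to be scanned is 21655'

# Extract "total scanned" as two numbers, assuming the wording shown above.
counts=$(printf '%s\n' "$notice" |
    sed -n 's/.*Total number of bins is \([0-9][0-9]*\) and the number of bins to be scanned is \([0-9][0-9]*\).*/\1 \2/p')

# Percentage of index bins that must be scanned, to one decimal place.
scanned_pct=$(printf '%s\n' "$counts" | awk '{ printf "%.1f", 100 * $2 / $1 }')

echo "bins scanned: ${scanned_pct}%"
```

A high percentage means the variants are spread over many bins, so the index can skip only part of the database file.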
