WGLab / doc-ANNOVAR

Documentation for the ANNOVAR software
http://annovar.openbioinformatics.org

Annovar Download Database Error "unexpected end of file", "uncompress failed" #79

Closed roselucia closed 5 years ago

roselucia commented 5 years ago

Hi Kai,

I have already used Annovar successfully on my MacBook Pro (13'', Early 2011) running macOS High Sierra 10.13.6. I now wanted to use a computer with a better processor, so I switched to a MacBook Pro (13'', Mid 2014), also running macOS High Sierra 10.13.6. To download Annovar on the new MacBook, I used the same download link that I had used earlier (02.10.2019) on my older MacBook Pro. Interestingly, the old link led me to the latest Annovar version (24.10.2019). When I tried to download the databases listed below, two of them (dbnsfp35a and gnomad211_genome) failed with an error.

Command (e.g. for dbnsfp35a):

    perl annotate_variation.pl -buildver hg19 -downdb -webfrom annovar dbnsfp35a humandb/

Databases:
- refGene
- ensGene
- exac03
- gnomad211_genome
- gnomad211_exome
- 1000g2015aug
- avsnp150
- dbnsfp35a
- clinvar_20190305

Error: "unexpected end of file", "uncompress failed"

Content of Terminal:

    Valerios-MBP:~ valeriovirzi1$ cd /Users/valeriovirzi1/Desktop/annovar
    Valerios-MBP:annovar valeriovirzi1$ perl annotate_variation.pl -buildver hg19 -downdb -webfrom annovar dbnsfp35a humandb/
    NOTICE: Web-based checking to see whether ANNOVAR new version is available ... Done
    NOTICE: Downloading annotation database http://www.openbioinformatics.org/annovar/download/hg19_dbnsfp35a.txt.gz ... Done
    NOTICE: Downloading annotation database http://www.openbioinformatics.org/annovar/download/hg19_dbnsfp35a.txt.idx.gz ... Done
    NOTICE: Uncompressing downloaded files
    gunzip: hg19_dbnsfp35a.txt.gz: unexpected end of file
    gunzip: hg19_dbnsfp35a.txt.gz: uncompress failed
    NOTICE: Finished downloading annotation files for hg19 build version, with files saved at the 'humandb' directory
    Valerios-MBP:annovar valeriovirzi1$ perl annotate_variation.pl -buildver hg19 -downdb -webfrom annovar dbnsfp35a humandb/
    NOTICE: Web-based checking to see whether ANNOVAR new version is available ... Done
    NOTICE: Downloading annotation database http://www.openbioinformatics.org/annovar/download/hg19_dbnsfp35a.txt.gz ... Done
    NOTICE: Downloading annotation database http://www.openbioinformatics.org/annovar/download/hg19_dbnsfp35a.txt.idx.gz ... Done
    NOTICE: Uncompressing downloaded files
    gunzip: hg19_dbnsfp35a.txt.gz: unexpected end of file
    gunzip: hg19_dbnsfp35a.txt.gz: uncompress failed
    NOTICE: Finished downloading annotation files for hg19 build version, with files saved at the 'humandb' directory

I would be grateful for any help! Thanks! Rose

roselucia commented 5 years ago

I checked two more things to find a solution / reason for this problem:

1) I unzipped the downloaded .txt.gz file with another application (StuffIt Expander Version 16.0.6), which produced an uncompressed file of about 1.2 GB. This seems too small for dbnsfp35a.
2) In order to check the download rate, I used the direct link to the database dbnsfp35a (http://www.openbioinformatics.org/annovar/download/hg19_dbnsfp35a.txt.idx.gz) in Safari as well as in Chrome. The browser estimated the file size to be 6.26 GB. The download rate was extremely slow either way (approx. 20-30KB/s).

Could the download also be this slow when it is run from the terminal (as described above)? And could this slow download rate be the cause of the error that occurred when using the command perl annotate_variation.pl -buildver hg19 -downdb -webfrom annovar dbnsfp35a humandb/ (see above)?
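A more direct check than judging the uncompressed size is to compare the size of the downloaded .gz file with the size the server reports. The following is only a minimal sketch: the function names `remote_size` and `looks_truncated` are hypothetical, and it assumes the server answers an HTTP HEAD request with a `Content-Length` header, which is not confirmed anywhere in this thread.

```python
import os
import urllib.request


def remote_size(url, timeout=30):
    """Ask the server for the file size via an HTTP HEAD request.

    Returns None when the server does not report a Content-Length
    (assumption: the download server supports HEAD requests).
    """
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        length = response.headers.get("Content-Length")
    return int(length) if length is not None else None


def looks_truncated(local_path, expected_bytes):
    """Return True if the local file is smaller than the expected size in bytes."""
    return os.path.getsize(local_path) < expected_bytes
```

If `remote_size(...)` returns a number, passing it to `looks_truncated("humandb/hg19_dbnsfp35a.txt.gz", ...)` would show whether the download stopped early. This only works while the .gz file is still present in the humandb directory.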

kaichop commented 5 years ago

You may want to use a different computer or network to download the file; typically the speed should be >10Mb/s. "unexpected end of file" means the file was not downloaded completely.
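Whether a downloaded .gz file is complete can be tested without unpacking it to disk. The sketch below is not part of ANNOVAR; the function name `gzip_is_complete` is hypothetical. It uses Python's standard `gzip` module to read the file to the end and treats any decompression error as an incomplete or damaged download.

```python
import gzip
import zlib


def gzip_is_complete(path, chunk_size=1024 * 1024):
    """Return True if the gzip file at `path` decompresses to the end without error."""
    try:
        with gzip.open(path, "rb") as handle:
            # Read and discard the data in chunks so that a multi-gigabyte
            # database does not have to fit into memory.
            while handle.read(chunk_size):
                pass
    except (EOFError, OSError, zlib.error):
        # EOFError: the stream ended before the end-of-stream marker,
        # which is what a truncated download looks like.
        # OSError / zlib.error: the file is not gzip data or is corrupted.
        return False
    return True
```

For example, `gzip_is_complete("humandb/hg19_dbnsfp35a.txt.gz")` returning False would match the "unexpected end of file" message from gunzip, and the file would need to be downloaded again.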

roselucia commented 5 years ago

Hi Kai, thanks for the fast response. I have already tried another network, but the error was the same. For comparison, I have now checked the download rate for the same file, database dbnsfp35a (http://www.openbioinformatics.org/annovar/download/hg19_dbnsfp35a.txt.idx.gz), in Safari on my old MacBook vs. the new MacBook (using the same network). Interestingly the download rate on my new MacBook is 500KB/s (not 10Mb/s, but still a lot faster). I will go ahead and try to use Annovar on my old MacBook again (as it worked there before). Would you recommend using the new Annovar version instead of the version I downloaded on 02.10.2019 (which I kept on an external hard drive)? Thanks
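To put the rates mentioned in this thread into perspective, the arithmetic can be sketched as follows. The assumptions are: the 6.26 GB size estimated by the browser is taken as 6.26 × 10^9 bytes, "KB/s" means kilobytes per second, "10Mb/s" is read as megabits per second, and 25 KB/s is used as the midpoint of the reported 20-30 KB/s. The function name `download_hours` is hypothetical.

```python
def download_hours(size_bytes, rate_bytes_per_second):
    """Return the time in hours needed to transfer size_bytes at a constant rate."""
    return size_bytes / rate_bytes_per_second / 3600


# File size as estimated by the browser (assumption: 1 GB = 10**9 bytes).
SIZE_BYTES = 6.26e9

# Rates mentioned in the thread, converted to bytes per second.
RATES = {
    "25 KB/s (midpoint of 20-30 KB/s)": 25e3,
    "500 KB/s": 500e3,
    "10 Mb/s (read as megabits per second)": 10e6 / 8,
}

for label, rate in RATES.items():
    print(f"{label}: {download_hours(SIZE_BYTES, rate):.1f} h")
```

Under these assumptions the download takes roughly 70 hours at 25 KB/s, about 3.5 hours at 500 KB/s, and about 1.4 hours at 10 Mb/s, so a transfer at the slowest rate has far more opportunity to be interrupted before it finishes.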

kaichop commented 5 years ago

Try the new version; it has some additional features and bug fixes. I am collecting user feedback right now so that I can release a new stable version soon.

roselucia commented 5 years ago

Hi Kai,

thanks again for the fast response. I will definitely use the new one for all my new data and keep you updated.

However, for one current project I have already used the old Annovar version. Would it be OK to download some additional databases with the old version and keep using it for this data, for the sake of consistency? As this data has to be annotated by the end of the week, I would rather not re-annotate all of it with the new version. (The only problem we encountered with the old version was the one I already reported in my email of 29/10/2019: "non_cancer_AF_popmax" never included a genomic frequency, only the exomic one (gnomad211_genome).)

What do you think? Thank you so much for your fast help!

All the best, Rose

kaichop commented 5 years ago

This is okay; the changes are all minor. Perhaps the biggest change is how indels are handled and how cdot notation for coding indels is calculated.

roselucia commented 5 years ago

Hi Kai,

thanks for the fast response. In case I have time to run all of the data with the new Annovar version by the end of the week, would you say the new version is already stable enough to use? You said earlier that you are thinking of releasing a new stable version of Annovar soon. As I am short on time with the annotation of this data (and downloading the databases takes time), I would like to use the new Annovar only if you think it is already stable enough for my final annotation. What do you think? Thanks for your great help, Kai! All the best, Rose

kaichop commented 5 years ago

Not completely final, but certainly okay to use (any additional changes will be minor).

roselucia commented 5 years ago

Hi Kai, thanks again for the fast response. It's good to know that both options (the old Annovar version and the current release of the new version) are alright to use for the final annotation of my research project data! Thanks a lot!

Have you by chance already had time to look at the problem I reported for "non_cancer_AF_popmax" when using the old Annovar version, as described in my email of 29/10/2019? ("non_cancer_AF_popmax" never included a genomic frequency, only the exomic one (gnomad211_genome).)

Thanks again! Rose

kaichop commented 5 years ago

I will check my email.