Open thkuo opened 2 years ago
This kind of error happens because the original software doesn't take into account that database files can end up corrupted if a download stops halfway.
I made minor edits to some files to avoid this issue, and I think they might help you: https://github.com/cruizperez/MicrobeAnnotator/pull/38 (you would have to download from my fork here)
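For anyone who wants to check their downloads by hand, here is a minimal sketch of how corrupted `.gz` files could be detected before the merge step. This is not the code from the fork above; the function names `is_valid_gzip` and `find_corrupted` are hypothetical, and it assumes the downloaded protein files are gzip archives sitting in a single folder.

```python
import gzip
import zlib
from pathlib import Path


def is_valid_gzip(path, chunk_size=1024 * 1024):
    """Return True if the whole gzip file decompresses without error.

    Hypothetical helper, not part of MicrobeAnnotator.
    """
    try:
        with gzip.open(path, "rb") as handle:
            # Read to the end so truncation and checksum errors surface.
            while handle.read(chunk_size):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        # OSError covers a bad gzip header or CRC mismatch, EOFError a
        # truncated stream, zlib.error a damaged deflate block.
        return False


def find_corrupted(folder, pattern="*.gz"):
    """List files in `folder` matching `pattern` that fail the gzip check."""
    return sorted(p for p in Path(folder).glob(pattern) if not is_valid_gzip(p))
```

Any file returned by `find_corrupted` could then be deleted and downloaded again before re-running the failing step.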
Thank you for the suggestion. However, I tried your fork and it did not work in my environment:
(microbeannotator) thkuo@titan-compute-01:/net/sgi/metagenomics/thkuo/bin/test_microbeannotator$ ~/bin/MicrobeAnnotator.beta/bin/microbeannotator_db_builder --step 9 -t 12 -m diamond --bin_path /net/sgi/metagenomics/thkuo/bin/lib/diamond/ -d /net/sgi/metagenomics/thkuo/MicrobeAnnotator_DB/ --no_aspera
2022-04-04 14:21:54,197 [INFO]: This is MicrobeAnnotator v2.0.5
2022-04-04 14:21:54,197 [INFO]: I will download and format the databases I use.
2022-04-04 14:21:54,197 [INFO]: Creating database folders
2022-04-04 14:21:54,198 [INFO]: Step 9
2022-04-04 14:21:54,198 [INFO]: Downloading protein fasta files using wget.
100% [........................................................................] 20057842 / 20057842
2022-04-04 14:58:01,655 [INFO]: Merging protein files
Traceback (most recent call last):
File "/home/thkuo/bin/MicrobeAnnotator.beta/bin/microbeannotator_db_builder", line 459, in <module>
main()
File "/home/thkuo/bin/MicrobeAnnotator.beta/bin/microbeannotator_db_builder", line 451, in main
single_step, aspera, keep_temp, excludetrembl, bin_path)
File "/home/thkuo/bin/MicrobeAnnotator.beta/bin/microbeannotator_db_builder", line 184, in database_builder
database_directory, threads)
File "/home/thkuo/miniconda3/envs/microbeannotator/lib/python3.7/site-packages/microbeannotator/database/refseq_data_downloader.py", line 162, in refseq_fasta_downloader_wget
copyfileobj(temp_file,merged_db)
File "/home/thkuo/miniconda3/envs/microbeannotator/lib/python3.7/shutil.py", line 79, in copyfileobj
buf = fsrc.read(length)
File "/home/thkuo/miniconda3/envs/microbeannotator/lib/python3.7/gzip.py", line 300, in read1
return self._buffer.read1(size)
File "/home/thkuo/miniconda3/envs/microbeannotator/lib/python3.7/_compression.py", line 68, in readinto
data = self.read(len(byte_view))
File "/home/thkuo/miniconda3/envs/microbeannotator/lib/python3.7/gzip.py", line 482, in read
uncompress = self._decompressor.decompress(buf, size)
zlib.error: Error -3 while decompressing data: invalid block type
In case you want to check the version, here is the information:
(microbeannotator) thkuo@titan-compute-01:~/bin/MicrobeAnnotator.beta$ git remote show origin
* remote origin
Fetch URL: https://github.com/silvtal/MicrobeAnnotator.git
Push URL: https://github.com/silvtal/MicrobeAnnotator.git
HEAD branch: master
Remote branches:
add-license-1 tracked
development tracked
master tracked
Local branch configured for 'git pull':
master merges with remote master
Local ref configured for 'git push':
master pushes to master (up to date)
* master
* 9b3620b silvtal, Wed Dec 22 13:00:25 2021 +0100: db_builder: re-download corrupted genbank downloads at step 10, merge steps 10 and 11, fix db_builder sqlite step; microbeannotator: fix --method_bin option
* c29275c silvtal, Tue Dec 14 16:47:48 2021 +0100: added corrupted RefSeq file correcting step
* d7220c1 silvtal, Thu Dec 9 20:42:07 2021 +0100: added --excludetrembl option to db builder
Hi guys, I have the same issue. Did you find a solution?
Hello, it has something to do with NCBI's FTP. The workaround is to change 'ftp://' to 'https://' on lines 144 and 241 of .../database/refseq_data_downloader.py
EDIT: Also, delete any previously downloaded files from the 'temp_refseq_proteins' folder.
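The two steps of this workaround can be sketched roughly as follows. This is an illustration only, not the actual code in refseq_data_downloader.py: the helper names `ftp_to_https` and `clear_temp_folder` are hypothetical, the URL in the usage note is only an example, and the sketch assumes the temporary folder holds plain files.

```python
from pathlib import Path
from urllib.parse import urlsplit, urlunsplit


def ftp_to_https(url):
    """Swap an 'ftp://' scheme for 'https://', leaving other URLs untouched.

    Hypothetical helper mirroring the manual edit described above.
    """
    parts = urlsplit(url)
    if parts.scheme == "ftp":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


def clear_temp_folder(folder):
    """Delete every regular file in `folder` and return how many were removed."""
    removed = 0
    for path in Path(folder).glob("*"):
        if path.is_file():
            path.unlink()
            removed += 1
    return removed
```

For example, `ftp_to_https("ftp://ftp.ncbi.nlm.nih.gov/refseq/release/")` gives the same address with an `https://` prefix, and `clear_temp_folder` would be pointed at the 'temp_refseq_proteins' folder inside your database directory before re-running the download step.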
Dear MicrobeAnnotator team,
I tried to fix the database with this command:
However, it failed as below:
When I tried it one more time, the error message became:
It looks like there is a problem in the compression procedure. What could be the cause?