Closed bharat1912 closed 6 months ago
Hi!
Thanks for using cazy_webscraper, and sorry it's not working at the moment.
This issue is a duplicate of #120 and #125 - these are all related to parsing incomplete XML files from NCBI. This is typically the result of an interrupted connection to NCBI when downloading the XML.
I will close this issue and reopen #120 to continue the work there.
This shouldn't take long to fix, so please bear with me!
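Until that fix lands, one possible workaround on the caller's side is to retry a batch when the response from NCBI is cut short. The following is only a minimal sketch under stated assumptions, not the fix planned for #120: `read_with_retries` and `fetch_and_parse` are hypothetical names, and `fetch_and_parse` stands for any function that opens a fresh handle and parses it (for example, one that ends in the `Entrez.read` call shown in the traceback below).

```python
import http.client
import time


def read_with_retries(fetch_and_parse, retries=3, delay=1.0):
    """Call fetch_and_parse() and retry when the HTTP body arrives truncated.

    fetch_and_parse is a hypothetical zero-argument callable. It must open a
    new connection on every call, because a partially read handle cannot be
    resumed.
    """
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return fetch_and_parse()
        except http.client.IncompleteRead as err:
            # The connection dropped mid-download; remember the error and retry.
            last_error = err
            if attempt < retries:
                # Wait a little longer after each failed attempt.
                time.sleep(delay * attempt)
    raise RuntimeError(
        f"response still incomplete after {retries} attempts"
    ) from last_error
```

Retrying the whole batch rather than resuming it keeps the sketch simple: the parser never sees a partial XML document, at the cost of downloading the batch again.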
Please complete this report in full and in as much detail as possible. It will help with getting the bug fixed far sooner!
## To Reproduce
Please include the specific steps (including all code) you performed, so that we can check if the behaviour can be reproduced:

Install the prerequisites and activate the environment:
$ mamba create -n cazomevolve python=3.9
$ mamba activate cazomevolve

Install cazomevolve from the GitHub repository (with pip):
$ git clone https://github.com/HobnobMancer/cazomevolve.git
$ cd cazomevolve
$ python3 -m pip install cazomevolve/.
$ cazomevolve --version
0.1.7.3

Install dbcan:
$ mamba install -c conda-forge dbcan

Download the CAZy database with the cazomevolve environment activated:
(cazomevolve) bharat@bharat-Precision-Tower-7810:~$ cazy_webscraper -o /media/bharat/volume2/db/cazy_db/
Using default CAZy class synonyms
Built output directory: /media/bharat/volume2/db
Built new local CAZyme database at
/media/bharat/volume2/db/cazy_db
Built output directory: /media/bharat/volume2/db/.cazy_webscraper_2024-02-25_20-55-10
[WARNING] [cazy_webscraper.cazy_scraper]: Created cache dir: /media/bharat/volume2/db/.cazy_webscraper_2024-02-25_20-55-10
Downloading CAZy txt file: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 40626574/40626574 [01:13<00:00, 553302.84it/s]
Parsing CAZy txt file: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4445596/4445596 [10:49<00:00, 6846.22it/s]
Searching for multiple taxa annotations: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3477359/3477359 [00:15<00:00, 229819.30it/s]
Batch retrieving tax info from NCBI. Batch size:200: 0%| | 0/267 [00:00<?, ?it/sGenBank accession AAB28815.1 retrieved from NCBI, but it is not present in CAZy | 0/199 [00:00<?, ?it/s]
GenBank accession AAA35470.1 retrieved from NCBI, but it is not present in CAZy
GenBank accession M83801.1 retrieved from NCBI, but it is not present in CAZy
GenBank accession AAB26309.1 retrieved from NCBI, but it is not present in CAZy
GenBank accession CAA78311.1 retrieved from NCBI, but it is not present in CAZy
██████████████████......................................................................................................
.....................................................................................................................................................
Retrieving organism from NCBI: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 195/195 [00:00<00:00, 21455.08it/s]
Batch retrieving tax info from NCBI. Batch size:200: 4%|████▋ | 11/267 [01:03<24:48, 5.82s/it]
Traceback (most recent call last):
File "/home/bharat/mambaforge/envs/cazomevolve/lib/python3.9/http/client.py", line 560, in _get_chunk_left
chunk_left = self._read_next_chunk_size()
File "/home/bharat/mambaforge/envs/cazomevolve/lib/python3.9/http/client.py", line 527, in _read_next_chunk_size
return int(line, 16)
ValueError: invalid literal for int() with base 16: b''
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/bharat/mambaforge/envs/cazomevolve/lib/python3.9/http/client.py", line 592, in _readinto_chunked
chunk_left = self._get_chunk_left()
File "/home/bharat/mambaforge/envs/cazomevolve/lib/python3.9/http/client.py", line 562, in _get_chunk_left
raise IncompleteRead(b'')
http.client.IncompleteRead: IncompleteRead(0 bytes read)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/bharat/mambaforge/envs/cazomevolve/bin/cazy_webscraper", line 8, in <module>
sys.exit(main())
File "/home/bharat/mambaforge/envs/cazomevolve/lib/python3.9/site-packages/cazy_webscraper/cazy_scraper.py", line 268, in main
get_cazy_data(
File "/home/bharat/mambaforge/envs/cazomevolve/lib/python3.9/site-packages/cazy_webscraper/cazy_scraper.py", line 378, in get_cazy_data
cazy_data, successful_replacement = replace_multiple_tax(
File "/home/bharat/mambaforge/envs/cazomevolve/lib/python3.9/site-packages/cazy_webscraper/ncbi/taxonomy/multiple_taxa.py", line 170, in replace_multiple_tax
cazy_data = get_ncbi_tax(epost_results, cazy_data, replaced_taxa_logger, args)
File "/home/bharat/mambaforge/envs/cazomevolve/lib/python3.9/site-packages/cazy_webscraper/ncbi/taxonomy/multiple_taxa.py", line 201, in get_ncbi_tax
protein_records = Entrez.read(record_handle, validate=False)
File "/home/bharat/mambaforge/envs/cazomevolve/lib/python3.9/site-packages/Bio/Entrez/__init__.py", line 503, in read
record = handler.read(handle)
File "/home/bharat/mambaforge/envs/cazomevolve/lib/python3.9/site-packages/Bio/Entrez/Parser.py", line 392, in read
self.parser.ParseFile(handle)
File "/home/bharat/mambaforge/envs/cazomevolve/lib/python3.9/http/client.py", line 463, in read
n = self.readinto(b)
File "/home/bharat/mambaforge/envs/cazomevolve/lib/python3.9/http/client.py", line 497, in readinto
return self._readinto_chunked(b)
File "/home/bharat/mambaforge/envs/cazomevolve/lib/python3.9/http/client.py", line 608, in _readinto_chunked
raise IncompleteRead(bytes(b[0:total_bytes]))
http.client.IncompleteRead: IncompleteRead(441 bytes read)
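The chain of exceptions above can be reproduced without contacting NCBI, which may help when testing a fix. The sketch below feeds `http.client.HTTPResponse` a chunked body that stops before the terminating zero-length chunk, as happens when a connection is dropped mid-download. `FakeSocket` is a hypothetical stand-in written for this example; it is not part of cazy_webscraper or Biopython.

```python
import http.client
import io


class FakeSocket:
    """Hypothetical stand-in for a socket whose peer hung up early."""

    def __init__(self, data):
        self._file = io.BytesIO(data)

    def makefile(self, mode, *args, **kwargs):
        # HTTPResponse only needs a readable binary file object.
        return self._file


# A chunked response with one 5-byte chunk and no final "0" chunk.
raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Transfer-Encoding: chunked\r\n"
    b"\r\n"
    b"5\r\nhello\r\n"
)

response = http.client.HTTPResponse(FakeSocket(raw))
response.begin()  # parse the status line and headers

partial = None
try:
    response.read()
except http.client.IncompleteRead as err:
    # The next chunk-size line is empty, so int(b'', 16) fails inside
    # http.client and surfaces as IncompleteRead.
    partial = err.partial  # bytes received before the stream ended

print(repr(partial))
```

The `partial` attribute holds whatever arrived before the cut. For an XML download that is an incomplete document, so retrying the request is usually safer than trying to parse it.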
Describe the bug
The CAZy database download fails at 4% of the NCBI taxonomy batch retrieval (batch 11 of 267). The error is shown above.
Expected behavior
Expected the database to be downloaded
Screenshots
Part of the download output and the complete error are reproduced above.
Setup
Please provide a brief summary of your setup/computer you are using. For example:
Desktop (please complete the following information):
Smartphone (please complete the following information): Not used.
Additional context
Nil