Closed: naurasd closed this issue 3 months ago
May have been fixed with the most recent release?
May have, yes. Please retry this file. I'm again working on a major update because BOLD just limits access to almost everything at the moment, but I have to do some intense testing first. On top of that, API access has been limited to 3 requests per minute, which is insane :( However, I'm actively working on it, so I hope this will be fixed soon. Unfortunately this affects all BOLDigger versions, not just 2.
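For illustration, a client-side throttle to stay under such a limit could look like the sketch below. This is a minimal illustration under assumed names and intervals, not BOLDigger's actual code:

```python
import time


class Throttle:
    """Minimal sketch: enforce a minimum interval between calls,
    e.g. 60 / 3 = 20 s to stay under 3 requests per minute."""

    def __init__(self, min_interval: float):
        self.min_interval = min_interval
        self._last = 0.0  # monotonic timestamp of the previous call

    def wait(self) -> None:
        """Sleep just long enough so calls are at least min_interval apart."""
        delay = self.min_interval - (time.monotonic() - self._last)
        if delay > 0:
            time.sleep(delay)
        self._last = time.monotonic()


# usage: call throttle.wait() right before each session.get(...)
throttle = Throttle(60 / 3)
```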
alright no prob. will try again with new release 1.3.4 and update you.
Fixed with 2.0.0.
Hi Dominik,
Might need to be reopened. Not sure if this is fixed with 2.0.4.
The end of my .err file look like this:
Generating download links: 99%|█████████▉| 1482/1500 [24:53:25<05:23, 17.97s/it
Generating download links: 99%|█████████▉| 1482/1500 [24:55:18<05:23, 17.97s/it
Downloading data: 99%|█████████▉| 1482/1500 [24:55:18<05:23, 17.97s/it]
Generating download links: 99%|█████████▉| 1492/1500 [24:55:47<02:14, 16.83s/it
Generating download links: 99%|█████████▉| 1492/1500 [24:57:57<02:14, 16.83s/it
Downloading data: 99%|█████████▉| 1492/1500 [24:57:57<02:14, 16.83s/it]
Generating download links: : 1502it [24:58:23, 59.86s/it]
Downloading additional data: 1%| | 2/177 [15:20:37<58:27, 20.04s/it]
The end of my .out file looks like this:
22:27:29: Downloaded top 100 hits of all records for ASV6501
22:27:31: Downloaded top 100 hits of all records for ASV6482
22:27:33: Downloaded top 100 hits of all records for ASV6495
22:27:35: Downloaded top 100 hits of all records for ASV6493
22:27:37: Downloaded top 100 hits of all records for ASV6485
22:27:39: Downloaded top 100 hits of all records for ASV6492
22:27:41: Downloaded top 100 hits of all records for ASV6500
22:27:41: All records top 100 records successfully downloaded.
22:27:41: Ordering top 100 hits.
22:27:45: Generating download links for additional data.
So the last action happened last night at 22:27. This was also the last time the .h5.lz file was appended; since then (for the past 15 hours) nothing has happened. The time stamps in the .err file are difficult for me to interpret. The last time stamp of 15:20:37
is basically the time elapsed since the generation of download links for additional data started last night at 22:27. I'm also not sure what the 2/177 refers to.
Let me know if you need any of the other files to check up on this!
Best nauras
Hi Dominik,
just adding some comments as an update after our exchange earlier today:
Best Nauras
Update for running on personal computer:
Timeout error, but this should be an issue on BOLD's side.
20:14:13: Trying to log in.
20:14:15: Login successful.
20:14:20: Starting to download from the species level database.
20:14:20: Starting to download from the all records database.
20:14:20: Performing second login for requesting links from the all records database.
20:14:20: Trying to log in.
20:14:22: Login successful.
20:14:24: Starting to gather download links from the all records database.
20:14:25: All records top 100 records successfully downloaded.
20:14:25: Ordering top 100 hits.
20:14:27: Generating download links for additional data.
Downloading additional data: 25%|███████████▋ | 199/803 [1:06:38<3:22:14, 20.09s/it]
Traceback (most recent call last):
File "C:\Users\Nauras\AppData\Local\Programs\Python\Python312\Lib\site-packages\urllib3\response.py", line 444, in _error_catcher
yield
File "C:\Users\Nauras\AppData\Local\Programs\Python\Python312\Lib\site-packages\urllib3\response.py", line 831, in read_chunked
chunk = self._handle_chunk(amt)
^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Nauras\AppData\Local\Programs\Python\Python312\Lib\site-packages\urllib3\response.py", line 784, in _handle_chunk
returned_chunk = self._fp._safe_read(self.chunk_left)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Nauras\AppData\Local\Programs\Python\Python312\Lib\http\client.py", line 640, in _safe_read
data = self.fp.read(amt)
^^^^^^^^^^^^^^^^^
File "C:\Users\Nauras\AppData\Local\Programs\Python\Python312\Lib\socket.py", line 720, in readinto
return self._sock.recv_into(b)
^^^^^^^^^^^^^^^^^^^^^^^
TimeoutError: timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\Nauras\AppData\Local\Programs\Python\Python312\Lib\site-packages\requests\models.py", line 820, in generate
yield from self.raw.stream(chunk_size, decode_content=True)
File "C:\Users\Nauras\AppData\Local\Programs\Python\Python312\Lib\site-packages\urllib3\response.py", line 624, in stream
for line in self.read_chunked(amt, decode_content=decode_content):
File "C:\Users\Nauras\AppData\Local\Programs\Python\Python312\Lib\site-packages\urllib3\response.py", line 816, in read_chunked
with self._error_catcher():
File "C:\Users\Nauras\AppData\Local\Programs\Python\Python312\Lib\contextlib.py", line 158, in __exit__
self.gen.throw(value)
File "C:\Users\Nauras\AppData\Local\Programs\Python\Python312\Lib\site-packages\urllib3\response.py", line 449, in _error_catcher
raise ReadTimeoutError(self._pool, None, "Read timed out.")
urllib3.exceptions.ReadTimeoutError: HTTPConnectionPool(host='www.boldsystems.org', port=80): Read timed out.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "C:\Users\Nauras\AppData\Local\Programs\Python\Python312\Scripts\boldigger2.exe\__main__.py", line 7, in <module>
File "C:\Users\Nauras\AppData\Local\Programs\Python\Python312\Lib\site-packages\boldigger2\__main__.py", line 88, in main
id_engine_coi.main(
File "C:\Users\Nauras\AppData\Local\Programs\Python\Python312\Lib\site-packages\boldigger2\id_engine_coi.py", line 673, in main
additional_data_download.main(fasta_path, hdf_name_top_100_hits, read_fasta)
File "C:\Users\Nauras\AppData\Local\Programs\Python\Python312\Lib\site-packages\boldigger2\additional_data_download.py", line 342, in main
additional_data = asyncio.run(
^^^^^^^^^^^^
File "C:\Users\Nauras\AppData\Local\Programs\Python\Python312\Lib\asyncio\runners.py", line 194, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "C:\Users\Nauras\AppData\Local\Programs\Python\Python312\Lib\asyncio\runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Nauras\AppData\Local\Programs\Python\Python312\Lib\asyncio\base_events.py", line 687, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "C:\Users\Nauras\AppData\Local\Programs\Python\Python312\Lib\site-packages\boldigger2\additional_data_download.py", line 225, in as_session
return await tqdm_asyncio.gather(*tasks, desc="Downloading additional data")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Nauras\AppData\Local\Programs\Python\Python312\Lib\site-packages\tqdm\asyncio.py", line 79, in gather
res = [await f for f in cls.as_completed(ifs, loop=loop, timeout=timeout,
^^^^^^^
File "C:\Users\Nauras\AppData\Local\Programs\Python\Python312\Lib\asyncio\tasks.py", line 631, in _wait_for_one
return f.result() # May raise f.exception().
^^^^^^^^^^
File "C:\Users\Nauras\AppData\Local\Programs\Python\Python312\Lib\site-packages\tqdm\asyncio.py", line 76, in wrap_awaitable
return i, await f
^^^^^^^
File "C:\Users\Nauras\AppData\Local\Programs\Python\Python312\Lib\site-packages\boldigger2\additional_data_download.py", line 201, in limit_concurrency
return await as_request(url, as_session)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Nauras\AppData\Local\Programs\Python\Python312\Lib\site-packages\boldigger2\additional_data_download.py", line 178, in as_request
response = await as_session.get(url, timeout=60)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Nauras\AppData\Local\Programs\Python\Python312\Lib\concurrent\futures\thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Nauras\AppData\Local\Programs\Python\Python312\Lib\site-packages\requests\sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Nauras\AppData\Local\Programs\Python\Python312\Lib\site-packages\requests\sessions.py", line 746, in send
r.content
File "C:\Users\Nauras\AppData\Local\Programs\Python\Python312\Lib\site-packages\requests\models.py", line 902, in content
self._content = b"".join(self.iter_content(CONTENT_CHUNK_SIZE)) or b""
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Nauras\AppData\Local\Programs\Python\Python312\Lib\site-packages\requests\models.py", line 826, in generate
raise ConnectionError(e)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='www.boldsystems.org', port=80): Read timed out.
I'll write a fix for that, but it may take until September. Have to finish updating my metabarcoding pipeline first.
Fixed with 2.1.0.
Sorry, it still occurs with 2.1.0.
.err file:
Downloading additional data: 42%|████▏ | 1927/4539 [1:57:35<2:39:24, 3.66s/it]
Traceback (most recent call last):
File "/home/naurasd/.local/lib/python3.12/site-packages/urllib3/response.py", line 444, in _error_catcher
yield
File "/home/naurasd/.local/lib/python3.12/site-packages/urllib3/response.py", line 828, in read_chunked
self._update_chunk_length()
File "/home/naurasd/.local/lib/python3.12/site-packages/urllib3/response.py", line 758, in _update_chunk_length
line = self._fp.fp.readline()
^^^^^^^^^^^^^^^^^^^^^^
File "/sw/comp/python/3.12.1/rackham/lib/python3.12/socket.py", line 707, in readinto
return self._sock.recv_into(b)
^^^^^^^^^^^^^^^^^^^^^^^
TimeoutError: timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/naurasd/.local/lib/python3.12/site-packages/requests/models.py", line 820, in generate
yield from self.raw.stream(chunk_size, decode_content=True)
File "/home/naurasd/.local/lib/python3.12/site-packages/urllib3/response.py", line 624, in stream
for line in self.read_chunked(amt, decode_content=decode_content):
File "/home/naurasd/.local/lib/python3.12/site-packages/urllib3/response.py", line 816, in read_chunked
with self._error_catcher():
File "/sw/comp/python/3.12.1/rackham/lib/python3.12/contextlib.py", line 158, in __exit__
self.gen.throw(value)
File "/home/naurasd/.local/lib/python3.12/site-packages/urllib3/response.py", line 449, in _error_catcher
raise ReadTimeoutError(self._pool, None, "Read timed out.")
urllib3.exceptions.ReadTimeoutError: HTTPConnectionPool(host='8.219.97.248', port=80): Read timed out.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/naurasd/.local/bin/boldigger2", line 8, in <module>
sys.exit(main())
^^^^^^
File "/home/naurasd/.local/lib/python3.12/site-packages/boldigger2/__main__.py", line 88, in main
id_engine_coi.main(
File "/home/naurasd/.local/lib/python3.12/site-packages/boldigger2/id_engine_coi.py", line 673, in main
additional_data_download.main(fasta_path, hdf_name_top_100_hits, read_fasta)
File "/home/naurasd/.local/lib/python3.12/site-packages/boldigger2/additional_data_download.py", line 431, in main
download_data(process_ids_to_download, hdf_name_top_100_hits)
File "/home/naurasd/.local/lib/python3.12/site-packages/boldigger2/additional_data_download.py", line 285, in download_data
response = session.get(
^^^^^^^^^^^^
File "/home/naurasd/.local/lib/python3.12/site-packages/requests/sessions.py", line 602, in get
return self.request("GET", url, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/naurasd/.local/lib/python3.12/site-packages/requests/sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/naurasd/.local/lib/python3.12/site-packages/requests/sessions.py", line 746, in send
r.content
File "/home/naurasd/.local/lib/python3.12/site-packages/requests/models.py", line 902, in content
self._content = b"".join(self.iter_content(CONTENT_CHUNK_SIZE)) or b""
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/naurasd/.local/lib/python3.12/site-packages/requests/models.py", line 826, in generate
raise ConnectionError(e)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='8.219.97.248', port=80): Read timed out.
End of .out file:
12:23:05: API overloaded. Switching proxy.
12:23:43: Proxy set to http://35.185.196.38:3128.
12:23:44: API overloaded. Switching proxy.
12:23:49: Proxy set to http://35.185.196.38:3128.
12:23:49: API overloaded. Switching proxy.
12:24:28: Proxy set to http://69.197.135.43:18080.
12:24:28: API overloaded. Switching proxy.
12:25:08: Proxy set to http://8.219.97.248:80.
The ConnectionError should be handled... I'll look into this again; maybe it's the ReadTimeout that isn't caught.
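A retry wrapper along these lines could catch both exception types so a single transient timeout doesn't kill the whole download. This is an illustrative sketch only; the function name, retry count, and backoff values are assumptions, not the actual fix:

```python
import time

import requests


def get_with_retries(session, url, retries=5, backoff=10, timeout=60):
    """GET with retries on transient network failures.

    Catches requests' ConnectionError and ReadTimeout, waits with a
    linearly growing backoff, and re-raises only after the last attempt.
    """
    for attempt in range(retries):
        try:
            return session.get(url, timeout=timeout)
        except (requests.exceptions.ConnectionError,
                requests.exceptions.ReadTimeout):
            if attempt == retries - 1:
                raise  # give up after the final attempt
            time.sleep(backoff * (attempt + 1))
```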
Updated to 2.1.3. Now the ReadTimeout is also handled correctly. Please note that you can now restart the additional data download and it will continue where it left off.
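A resume like this can be sketched roughly as follows: persist which IDs are already downloaded and skip them on restart. This is a simplified illustration; the file format and function names are assumptions, not BOLDigger's internals:

```python
import json
from pathlib import Path


def remaining_ids(all_ids, checkpoint: Path):
    """Return only the IDs not yet recorded in the checkpoint file."""
    done = set(json.loads(checkpoint.read_text())) if checkpoint.exists() else set()
    return [i for i in all_ids if i not in done]


def mark_done(new_ids, checkpoint: Path):
    """Merge newly downloaded IDs into the checkpoint file."""
    done = set(json.loads(checkpoint.read_text())) if checkpoint.exists() else set()
    done.update(new_ids)
    checkpoint.write_text(json.dumps(sorted(done)))
```

On restart the downloader would call `remaining_ids` with the full ID list and only fetch what is missing, calling `mark_done` after each successful batch.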
Yes, I saw that. So cool that this is possible now!
Thanks so much!
Hi Dominik,
I'm currently having the issue that no xlsx file is written for my results.
Using boldigger2 v1.0.6 with python 3.12.1.
I have a fasta file with 1,773 COI sequences. After roughly 4.5 hours it says download links for additional data are being generated. Then nothing happens for the next 5 days until my job is terminated due to a timeout.
All files attached as txt files:
Fasta file:
COI_cluster_reps_lulu_curated.txt
h5.lz files:
COI_cluster_reps_lulu_curated_top_100_hits.h5.txt COI_cluster_reps_lulu_curated_download_links.h5.txt
Job error and output files:
digger_tilde_err.txt digger_tilde_out.txt
Problem from BOLD's side?
Best,
Nauras