Closed: naurasd closed this issue 2 months ago
BOLDigger2 tries to fetch the top-100 table from the HTML that BOLD returns. Sometimes there is no table, even though parsable HTML comes back; so far I don't know why. I will look into this. It would help to know for which sequence this happens, but since the failures are random, that is hard to pin down. I will have to write a temporary fix that simply writes the OTU and the HTML to a file to find out what is going on. I think restarting solves the issue, which makes it even stranger: it does not seem to be a problem on BOLD's end, nor with the code, just random behavior. Can you send me your download links file, so I have a starting point?
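A stdlib-only sketch of what such a temporary debug dump could look like (the function, class, and directory names are hypothetical, not boldigger2 API): before handing the response to pandas, check whether the HTML contains a table at all, and if not, save the OTU id and raw HTML for later inspection instead of crashing.

```python
# Hypothetical debug helper: detect whether returned HTML contains a <table>;
# if not, dump the raw HTML under the OTU's name so failing responses can be
# inspected later. Names here are illustrative, not boldigger2's real API.
from html.parser import HTMLParser
import os


class TableDetector(HTMLParser):
    """Flag whether any <table> start tag appears in the document."""

    def __init__(self):
        super().__init__()
        self.has_table = False

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.has_table = True


def dump_if_no_table(otu_id, html_text, dump_dir="failed_html"):
    """Return True if the HTML contains a table; otherwise write the raw
    HTML to <dump_dir>/<otu_id>.html and return False."""
    detector = TableDetector()
    detector.feed(html_text)
    if detector.has_table:
        return True
    os.makedirs(dump_dir, exist_ok=True)
    with open(os.path.join(dump_dir, f"{otu_id}.html"), "w", encoding="utf-8") as fh:
        fh.write(html_text)
    return False
```

The dumped files would then make it possible to see what BOLD actually returned for the sequences that fail.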
naurasd @.***> wrote on Thu, 4 Jul 2024, 00:44:
Hi Dominik,
having a bit of trouble classifying ASVs in the following file: COI_cluster_reps_lulu_curated.txt https://github.com/user-attachments/files/16091371/COI_cluster_reps_lulu_curated.txt
Boldigger2 is running fine for a while until this error happens (shown are the last lines of the error output file, including the entire error part):
Downloading top 100 hits: 77%|███████▋ | 1321/1723 [5:10:45<3:08:34, 37.16s/it]
Downloading top 100 hits: 77%|███████▋ | 1326/1723 [5:11:37<1:27:54, 13.29s/it]
Downloading top 100 hits: 77%|███████▋ | 1327/1723 [5:12:01<1:13:22, 11.12s/it]
Downloading top 100 hits: 77%|███████▋ | 1329/1723 [5:12:21<1:32:36, 14.10s/it]
Traceback (most recent call last):
  File "/home/naurasd/.local/bin/boldigger2", line 8, in <module>
    sys.exit(main())
  File "/home/naurasd/.local/lib/python3.12/site-packages/boldigger2/main.py", line 87, in main
    id_engine_coi.main(
  File "/home/naurasd/.local/lib/python3.12/site-packages/boldigger2/id_engine_coi.py", line 544, in main
    asyncio.run(
  File "/sw/comp/python3/3.12.1/rackham/lib/python3.12/asyncio/runners.py", line 194, in run
    return runner.run(main)
  File "/sw/comp/python3/3.12.1/rackham/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
  File "/sw/comp/python3/3.12.1/rackham/lib/python3.12/asyncio/base_events.py", line 684, in run_until_complete
    return future.result()
  File "/home/naurasd/.local/lib/python3.12/site-packages/boldigger2/id_engine_coi.py", line 385, in as_session
    return await tqdm_asyncio.gather(*tasks, desc="Downloading top 100 hits")
  File "/home/naurasd/.local/lib/python3.12/site-packages/tqdm/asyncio.py", line 79, in gather
    res = [await f for f in cls.as_completed(ifs, loop=loop, timeout=timeout,
  File "/sw/comp/python3/3.12.1/rackham/lib/python3.12/asyncio/tasks.py", line 631, in _wait_for_one
    return f.result()  # May raise f.exception().
  File "/home/naurasd/.local/lib/python3.12/site-packages/tqdm/asyncio.py", line 76, in wrap_awaitable
    return i, await f
  File "/home/naurasd/.local/lib/python3.12/site-packages/boldigger2/id_engine_coi.py", line 351, in limit_concurrency
    return await as_request(
  File "/home/naurasd/.local/lib/python3.12/site-packages/boldigger2/id_engine_coi.py", line 226, in as_request
    response_table = pd.read_html(
  File "/home/naurasd/.local/lib/python3.12/site-packages/pandas/io/html.py", line 1240, in read_html
    return _parse(
  File "/home/naurasd/.local/lib/python3.12/site-packages/pandas/io/html.py", line 1003, in _parse
    raise retained
  File "/home/naurasd/.local/lib/python3.12/site-packages/pandas/io/html.py", line 983, in _parse
    tables = p.parse_tables()
  File "/home/naurasd/.local/lib/python3.12/site-packages/pandas/io/html.py", line 249, in parse_tables
    tables = self._parse_tables(self._build_doc(), self.match, self.attrs)
  File "/home/naurasd/.local/lib/python3.12/site-packages/pandas/io/html.py", line 598, in _parse_tables
    raise ValueError("No tables found")
ValueError: No tables found
Here are the last 4 lines of the output file:
07:22:38: Downloaded top 100 species level records for ASV3538
07:22:44: Downloaded top 100 species level records for ASV1775
07:23:01: Downloaded top 100 species level records for ASV1232
07:23:21: Downloaded top 100 species level records for ASV1099
I really have no clue what is going on here. Any help appreciated.
Thanks as always for your hard work!
Nauras
To give a more concise answer, now that I'm fully awake: this is a tough nut to crack, so it will take a bit more time. It appears there are different reasons for this "No tables found" error, so I have a few options:
So: both "quick and dirty" fixes are too dangerous to just apply. I'll have to search for the actual cause and tailor a solution that captures exactly this exception and nothing else. Since the error appears at random, this naturally takes some time, but I'll get there with the help of the users :)
TLDR: For your data, simply restarting may very well fix the issue, while I'm looking for a solution.
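Since a manual restart appears to resolve the problem, a targeted handler along these lines could automate that while leaving every other error untouched. This is only a sketch, not the actual boldigger2 fix: `download_and_parse` is a hypothetical stand-in for the real request-and-parse step.

```python
# Sketch of a narrowly-scoped retry: swallow only the exact
# ValueError("No tables found") seen in the traceback and try again;
# re-raise every other exception unchanged.
import time


def retry_no_tables(download_and_parse, retries=3, delay=5.0):
    """Call `download_and_parse` up to `retries` times, retrying only when
    it raises ValueError('No tables found'). Any other error propagates."""
    last_err = None
    for attempt in range(retries):
        try:
            return download_and_parse()
        except ValueError as err:
            if "No tables found" not in str(err):
                raise  # different error: do not mask it
            last_err = err
            time.sleep(delay * (attempt + 1))  # simple linear backoff
    raise last_err  # exhausted retries: surface the original error
```

Checking the exception message keeps the handler from hiding unrelated `ValueError`s, which matches the concern above about quick fixes capturing too much.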
Thanks for the reply in the early morning hours ;-)
I have sent you the download links file via email.
Nauras
Fixed with 2.0.0
Unfortunately, this is not fixed with 2.0.1 (and python/3.12.1).
All files attached as txt files for you to reproduce the issue:
Fasta file:
COI_cluster_reps_lulu_curated.txt
h5.lz file:
sent to you via wetransfer, file size too large
Job error and output files:
So it's still the additional data download that simply fails?
naurasd @.***> wrote on Wed, 31 Jul 2024, 17:51:
Unfortunately, this is not fixed with 2.0.1 (and python/3.12.1).
All files attached as txt files for you to reproduce the issue:
Fasta file:
COI_cluster_reps_lulu_curated.txt https://github.com/user-attachments/files/16444233/COI_cluster_reps_lulu_curated.txt
h5.lz file:
sent to you via wetransfer, file size too large
Job error and output files:
digger_error.txt https://github.com/user-attachments/files/16444245/digger_error.txt digger_output.txt https://github.com/user-attachments/files/16444249/digger_output.txt
Fixed with 2.0.3
amazing, thanks!