dpatz closed this issue 4 years ago
Interesting. I will test it tomorrow on my Mac.
I'm getting this too on Linux. I notice it starts with an error about the set changing size during iteration:
INFO:root:Crawling #44: https://up.codes/s/bond-beams
INFO:root:Crawling #45: https://up.codes/s/transformer-efficiency
Traceback (most recent call last):
  File "main.py", line 60, in <module>
    crawl.run()
  File "/home/garrett/_Garrett/upcodes_general/external_repos/python-sitemap-2/crawler.py", line 132, in run
    event_loop.run_until_complete(self.crawl_all_pending_urls(executor))
  File "/usr/lib/python3.7/asyncio/base_events.py", line 579, in run_until_complete
    return future.result()
  File "/home/garrett/_Garrett/upcodes_general/external_repos/python-sitemap-2/crawler.py", line 146, in crawl_all_pending_urls
    for url in self.urls_to_crawl:
RuntimeError: Set changed size during iteration
ERROR:concurrent.futures:exception calling callback for <Future at 0x7fd202b34610 state=finished returned NoneType>
Traceback (most recent call last):
  File "/usr/lib/python3.7/concurrent/futures/_base.py", line 324, in _invoke_callbacks
    callback(self)
  File "/usr/lib/python3.7/asyncio/futures.py", line 362, in _call_set_state
    dest_loop.call_soon_threadsafe(_set_state, destination, source)
  File "/usr/lib/python3.7/asyncio/base_events.py", line 728, in call_soon_threadsafe
    self._check_closed()
  File "/usr/lib/python3.7/asyncio/base_events.py", line 475, in _check_closed
    raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed
This makes me think it might be solved by merging #63.
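For anyone landing here from the traceback, here is a minimal sketch of how "Set changed size during iteration" comes about and one common way around it. This is not the crawler's code and not necessarily what #63 does; the function names and the "/child" URLs are made up for illustration, and the in-loop add just stands in for a worker discovering a new link while the pending set is being iterated.

```python
def iterate_live(pending):
    """Iterate the set directly while it grows; returns the error message, if any."""
    try:
        for url in pending:
            # Stand-in for a worker adding a newly discovered link (hypothetical).
            pending.add(url + "/child")
    except RuntimeError as exc:
        return str(exc)
    return None


def iterate_snapshot(pending):
    """Iterate over a copy so additions to the live set cannot break the loop."""
    visited = []
    for url in list(pending):  # snapshot taken before the loop starts
        pending.add(url + "/child")
        visited.append(url)
    return visited


if __name__ == "__main__":
    print(iterate_live({"https://example.com"}))
    print(iterate_snapshot({"https://example.com"}))
```

With the snapshot approach, anything added during the loop stays in the live set and would be picked up on a later pass rather than in the current one.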
Interesting. I can't reproduce it on Linux. Do you experience it on every website or just on up.codes?
Yeah, I'm actually not surprised, since I only experienced it once in ~10 runs (on up.codes, running with --num-workers 8; I didn't test other sites). I noted in #63 that the race condition was unlikely to occur, which fits with how rarely I saw it. My theory is that, for some reason, there's a higher chance of hitting it on Mac.
@dpatz do you mind rerunning with the latest master now that the race condition is fixed, to see whether this was the same bug or a different one? Perhaps try with -n 2 and also -n 8 to be sure.
@Garrett-R Just tested both and it's working!
Nice to hear.
When I run with any number of workers greater than 1, I get the following error after crawling around 40 URLs.
I'm on a Mac with Catalina. Seems to run fine on Linux.
Here's the command I'm using to repro:
python main.py --domain="https://up.codes" --output="sitemap.xml" -v -n 2