c4software / python-sitemap

Mini website crawler to make sitemap from a website.
GNU General Public License v3.0
366 stars, 110 forks

RuntimeError: Event loop is closed - with > 1 workers #66

Closed: dpatz closed this issue 4 years ago.

dpatz commented 4 years ago

When I run with more than 1 worker, I get the following error after crawling around 40 URLs.

INFO:root:Crawling #56: https://up.codes/s/natural-ventilation
ERROR:concurrent.futures:exception calling callback for <Future at 0x10ddc1190 state=finished returned NoneType>
Traceback (most recent call last):
  File "/Users/danpatz/.pyenv/versions/3.7.4/lib/python3.7/concurrent/futures/_base.py", line 324, in _invoke_callbacks
    callback(self)
  File "/Users/danpatz/.pyenv/versions/3.7.4/lib/python3.7/asyncio/futures.py", line 362, in _call_set_state
    dest_loop.call_soon_threadsafe(_set_state, destination, source)
  File "/Users/danpatz/.pyenv/versions/3.7.4/lib/python3.7/asyncio/base_events.py", line 728, in call_soon_threadsafe
    self._check_closed()
  File "/Users/danpatz/.pyenv/versions/3.7.4/lib/python3.7/asyncio/base_events.py", line 475, in _check_closed
    raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed

I'm on macOS Catalina. It seems to run fine on Linux.

Here's the command I'm using to reproduce it:

python main.py --domain="https://up.codes" --output="sitemap.xml" -v -n 2
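The traceback shows a thread-pool future finishing and asyncio then trying to pass its result back through call_soon_threadsafe on an event loop that has already been closed; in other words, a worker thread outlived the loop. Futures chained this way are what loop.run_in_executor creates. Below is a minimal sketch of a pattern that avoids it, assuming blocking fetches run through run_in_executor on a ThreadPoolExecutor. The fetch, crawl and run names are hypothetical, and the project's own run() uses run_until_complete rather than asyncio.run. The idea is to await every executor future before the coroutine returns, so the loop is only closed after all workers have reported back.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def fetch(url):
    # Hypothetical stand-in for the blocking HTTP request a worker makes.
    return f"crawled {url}"


async def crawl(urls, executor):
    loop = asyncio.get_running_loop()
    # Schedule every blocking fetch on the thread pool.
    futures = [loop.run_in_executor(executor, fetch, url) for url in urls]
    # Wait for all of them while the loop is still running, so no worker
    # thread tries to deliver a result to an already-closed loop.
    return await asyncio.gather(*futures)


def run(urls, num_workers=2):
    # asyncio.run() closes its loop only after crawl() has returned, i.e.
    # after every executor future is done; leaving the `with` block then
    # shuts the pool down and joins its threads.
    with ThreadPoolExecutor(max_workers=num_workers) as executor:
        return asyncio.run(crawl(urls, executor))


results = run(["https://example.com/a", "https://example.com/b"], num_workers=2)
print(results)  # ['crawled https://example.com/a', 'crawled https://example.com/b']
```

asyncio.gather returns results in the order the futures were passed, so results lines up with the input URLs regardless of which worker finished first.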

c4software commented 4 years ago

Interesting. I will test it tomorrow on my Mac.

Garrett-R commented 4 years ago

I'm getting this on Linux too. I noticed it starts with an error about the set changing size during iteration:

INFO:root:Crawling #44: https://up.codes/s/bond-beams
INFO:root:Crawling #45: https://up.codes/s/transformer-efficiency
Traceback (most recent call last):
  File "main.py", line 60, in <module>
    crawl.run()
  File "/home/garrett/_Garrett/upcodes_general/external_repos/python-sitemap-2/crawler.py", line 132, in run
    event_loop.run_until_complete(self.crawl_all_pending_urls(executor))
  File "/usr/lib/python3.7/asyncio/base_events.py", line 579, in run_until_complete
    return future.result()
  File "/home/garrett/_Garrett/upcodes_general/external_repos/python-sitemap-2/crawler.py", line 146, in crawl_all_pending_urls
    for url in self.urls_to_crawl:
RuntimeError: Set changed size during iteration
ERROR:concurrent.futures:exception calling callback for <Future at 0x7fd202b34610 state=finished returned NoneType>
Traceback (most recent call last):
  File "/usr/lib/python3.7/concurrent/futures/_base.py", line 324, in _invoke_callbacks
    callback(self)
  File "/usr/lib/python3.7/asyncio/futures.py", line 362, in _call_set_state
    dest_loop.call_soon_threadsafe(_set_state, destination, source)
  File "/usr/lib/python3.7/asyncio/base_events.py", line 728, in call_soon_threadsafe
    self._check_closed()
  File "/usr/lib/python3.7/asyncio/base_events.py", line 475, in _check_closed
    raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed

This makes me think it might be solved by merging #63.
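The first error, RuntimeError: Set changed size during iteration, is what Python raises when a set gains or loses elements while a for loop is iterating over it; the traceback shows crawl_all_pending_urls looping over self.urls_to_crawl. The single-threaded sketch below reproduces the error and shows one common fix, iterating over a snapshot and processing newly found URLs in the next round. The function names and the links mapping are hypothetical, and this is not necessarily how #63 fixes the crawler.

```python
def show_mutation_error():
    """Reproduce the error: add to a set while iterating over it."""
    pending = {"/a"}
    try:
        for url in pending:
            pending.add(url + "/child")  # mutating the set mid-iteration
    except RuntimeError as exc:
        return str(exc)
    return None


def crawl_in_rounds(start, links):
    """Visit every URL reachable from `start` using a snapshot per round.

    `links` is a hypothetical mapping of URL -> URLs found on that page.
    """
    pending = {start}
    seen = set()
    while pending:
        # Iterate over a copy; URLs discovered during this round are added
        # to `pending` safely and handled in the next round.
        for url in list(pending):
            pending.discard(url)
            seen.add(url)
            pending.update(link for link in links.get(url, ()) if link not in seen)
    return seen


print(show_mutation_error())  # Set changed size during iteration

links = {"/": ["/a", "/b"], "/a": ["/", "/a/x"], "/b": ["/a"]}
print(sorted(crawl_in_rounds("/", links)))  # ['/', '/a', '/a/x', '/b']
```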

c4software commented 4 years ago

Interesting. I can't reproduce it on Linux. Do you see it on every website or just on up.codes?

Garrett-R commented 4 years ago

Yeah, I'm actually not surprised, since I only hit it once in ~10 runs (on up.codes with --num-workers 8; I didn't test other sites). I noted in #63 that the race condition was unlikely to occur, which fits how rarely I've seen it. My theory is that, for some reason, it's more likely to be hit on a Mac.
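In a multithreaded crawler the race is between worker threads adding newly discovered URLs and the scheduler iterating over the same set, which is why it only shows up occasionally. One common way to close such a race is to guard every access with a threading.Lock and have the scheduler atomically take the whole batch, so it never iterates over a set a worker is changing. This is a sketch under the assumption that workers run in a ThreadPoolExecutor; PendingUrls, add_many and take_all are hypothetical names, not the project's API.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class PendingUrls:
    """Hypothetical thread-safe wrapper around a crawler's pending-URL set."""

    def __init__(self, urls=()):
        self._urls = set(urls)
        self._lock = threading.Lock()

    def add_many(self, urls):
        # Called from worker threads when they discover new links.
        with self._lock:
            self._urls.update(urls)

    def take_all(self):
        # Swap in an empty set under the lock and return the old one; the
        # caller can then iterate over the batch with no other thread touching it.
        with self._lock:
            batch, self._urls = self._urls, set()
        return batch


pending = PendingUrls()
with ThreadPoolExecutor(max_workers=8) as pool:
    # 100 concurrent tasks, each adding 10 unique URLs.
    for i in range(100):
        pool.submit(pending.add_many, {f"/page/{i}/{j}" for j in range(10)})

batch = pending.take_all()
print(len(batch))  # 1000
```

Swapping the set out rather than copying it keeps the time spent holding the lock constant, regardless of how many URLs are queued.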

@dpatz would you mind rerunning with the latest master, now that the race condition is fixed, to see whether this was the same bug or a different one? Perhaps try with -n 2 and also -n 8 to be sure.

dpatz commented 4 years ago

@Garrett-R Just tested both and it's working!

c4software commented 4 years ago

Nice to hear.