ArchiveTeam / grab-site

The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns

Any solutions for already mentioned errors: Event loop is closed / Task is destroyed? #165

Open weselow opened 4 years ago

weselow commented 4 years ago

Hi, I am trying out this repo. Has a solution been found, or has this been fixed? I checked the logs and see that these errors arise rather often.

Traceback (most recent call last):
  File "/home/viking01/gs-venv/lib/python3.7/site-packages/libgrabsite/dashboard_client.py", line 54, in sender
    await asyncio.sleep(delay)
  File "/home/viking01/.pyenv/versions/3.7.5/lib/python3.7/asyncio/tasks.py", line 593, in sleep
    future, result)
  File "/home/viking01/.pyenv/versions/3.7.5/lib/python3.7/asyncio/base_events.py", line 652, in call_later
    context=context)
  File "/home/viking01/.pyenv/versions/3.7.5/lib/python3.7/asyncio/base_events.py", line 662, in call_at
    self._check_closed()
  File "/home/viking01/.pyenv/versions/3.7.5/lib/python3.7/asyncio/base_events.py", line 475, in _check_closed
    raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed

The second one:

Task was destroyed but it is pending!
task: <Task pending coro=<WebSocketCommonProtocol.transfer_data() running at /home/viking01/gs-venv/lib/python3.7/site-packages/websockets/protocol.py:827> wait_for=<Future pending cb=[<TaskWakeupMethWrapper object at 0x7efa8267c990>()]> cb=[<TaskWakeupMethWrapper object at 0x7efa82eca910>()]>

Thanks!

P.S. Hmm, what is the best way to re-run such pending tasks? Full traces attached: aws-law.ru_errors.log, blueheronbio.com_errors.log

ivan commented 4 years ago

grab-site often emits those errors you listed when a crawl is finished, but they should not affect the crawl and can be ignored. Sorry about the confusing output.
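For context, here is a minimal sketch of how both messages can arise in plain asyncio at shutdown. It is not grab-site's or wpull's code: `sender` below is a hypothetical stand-in for a background coroutine, and the function names are made up for illustration. Scheduling a timer on a loop that has already been closed raises the first error; a task that is still pending when it is garbage-collected produces the second message, which is why cancelling and awaiting background tasks before closing the loop is the usual way to avoid it.

```python
import asyncio


async def sender(delay):
    # Hypothetical stand-in for a long-running background coroutine.
    while True:
        await asyncio.sleep(delay)


async def cancel_and_wait(task):
    # Request cancellation, then wait until the task has actually finished.
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass


def schedule_on_closed_loop():
    # Scheduling a callback on a closed loop fails the same way as in the traceback.
    loop = asyncio.new_event_loop()
    loop.close()
    try:
        loop.call_later(1, print, "never runs")
    except RuntimeError as exc:
        return str(exc)


def run_and_shut_down_cleanly():
    loop = asyncio.new_event_loop()
    task = loop.create_task(sender(0.01))
    # Let the background task run for a moment.
    loop.run_until_complete(asyncio.sleep(0.05))
    # Cancel and await the task before closing, so nothing is left pending.
    loop.run_until_complete(cancel_and_wait(task))
    loop.close()
    return task.cancelled(), loop.is_closed()


if __name__ == "__main__":
    print(schedule_on_closed_loop())
    print(run_and_shut_down_cleanly())
```

If the loop were closed without the `cancel_and_wait` step, the pending task would typically be reported with "Task was destroyed but it is pending!" once it is garbage-collected, which matches the harmless end-of-crawl noise described above.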

weselow commented 4 years ago

Thanks for the quick answer! :) I also suppose this causes another issue: by default, grab-site runs the wpull process with an argument to delete the temp dir when it finishes. If the temp dir is set to a non-default location with the option grab-site ... --dir=$TempDir, this temp dir is not deleted.