HENNGE / arsenic

Async WebDriver implementation for asyncio and asyncio-compatible frameworks

Graceful shutdown of interrupted loop #121

Open · controversial opened this issue 3 years ago

controversial commented 3 years ago

What's the proper way to clean up and close a browser session after the event loop is interrupted?

The following program loads example.com using Arsenic every two seconds forever. However, when the user interrupts the program with a KeyboardInterrupt, the browser session can't successfully be closed.

import arsenic
from arsenic.browsers import Chrome
from arsenic.services import Chromedriver
import os

import asyncio

async def main():
    service = Chromedriver(log_file=os.devnull)
    browser = Chrome()
    driver = await arsenic.start_session(service, browser)

    try:
        while True:
            await driver.get("https://example.com")
            await asyncio.sleep(2)
    except asyncio.CancelledError:
        await arsenic.stop_session(driver)

try:
    asyncio.run(main())
except KeyboardInterrupt:
    print("exited gracefully")

Stopping this program yields the following error:

unhandled exception during asyncio.run() shutdown
task: <Task finished name='Task-1' coro=<main() done, defined at /Users/luke/Developer/betting/src/mcve.py:8> exception=ClientOSError(54, 'Connection reset by peer')>
Traceback (most recent call last):
  File "/opt/homebrew/Cellar/python@3.9/3.9.1_7/Frameworks/Python.framework/Versions/3.9/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/homebrew/Cellar/python@3.9/3.9.1_7/Frameworks/Python.framework/Versions/3.9/lib/python3.9/asyncio/base_events.py", line 629, in run_until_complete
    self.run_forever()
  File "/opt/homebrew/Cellar/python@3.9/3.9.1_7/Frameworks/Python.framework/Versions/3.9/lib/python3.9/asyncio/base_events.py", line 596, in run_forever
    self._run_once()
  File "/opt/homebrew/Cellar/python@3.9/3.9.1_7/Frameworks/Python.framework/Versions/3.9/lib/python3.9/asyncio/base_events.py", line 1854, in _run_once
    event_list = self._selector.select(timeout)
  File "/opt/homebrew/Cellar/python@3.9/3.9.1_7/Frameworks/Python.framework/Versions/3.9/lib/python3.9/selectors.py", line 562, in select
    kev_list = self._selector.control(None, max_ev, timeout)
KeyboardInterrupt

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/luke/Developer/betting/src/mcve.py", line 16, in main
    await asyncio.sleep(2)
  File "/opt/homebrew/Cellar/python@3.9/3.9.1_7/Frameworks/Python.framework/Versions/3.9/lib/python3.9/asyncio/tasks.py", line 651, in sleep
    return await future
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/luke/Developer/betting/src/mcve.py", line 18, in main
    await arsenic.stop_session(driver)
  File "/Users/luke/Developer/betting/env/lib/python3.9/site-packages/arsenic/connection.py", line 95, in request
    async with self.session.request(
  File "/Users/luke/Developer/betting/env/lib/python3.9/site-packages/aiohttp/client.py", line 1117, in __aenter__
    self._resp = await self._coro
  File "/Users/luke/Developer/betting/env/lib/python3.9/site-packages/aiohttp/client.py", line 544, in _request
    await resp.start(conn)
  File "/Users/luke/Developer/betting/env/lib/python3.9/site-packages/aiohttp/client_reqrep.py", line 890, in start
    message, payload = await self._protocol.read()  # type: ignore
  File "/Users/luke/Developer/betting/env/lib/python3.9/site-packages/aiohttp/streams.py", line 604, in read
    await self._waiter
aiohttp.client_exceptions.ClientOSError: [Errno 54] Connection reset by peer

It then prints a warning:

Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x103c1f2b0>

I can't use the get_session context manager because of the API design of my program. How can I make sure I clean up and close the browser session when the event loop is cancelled?

dimaqq commented 3 years ago

Interesting: stop_session tries to close the session cleanly via the WebDriver API, and that request is what fails.

dimaqq commented 3 years ago

It seems that ^C is kinda special: KeyboardInterrupt is somehow "applied" both to await asyncio.sleep(...) and the outer asyncio.run().

My guess is that asyncio.run() receives the KeyboardInterrupt and then cancels its remaining tasks, including the "main" task. At that point main() is suspended in sleep(), which is really just an anonymous future with a delayed callback. That explains why await asyncio.sleep() appears to raise CancelledError rather than KeyboardInterrupt.

This, I think, is better discussed at async-sig@python.org.
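The guess above can be checked without a browser. The sketch below is a simulation, not a real signal: it assumes that raising KeyboardInterrupt from a loop.call_later() callback is a fair stand-in for pressing ^C while the loop is blocked in select(). The names simulate_ctrl_c and run_demo are hypothetical. The exception escapes run_until_complete(), asyncio.run() then cancels the still-pending main() task during shutdown, and main() sees CancelledError inside sleep() before the KeyboardInterrupt reaches the caller.

```python
import asyncio


def simulate_ctrl_c() -> None:
    # Hypothetical stand-in for ^C: a KeyboardInterrupt raised from a loop
    # callback is re-raised by the loop and escapes run_until_complete().
    raise KeyboardInterrupt


async def main(log: list) -> None:
    asyncio.get_running_loop().call_later(0.05, simulate_ctrl_c)
    try:
        while True:
            await asyncio.sleep(2)
    except asyncio.CancelledError:
        # Runs while asyncio.run() cancels leftover tasks during shutdown.
        log.append("main: CancelledError in sleep()")
        raise


def run_demo() -> list:
    log: list = []
    try:
        asyncio.run(main(log))
    except KeyboardInterrupt:
        # Only reached after main() has already handled its cancellation.
        log.append("asyncio.run: KeyboardInterrupt")
    return log


if __name__ == "__main__":
    print(run_demo())
```

On Python 3.11 and later, asyncio.run() installs its own SIGINT handler that cancels the main task directly when ^C is pressed, so the real path differs in detail, but main() should still end up seeing CancelledError.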

Here's what it takes to get this to work:

1️⃣ patch arsenic/__init__.py like so:

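async def stop_session(session: Session):
    try:
        # Politely end the WebDriver session; this request can fail if the
        # driver is already gone (e.g. "Connection reset by peer").
        await session.close()
    except BaseException:
        pass
    # Always stop the driver process, even if the polite close failed.
    await session.driver.close()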
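The patch above catches every BaseException around session.close(), so it also swallows CancelledError or anything else raised inside that call. A slightly narrower variant is sketched below. FakeSession, FakeDriver and stop_session_best_effort are hypothetical stand-ins, not arsenic's real classes; only the session.close() and session.driver.close() calls mirror the patch. It ignores ordinary exceptions from the WebDriver request, such as the ClientOSError in the traceback (an OSError subclass), and uses finally so the driver process is always stopped.

```python
import asyncio


class FakeDriver:
    """Hypothetical stand-in for the object behind session.driver."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


class FakeSession:
    """Hypothetical stand-in for an arsenic session (illustration only)."""

    def __init__(self, fail_close: bool = False) -> None:
        self.fail_close = fail_close
        self.driver = FakeDriver()

    async def close(self) -> None:
        # Simulates the WebDriver request failing because chromedriver is gone.
        if self.fail_close:
            raise ConnectionResetError(54, "Connection reset by peer")


async def stop_session_best_effort(session) -> bool:
    """Return True if the WebDriver session itself closed cleanly."""
    try:
        await session.close()
        return True
    except Exception:
        # Ordinary errors only; CancelledError/KeyboardInterrupt still propagate.
        return False
    finally:
        # Runs in every case, so the driver process is always stopped.
        await session.driver.close()
```

Catching Exception instead of BaseException keeps the cleanup best-effort while letting a genuine cancellation or interrupt still propagate to the caller.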

2️⃣ update the MCVE like this:

import arsenic
from arsenic.browsers import Chrome
from arsenic.services import Chromedriver
import os

import asyncio

async def main():
    service = Chromedriver(log_file=os.devnull)
    browser = Chrome()
    driver = await arsenic.start_session(service, browser)

    try:
        while True:
            await driver.get("https://example.com")
            await asyncio.sleep(2)
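    except BaseException:
        # CancelledError derives from BaseException (not Exception) since
        # Python 3.8, so catch everything and try to stop the session.
        try:
            await arsenic.stop_session(driver)
        except BaseException:
            # The WebDriver request can still fail (e.g. connection reset).
            pass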

try:
    asyncio.run(main())
except BaseException:
    print("exited gracefully")

dimaqq commented 3 years ago

Earlier discussion (2017): https://mail.python.org/pipermail/async-sig/2017-August/000374.html https://vorpus.org/blog/control-c-handling-in-python-and-trio/

dimaqq commented 3 years ago

I guess the canonical advice would be along the lines of https://docs.python.org/3/library/asyncio-eventloop.html#set-signal-handlers-for-sigint-and-sigterm

Some patches are still needed though 🙈
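Following the signal-handler advice linked above, a minimal sketch (an assumption about how it could be applied here, not a confirmed fix) turns SIGINT/SIGTERM into an asyncio.Event via loop.add_signal_handler(), so shutdown happens inside the running loop instead of through a KeyboardInterrupt that tears the loop down from outside. FakeSession and poll_until_stopped are hypothetical; in a real program get() would be driver.get(...) and close() would be arsenic.stop_session(driver). loop.add_signal_handler() is only available on Unix event loops.

```python
import asyncio
import signal


class FakeSession:
    """Hypothetical stand-in for an arsenic session (illustration only)."""

    def __init__(self) -> None:
        self.pages = 0
        self.closed = False

    async def get(self, url: str) -> None:
        self.pages += 1  # a real session would navigate to `url` here

    async def close(self) -> None:
        self.closed = True  # a real program would call arsenic.stop_session()


async def poll_until_stopped(session, stop: asyncio.Event, interval: float = 2.0) -> None:
    """Load a page every `interval` seconds until `stop` is set, then clean up."""
    try:
        while not stop.is_set():
            await session.get("https://example.com")
            try:
                # Sleep, but wake up immediately if a stop is requested.
                await asyncio.wait_for(stop.wait(), timeout=interval)
            except asyncio.TimeoutError:
                pass
    finally:
        await session.close()


async def main() -> None:
    loop = asyncio.get_running_loop()
    stop = asyncio.Event()
    # Unix only: SIGINT/SIGTERM now just set the event instead of raising.
    for sig in (signal.SIGINT, signal.SIGTERM):
        loop.add_signal_handler(sig, stop.set)
    await poll_until_stopped(FakeSession(), stop)


if __name__ == "__main__":
    asyncio.run(main())
```

One caveat: a terminal ^C is delivered to the whole foreground process group, so a chromedriver child started from the same terminal may receive SIGINT as well and exit before the session is closed. That may be part of why patches are still needed.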