aiokitchen / hasql

hasql is a library for highly available PostgreSQL clusters.
Apache License 2.0
PoolManager init blocked when a node is down #7

Closed: elrik75 closed this issue 1 year ago

elrik75 commented 1 year ago

When a node is down and you start your app with a PoolManager, self._check_pool_task() is called in self.__init__() for each dsn, followed by self._wait_creating_pool(). If a node is down, _wait_creating_pool() enters an infinite loop. That is expected, but the loop should pause between retries so that it does not consume 100% CPU and so that other coroutines can be scheduled.

Here is a proposal:

    async def _wait_creating_pool(self, dsn: Dsn):
        while not self._closing:
            try:
                return await asyncio.wait_for(
                    self._pool_factory(dsn),
                    timeout=self._refresh_timeout,
                )
            except Exception:
                logger.warning(
                    "Creating pool failed with exception for dsn=%s",
                    dsn.with_(password="******"),
                    exc_info=True,
                )
                # Proposed fix: pause before retrying so the loop does not
                # spin at 100% CPU and other coroutines can run.
                await asyncio.sleep(1)

Tested on: Linux, PostgreSQL 15, asyncpg 0.27.0, hasql 0.5.10
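
For illustration, the same idea can be written as a standalone retry helper with a capped exponential backoff instead of a fixed one-second sleep. This is only a sketch, not hasql code: `create_with_retry`, `flaky_factory`, and the `base_delay`, `max_delay`, and `should_stop` parameters are hypothetical names introduced here, and the backoff policy is an assumption rather than something proposed in this issue.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


async def create_with_retry(factory, *, timeout=1.0, base_delay=0.1,
                            max_delay=5.0, should_stop=lambda: False):
    """Call factory() until it succeeds, sleeping between failed attempts.

    Hypothetical helper: `factory` is any zero-argument coroutine function
    that returns a pool-like object or raises on failure.
    """
    delay = base_delay
    while not should_stop():
        try:
            return await asyncio.wait_for(factory(), timeout=timeout)
        except Exception:
            logger.warning(
                "Creating pool failed, retrying in %.2fs", delay, exc_info=True
            )
            # Sleeping suspends this coroutine, so the event loop can run
            # other tasks and the retry loop stops spinning at full speed.
            await asyncio.sleep(delay)
            # Double the pause after each failure, up to max_delay.
            delay = min(delay * 2, max_delay)
    return None


async def main():
    attempts = 0

    async def flaky_factory():
        # Stand-in for a real pool factory: fails twice, then succeeds.
        nonlocal attempts
        attempts += 1
        if attempts < 3:
            raise ConnectionRefusedError("node is down")
        return "pool"

    pool = await create_with_retry(flaky_factory, base_delay=0.01)
    return pool, attempts


if __name__ == "__main__":
    print(asyncio.run(main()))
```

A capped backoff keeps the first retries quick when a node comes back fast, while bounding how often a long-dead node is polled. A fixed `asyncio.sleep(1)` as in the proposal above is simpler and addresses the same symptom.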

dizballanze commented 1 year ago

Hey! Are you sure that this is the reason for 100% CPU usage? I mean, there's an await inside the loop, so it shouldn't be blocked.

elrik75 commented 1 year ago

> Hey! Are you sure that this is the reason for 100% CPU usage? I mean, there's an await inside the loop, so it shouldn't be blocked.

I'm not sure about the other coroutines, although mine are never executed. Maybe they have to compete to be picked by the scheduler. Either way, this loop runs as fast as possible, and I'm pretty sure it's the cause of the 100% CPU. Edit: with the sleep(1), my other coroutines start and CPU usage is fine.
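
On the question of whether an await inside the loop is enough: in asyncio, awaiting a coroutine only hands control back to the event loop if something inside it actually suspends. The sketch below shows this with plain coroutines. All names (`fails_immediately`, `retry`, `run_demo`, `ticker`) are hypothetical, and whether `asyncio.wait_for` around the real pool factory suspends on a refused connection is not established in this thread.

```python
import asyncio


async def fails_immediately():
    # Raises without ever suspending. Assumption for this demo: this mimics
    # a connection attempt that is rejected before any I/O wait happens.
    raise ConnectionRefusedError("node is down")


async def retry(attempts, pause):
    # Bounded retry loop so the demo terminates.
    for _ in range(attempts):
        try:
            await fails_immediately()
        except ConnectionRefusedError:
            if pause is not None:
                # Even a zero-length sleep suspends and lets other tasks run.
                await asyncio.sleep(pause)


async def run_demo(pause):
    ticks = 0

    async def ticker():
        # Background task that counts how often it gets scheduled.
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0)

    task = asyncio.create_task(ticker())
    await retry(1000, pause)
    snapshot = ticks  # how many turns the ticker got during the retries
    task.cancel()
    await asyncio.gather(task, return_exceptions=True)
    return snapshot


if __name__ == "__main__":
    print("ticks without a pause:", asyncio.run(run_demo(None)))
    print("ticks with sleep(0):", asyncio.run(run_demo(0)))
```

Without a pause the count is 0: the retry loop never suspends, so the ticker task is never started while it runs. With `asyncio.sleep(0)` the ticker gets a turn on every iteration. That fixes scheduling but not CPU usage, which is why a non-zero sleep such as the proposed `sleep(1)` is the more complete remedy.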