Open mwerezak opened 8 months ago
I have no idea where to even start looking for the cause of this. Any insight would be appreciated.
It seems related to creating a lot of timeouts using curio.timeout_after().
This appears to be the code in my application that triggers it:
async with self._condition:
    while not self._bus_available():
        try:
            async with curio.timeout_after(self.wait_interval):
                await self._condition.wait()
        except curio.TaskTimeout:
            _log.debug(f"{client}: wait.")
            await client.send_message(WAIT_MSG)
I'm trying to wait for a condition, but I also want to send regular messages back to the client so they know the connection hasn't timed out.
What version of curio? What version of Python? What operating system? Is the bug predictably reproducible or is it random?
Hi, sorry for forgetting to include that information.
curio version 1.6, Python version 3.11.5
From memory the bug was pretty predictable but depended on workload... that snippet above was being hit ~10-100/second, with the timeout value being about 5 seconds IIRC... I guess that means several hundred timeouts active at once. Under lesser loads the bug did not manifest, but I don't recall the exact number.
We implemented a workaround by frequently polling (and checking the timeout manually using time()) instead of blocking on a condition variable. In retrospect, maybe an MPSC setup using a Queue might have been a better solution than either of these.
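For anyone hitting the same thing, here is a rough sketch of what a polling workaround like this could look like. It is not the code from the application: wait_with_keepalive, is_ready, send_wait, and poll_interval are hypothetical names used only for illustration, and the sleep function is passed in so the sketch does not depend on any particular event loop.

```python
import time


async def wait_with_keepalive(is_ready, send_wait, sleep, *,
                              poll_interval=0.05, wait_interval=5.0,
                              clock=time.monotonic):
    """Poll is_ready() until it returns True.

    Each time wait_interval seconds pass without success, await send_wait()
    so the client knows the connection is still alive. The deadline is
    tracked by hand with clock(), so no scheduler timeouts are created.
    Returns the number of keepalive messages sent.
    """
    sent = 0
    deadline = clock() + wait_interval
    while not is_ready():
        if clock() >= deadline:
            await send_wait()                      # hypothetical keepalive callback
            sent += 1
            deadline = clock() + wait_interval     # restart the interval
        await sleep(poll_interval)                 # yield to other tasks
    return sent
```

Under curio, the sleep argument would presumably be curio.sleep, is_ready would wrap something like self._bus_available, and send_wait would wrap the client.send_message(WAIT_MSG) call. The trade-off is wake-up latency of up to poll_interval, in exchange for never registering a timeout with the kernel.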
Got this exception from inside curio/sched.py
Not sure yet as to the cause.
ntasks is 1, but self._queue is empty.