dabeaz / curio

Good Curio!
Other
4.01k stars 240 forks source link

IndexError: pop from an empty deque from _kernel_wake #365

Open mwerezak opened 8 months ago

mwerezak commented 8 months ago

Got this exception from inside curio/sched.py

Not sure yet as to the cause.

Traceback (most recent call last):
  File "...", line 72, in <module>
    curio.run(main)
  File "...\venv\Lib\site-packages\curio\kernel.py", line 824, in run
    return kernel.run(corofunc, *args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "...\venv\Lib\site-packages\curio\kernel.py", line 146, in run
    task = self._runner(coro)
           ^^^^^^^^^^^^^^^^^^
  File "...\venv\Lib\site-packages\curio\kernel.py", line 768, in kernel_run
    traps[trap[0]](*trap[1:])
  File "...\venv\Lib\site-packages\curio\kernel.py", line 519, in trap_sched_wake
    tasks = sched._kernel_wake(n)
            ^^^^^^^^^^^^^^^^^^^^^
  File "...\venv\Lib\site-packages\curio\sched.py", line 88, in _kernel_wake
    task, = self._queue.popleft()
            ^^^^^^^^^^^^^^^^^^^^^
IndexError: pop from an empty deque

ntasks is 1, but self._queue is empty.

__exception__ = {tuple: 3} (<class 'IndexError'>, pop from an empty deque, <traceback object at 0x00000261BE8AA500>)
ntasks = {int} 1
self = {SchedFIFO: 0} SchedFIFO<0 tasks waiting>
  _abc_impl = {_abc_data} <_abc._abc_data object at 0x000002619CFB9E80>
  _actual_len = {int} 0
  _queue = {deque: 0} deque([])
task = {NoneType} None
tasks = {list: 0} []
 __len__ = {int} 0
mwerezak commented 8 months ago

I have no idea where to even start looking for the cause of this. Any insight would be appreciated.

mwerezak commented 8 months ago

It seems related to creating a lot of timeouts using curio.timeout_after()

This appears to be the code in my application that triggers it:

        async with self._condition:
            while not self._bus_available():
                try:
                    async with curio.timeout_after(self.wait_interval):
                        await self._condition.wait()
                except curio.TaskTimeout:
                    _log.debug(f"{client}: wait.")
                    await client.send_message(WAIT_MSG)

I'm trying to wait for a condition but I also want to send regular messages back to the client so they know the connection isn't timed out

dabeaz commented 8 months ago

What version of curio? What version of Python? What operating system? Is the bug predictably reproducible or is it random?

mwerezak commented 6 months ago

Hi, sorry for forgetting to include that information.

curio version 1.6 Python version 3.11.5

From memory the bug was pretty predictable but depended on workload... that snippet above was being hit ~10-100/second, with the timeout value being about 5 seconds IIRC... I guess that means several hundred timeouts active at once. Under lesser loads the bug did not manifest, but I don't recall the exact number.

We implemented a workaround by frequently polling (and checking the timeout manually using time()) instead of blocking on a condition variable. In retrospect maybe a MPSC setup using a Queue might have been a better solution than either of these.