faust-streaming / faust

Python Stream Processing. A Faust fork
https://faust-streaming.github.io/faust/

RocksDB recovery crashes with KeyError #73

Closed vytas-screen9 closed 2 years ago

vytas-screen9 commented 3 years ago

Checklist

Steps to reproduce

We created a simple test for Faust streaming and ran two or more workers in parallel, each using its own RocksDB store.

Under high traffic, stopping one worker sometimes crashes one of the others.

Expected behavior

Workers can be stopped and restarted, and even in the unfortunate event that one crashes, it should not affect the other workers running in parallel.

Actual behavior

Other workers may crash.

Full traceback

[^---Recovery]: Crashed reason=KeyError(TP(topic='<our-topic>', partition=1)) 
Traceback (most recent call last):
  File "<our venv>/lib/python3.8/site-packages/mode/services.py", line 802, in _execute_task
    await task
  File "<our venv>/lib/python3.8/site-packages/faust/tables/recovery.py", line 375, in _restart_recovery
    await self._wait(
  File "<our venv>/lib/python3.8/site-packages/faust/tables/recovery.py", line 562, in _wait
    wait_result = await self.wait_first(coro, signal, timeout=timeout)
  File "<our venv>/lib/python3.8/site-packages/mode/services.py", line 715, in wait_first
    f.result()  # propagate exceptions
  File "<our venv>/lib/python3.8/site-packages/faust/tables/recovery.py", line 666, in _build_offsets
    new_value = earliest[tp]
KeyError: TP(topic='<our topic>', partition=1)
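
For readers unfamiliar with this failure mode, the last frame of the traceback indexes a mapping of earliest offsets with a topic-partition key that the mapping does not contain. The sketch below reproduces that shape in plain Python. It is only an illustration and not Faust's code: `TP` here is a stand-in named tuple, and `strict_lookup` and `tolerant_lookup` are hypothetical helpers. The thread does not say why the key is missing, so the missing entry is simply assumed.

```python
from typing import Dict, List, NamedTuple, Optional


class TP(NamedTuple):
    """Stand-in for the topic-partition key seen in the traceback."""
    topic: str
    partition: int


def strict_lookup(tps: List[TP], earliest: Dict[TP, int]) -> Dict[TP, int]:
    """Index the mapping directly, like `earliest[tp]` in the traceback.

    Raises KeyError as soon as a partition has no entry in `earliest`.
    """
    return {tp: earliest[tp] for tp in tps}


def tolerant_lookup(
    tps: List[TP], earliest: Dict[TP, int]
) -> Dict[TP, Optional[int]]:
    """Hypothetical defensive variant: a missing partition maps to None."""
    return {tp: earliest.get(tp) for tp in tps}


# Assumed scenario: partition 1 is requested but has no earliest offset.
requested = [TP("example-topic", 0), TP("example-topic", 1)]
earliest_offsets = {TP("example-topic", 0): 41}

try:
    strict_lookup(requested, earliest_offsets)
except KeyError as exc:
    missing = exc.args[0]  # the TP that was not in the mapping

offsets = tolerant_lookup(requested, earliest_offsets)
```

Whether skipping or defaulting a missing partition is the right behavior during recovery is a separate question; the sketch only shows why a direct index raises while a `.get()` lookup does not.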

Versions

patkivikram commented 3 years ago

Can you verify if this is fixed in 0.4.7?