elja opened this issue 2 years ago
This can also be reproduced with the example examples/tableofset.py, just by adding store='rocksdb://'.
The code that raises the assert is similar to all other table operations, where this code appears instead:

```python
if event is None:
    raise TypeError("Cannot modify table key from outside of stream iteration")
```
After examining the code, this is exactly the case: the table is written outside of stream iteration. There is a separate periodic task that takes the in-memory representation and tries to flush it out to the RocksDB table. However, this cannot work. I don't see how it could ever work, because it is broken by design.
@j123b567 I highly recommend using Java Spring + Kafka Streams directly if that is possible for you. After spending many days on Faust I decided to switch, and after a couple of days of learning I did what I wanted. I think that fixing the issues you face with Faust would require you to contribute to this repo A LOT. Faust is a great tool and it is really sad that nobody maintains it; on the other hand, only people who know Kafka Streams well can do that.
@elja my app is now functional and using faust-streaming :D but this proposal will probably be my next step.
I took a look at your example and this part caught my attention:
```python
async def task_queue_sink(job):
    logging.info(f"Sending task: task_id = {job['task_id']} / uuid = {job['uuid']}")
    await task_queue_topic.send(key=job['key'], value=job)
    tasks_progress[job['key']].add(job['task_id'])
```
Using this as an agent sink to modify a table is entering murky waters. Taken from https://faust.readthedocs.io/en/latest/userguide/agents.html#concurrency:
> **Warning:** Concurrent instances of an agent will process the stream out-of-order, so you cannot mutate tables from within the agent function: an agent having concurrency > 1 can only read from a table, never write.
I haven't tried writing to a Table using your methodology, so i'm afraid I can't be too helpful.
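The hazard behind that warning can be shown without Faust at all (this is a hedged sketch, not Faust code): two concurrent "agent instances" doing a read-modify-write on shared state, with an `await` between the read and the write, will clobber each other's updates.

```python
import asyncio

# Shared state standing in for a table.
table = {"seen": 0}

async def agent_instance(name):
    current = table["seen"]      # read
    await asyncio.sleep(0)       # suspension point, as in any real I/O
    table["seen"] = current + 1  # write: overwrites the other instance's update

async def main():
    # Two instances run concurrently, like an agent with concurrency=2.
    await asyncio.gather(agent_instance("a"), agent_instance("b"))

asyncio.run(main())
print(table["seen"])  # 1, not 2: one increment was lost
```

This is why the docs say a concurrent agent may only read from a table; writes must come from a single-instance (concurrency 1) agent so that read-modify-write sequences are not interleaved.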
`memory` storage works because it's a `base.Store` and is relatively straightforward. The `rocksdb` driver is derived from the `base.SerializedStore` class, which is a separate beast altogether. The functionality required for the `rocksdb` driver to satisfy your use-case will require some changes made to faust.
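The distinction can be sketched roughly like this (assumed shapes for illustration, not Faust's real classes): a plain store can keep live Python objects as-is, while a serialized store must encode every key and value to bytes, as a RocksDB-backed store requires.

```python
import json

class MemoryStore:
    """Sketch of a plain store: keeps live Python objects, no encoding."""
    def __init__(self):
        self._data = {}
    def __setitem__(self, key, value):
        self._data[key] = value
    def __getitem__(self, key):
        return self._data[key]

class SerializedStore:
    """Sketch of a serialized store: everything crossing the boundary
    is encoded to bytes first (a dict stands in for RocksDB here)."""
    def __init__(self):
        self._db = {}
    def __setitem__(self, key, value):
        self._db[json.dumps(key).encode()] = json.dumps(value).encode()
    def __getitem__(self, key):
        return json.loads(self._db[json.dumps(key).encode()])

mem = MemoryStore()
mem["job-1"] = {"task_id": 7}   # stored as a live dict

ser = SerializedStore()
ser["job-1"] = {"task_id": 7}   # stored as b'{"task_id": 7}'
print(ser["job-1"])             # round-trips back to a dict
```

The extra encode/decode boundary is one reason the two store families behave differently inside faust.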
Checklist

- I have verified that the issue persists when using the master branch of Faust.

Steps to reproduce
If you use "rocksdb" as storage you get this error; it doesn't happen if you use "memory".
Basically, I have a list of tasks that I load from the database and store locally (no Kafka here), and some external events that trigger execution of these tasks. The only thing I want is to not send a task to Kafka if it was already sent and is currently processing. I use a SetTable to track the progress of task execution. All topics and this table have the same number of partitions and use the same key, so they should be co-partitioned.
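The dedup logic described above can be sketched without Faust (a plain dict of sets stands in for the SetTable; `maybe_send` and `mark_done` are hypothetical names): a task is sent only if its task_id is not already recorded as in progress for that key.

```python
from collections import defaultdict

tasks_progress = defaultdict(set)  # stand-in for the SetTable
sent = []                          # stand-in for the Kafka topic

def maybe_send(job):
    key, task_id = job["key"], job["task_id"]
    if task_id in tasks_progress[key]:
        return False                      # already sent and in progress
    sent.append(job)                      # stand-in for topic.send(...)
    tasks_progress[key].add(task_id)
    return True

def mark_done(key, task_id):
    tasks_progress[key].discard(task_id)  # allow the task to run again

assert maybe_send({"key": "p0", "task_id": 1}) is True
assert maybe_send({"key": "p0", "task_id": 1}) is False  # deduplicated
mark_done("p0", 1)
assert maybe_send({"key": "p0", "task_id": 1}) is True   # re-enabled
```

With co-partitioned topics and table, each worker only ever touches the keys of its own partitions, which is what makes a per-key set like this safe to consult.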
Expected behavior
I should be able to store data to the SetTable that uses RocksDB without any errors.
Actual behavior
```
AssertionError: assert event is not None
  ../python3.9/site-packages/faust/stores/rocksdb.py:274
```

Full traceback
Versions