Open mj0nez opened 1 month ago
Yeah this seems like surprising behavior, feel free to change it. I am not sure if the issue is isolated to unbatch or if batch already loses the commit data.
I took a shot, feel free to take it apart :)
This has a PR in review: https://github.com/getsentry/arroyo/pull/371
(posting to take it out of our support queue)
Hi, I’m not sure if this is really a bug, but I stumbled over the Unfold strategy’s message generation. Consider the use case where our streaming process consists of the following steps:
When the `ValuesBatch` is unbatched in step 3, the `Unfold` strategy creates a new `Message` instance with a new `Value`, which is then submitted to step 4. Unfortunately, only the last message of the batch gets a committable, although all payloads might already have one. From what I gathered from the `BatchStep`’s flow, I assumed it would just fan out the messages and submit them one after the other to the following step, unchanged.
I believe the current behavior, while useful for reducing the number of commits, does not belong in a generic strategy, or is at least a bit hidden. My reasoning for moving this out of the strategy, or adding a note to both classes, is that downstream steps like number 4 can no longer provide Partition and offset when raising an `InvalidMessage` exception. Furthermore, if an exception occurs in the batch’s last message, the commit information is lost altogether.
Environment
arroyo 2.17.4
Steps to Reproduce
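A minimal, self-contained sketch of the behavior described above. It does not import arroyo; `Msg`, `unfold_last_only`, `unfold_preserving`, and `locate` are hypothetical stand-ins that only model the reported behavior (commit data kept on the last unfolded message) next to the behavior I expected (messages fanned out unchanged). The topic name and offsets are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical stand-ins for illustration; these are NOT arroyo's real types.
Partition = Tuple[str, int]          # (topic name, partition index)
Committable = Dict[Partition, int]   # partition -> offset to commit


@dataclass(frozen=True)
class Msg:
    payload: str
    committable: Committable


def unfold_last_only(batch: Sequence[Msg]) -> List[Msg]:
    """Model of the reported behavior: only the last message keeps commit data."""
    merged: Committable = {}
    for msg in batch:
        for partition, offset in msg.committable.items():
            # Keep the highest offset seen per partition for the whole batch.
            merged[partition] = max(offset, merged.get(partition, offset))
    last = len(batch) - 1
    return [
        Msg(msg.payload, merged if i == last else {})
        for i, msg in enumerate(batch)
    ]


def unfold_preserving(batch: Sequence[Msg]) -> List[Msg]:
    """Model of the expected behavior: fan the messages out unchanged."""
    return list(batch)


def locate(msg: Msg) -> Optional[Tuple[Partition, int]]:
    """What a downstream step needs in order to report an invalid message."""
    return next(iter(msg.committable.items()), None)


events: Partition = ("events", 0)
batch = [Msg(f"payload-{i}", {events: 10 + i}) for i in range(3)]

lossy = unfold_last_only(batch)
kept = unfold_preserving(batch)

print([locate(m) for m in lossy])  # [None, None, (('events', 0), 12)]
print([locate(m) for m in kept])   # every message still has its own offset
```

In the first variant, a failure on the first or second unfolded message leaves the downstream step with no partition or offset to attach to the error, which is the situation described above.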