ArroyoSystems / arroyo

Distributed stream processing engine in Rust
https://arroyo.dev
Apache License 2.0
3.81k stars 223 forks source link

Runtime failure when using `source.offset` as `group` when starting a job for the first time #632

Open prakkashm opened 6 months ago

prakkashm commented 6 months ago

For a kafka source, when I specify source.offset as group with a non-existing source.group_id while starting a job for the first time, it fails during runtime without any errors. Possibly, the job is unable to create the new kafka consumer group.

mwylde commented 6 months ago

Thanks for the report! This is actually the intended behavior, but could definitely be documented better.

'source.offset' = 'group' means that we should start from an existing consumer group (specified by source.group_id), rather than initial or latest. If there's no existing consumer group than I don't think there's a sensible behavior here, so we fail.

This feature exists for cases where you have an existing consumer group (for example, from an earlier arroyo job) and you want to pick up where that job left off. If you want to create a new consumer group, choose either 'earliest' or 'latest'.

prakkashm commented 6 months ago

Sure. As an end user, I would then expect an error on the UI about the non-existing consumer group. I'm personally of the opinion, it should behave like earliest when the consumer group is non-existent. This helps in avoiding the additional manual overhead (of changing the earliest to group) that a user must do once if they need to restart an existing job to continue from where it stopped the first time.