--no-strict-offset-reset doesn't always work

getsentry / arroyo

A library to build streaming applications that consume from and produce to Kafka.

https://getsentry.github.io/arroyo/

Apache License 2.0


--no-strict-offset-reset doesn't always work #389

Open mwarkentin opened 2 hours ago

mwarkentin commented 2 hours ago

Steps to Reproduce

Not sure, just adding a placeholder for further information.

Expected Result

--no-strict-offset-reset would work, and enable consumers to reset their own offsets when out of retention.

It might be the combination of earliest and no-strict-offset-reset; latest would probably have worked.

Actual Result

Not sure?

untitaker commented 2 hours ago

I think the combination of --auto-offset-reset=earliest and --no-strict-offset-reset sometimes ends up in a situation where arroyo resets to an offset that has already expired, so the reset itself fails:

%4|1729638367.626|OFFSET|rdkafka#consumer-2| [thrd:main]: snuba-spans [62]: offset reset (at offset 70037080713 (leader epoch 4), broker 0) to offset BEGINNING (leader epoch -1): fetch failed due to requested offset not available on the broker: Broker: Offset out of range
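To make the suspected failure mode concrete, here is a minimal, self-contained sketch of how a reset to "earliest" could land on an expired offset. It does not use arroyo or Kafka: `ToyPartition`, `OffsetOutOfRange`, and `reset_to_earliest` are hypothetical names invented for illustration, and the three-step ordering is only an assumption about what might be happening, not a confirmed description of arroyo's behavior.

```python
from dataclasses import dataclass


class OffsetOutOfRange(Exception):
    """Raised when a fetch asks for an offset that is no longer retained."""


@dataclass
class ToyPartition:
    """Hypothetical in-memory stand-in for one partition's retained range."""

    low: int   # oldest offset still retained
    high: int  # next offset to be written

    def expire(self, count: int) -> None:
        # Retention deletes the oldest `count` messages.
        self.low = min(self.low + count, self.high)

    def fetch(self, offset: int) -> int:
        if not self.low <= offset <= self.high:
            raise OffsetOutOfRange(
                f"offset {offset} not in [{self.low}, {self.high}]"
            )
        return offset


def reset_to_earliest(partition: ToyPartition, expired_between_steps: int) -> int:
    # Step 1: resolve "earliest" to a concrete offset.
    target = partition.low
    # Step 2: time passes before the fetch; retention may move the low mark.
    partition.expire(expired_between_steps)
    # Step 3: fetch from the offset resolved in step 1, which may now be gone.
    return partition.fetch(target)


# Nothing expires between resolving and fetching: the reset succeeds.
print(reset_to_earliest(ToyPartition(low=100, high=1000), 0))

# Retention advances in between: the resolved offset is already out of range.
try:
    reset_to_earliest(ToyPartition(low=100, high=1000), 50)
except OffsetOutOfRange as exc:
    print("reset failed:", exc)
```

If something like this is what happens, the failure would be more likely on topics where the low watermark moves quickly, since the window between resolving and fetching only has to be long enough for one segment to expire.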

mwarkentin commented 2 hours ago

@untitaker was that a one-time thing? Or does the consumer continually attempt to reset to offsets that are already out of bounds?

E.g. is this an issue only on very high throughput topics, or ones where the consumer takes a while to commit the first batch?

untitaker commented 2 hours ago

I think it's a race condition in arroyo that allows this to happen. We should probably support constructs like --auto-offset-reset=earliest+1h to self-serve what we already end up doing manually.
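As a rough illustration of what accepting a value like `earliest+1h` could involve, here is a sketch of a parser for it. Nothing here exists in arroyo today: `parse_offset_reset`, the unit suffixes, and the rule that only `earliest` accepts a margin are all assumptions made for the example, as is reading `+1h` as "one hour past the earliest retained data".

```python
import re
from datetime import timedelta
from typing import Tuple

# Hypothetical unit suffixes; the proposal above only shows "h".
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}
_POLICIES = ("earliest", "latest", "error")


def parse_offset_reset(value: str) -> Tuple[str, timedelta]:
    """Split a value such as "earliest+1h" into (policy, safety margin)."""
    base, sep, suffix = value.partition("+")
    if base not in _POLICIES:
        raise ValueError(f"unknown offset reset policy: {base!r}")
    if not sep:
        # Plain policy with no margin, e.g. "earliest" or "latest".
        return base, timedelta(0)
    if base != "earliest":
        # Assumption: a margin only makes sense relative to the oldest data.
        raise ValueError(f"a margin is only supported for 'earliest': {value!r}")
    match = re.fullmatch(r"(\d+)([smhd])", suffix)
    if match is None:
        raise ValueError(f"invalid margin: {suffix!r}")
    amount, unit = match.groups()
    return base, timedelta(**{_UNITS[unit]: int(amount)})


print(parse_offset_reset("earliest+1h"))
print(parse_offset_reset("latest"))
```

The returned margin would still have to be turned into a concrete offset per partition, for example through a timestamp-based offset lookup, and that lookup is where the retention race would need to be handled.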

mwarkentin commented 1 hour ago

We should also reconsider (per our discussions) if --no-strict-offset-reset even needs to be a thing anymore now that we primarily use --auto-offset-reset=earliest for all of our consumers.