apache / beam

Apache Beam is a unified programming model for Batch and Streaming data processing.
https://beam.apache.org/
Apache License 2.0

[Feature Request]: Support draining Spanner Change Stream connectors #30167

Open ianb-pomelo opened 7 months ago

ianb-pomelo commented 7 months ago

What would you like to happen?

Right now, one of the known limitations of the Spanner change stream source is that it can't be drained [1]. Is there a way to allow draining this connector?

Our current use case is a job that consumes change stream values, but the structure of this job changes frequently. To handle this, we try an in-place update and, if that fails, drain and start a new job. This works with Pub/Sub sources. To get around the fact that change streams can't be drained, we run an intermediate job that converts the Spanner changes into Pub/Sub messages, which the changing job then consumes. However, this has caused a huge increase in latency: commit time -> change stream read is pretty consistently 200ms, but with the Pub/Sub layer added it increases to ~5s.

Issue Priority

Priority: 2 (default / most feature requests should be filed as P2)

Issue Components

liferoad commented 7 months ago

cc @nielm

Abacn commented 7 months ago

Draining a streaming pipeline should work in general. If it's not draining, something is missing in the SDF; a fix similar to #25716 may work.
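For context on what an SDF-level fix could involve: as far as I understand it, a drain reaches a splittable DoFn as a request to truncate its restriction (the Java SDK exposes a `@TruncateRestriction` hook for this). Below is a minimal plain-Python sketch of that idea, not Beam code and not the Spanner connector's implementation. `OffsetRange`, `Tracker`, `process`, and the `drain_at` parameter are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class OffsetRange:
    """Hypothetical restriction: [start, stop). stop=None models an unbounded read."""
    start: int
    stop: Optional[int]


class Tracker:
    """Hypothetical tracker that hands out offsets within the restriction."""

    def __init__(self, restriction: OffsetRange) -> None:
        self.restriction = restriction
        self.position = restriction.start  # next unclaimed offset

    def try_claim(self, offset: int) -> bool:
        stop = self.restriction.stop
        if stop is not None and offset >= stop:
            return False  # past the end of a bounded restriction
        self.position = offset + 1
        return True

    def truncate(self) -> None:
        # On drain: bound the restriction at what has already been claimed,
        # so the processing loop can finish instead of running forever.
        self.restriction = OffsetRange(self.restriction.start, self.position)


def process(tracker: Tracker, drain_at: int) -> List[int]:
    """Reads offsets until a (simulated) drain arrives once `drain_at` is reached."""
    out = []
    offset = tracker.restriction.start
    while tracker.try_claim(offset):
        out.append(offset)
        offset += 1
        if offset == drain_at:
            tracker.truncate()  # simulated drain signal
    return out


tracker = Tracker(OffsetRange(0, None))
records = process(tracker, drain_at=3)
```

Without the `truncate()` call the loop over an unbounded restriction never returns, which is roughly what a source that does not support drain looks like from the runner's side.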

nielm commented 7 months ago

cc @thiagotnunes (change streams author) for comment, but I believe ChangeStreams does not use SDFs, which is probably why the drain is not working.

The partitions are generated by Spanner itself and are read by a normal DoFn. (SpannerIO:1751)

thiagotnunes commented 7 months ago

cc @nancyxu123, current owner here

efalkenberg commented 7 months ago

Hey @ianb-pomelo

Thanks for the feedback! Draining is in our backlog but has not been prioritized yet. I really appreciate the context you provided; I will add it to our internal ticket, and we'll update here when this gets prioritized.

Thanks!

Eike

ianb-pomelo commented 7 months ago

Thanks for the update, looking forward to seeing it prioritized!