Open ianb-pomelo opened 7 months ago
cc @nielm
Draining streaming should work in general. If it's not draining, there is something missing in the SDF; a fix similar to #25716 may work.
cc @thiagotnunes (change streams author) for comment, but I believe ChangeStreams does not use SDFs, which is probably why the drain is not working. #
The partitions are generated by Spanner itself and are read by a normal DoFn. (SpannerIO:1751)
cc @nancyxu123, current owner here
Hey @ianb-pomelo
Thanks for the feedback! Draining is something that we have in our backlog, but it is not prioritized yet. I really appreciate the context that you provided. I will add it to our internal ticket, and we'll update here when this gets prioritized.
Thanks!
Eike
Thanks for the update, looking forward to seeing it prioritized!
What would you like to happen?
Right now, one of the known limitations of the Spanner change stream source is that it can't be drained. Is there a way to allow draining this connector?
Currently, our use case is that we have a job that consumes change stream values, but the structure of this job changes frequently. To handle this, we try to do in-place updates and, if those fail, drain and start a new job. This works with Pub/Sub sources. To get around the fact that the change streams can't be drained, we have an intermediate job that converts the Spanner changes into Pub/Sub messages, and the frequently changing job consumes those. However, this has caused a huge increase in latency: the commit time -> change stream read latency is pretty consistently 200ms, but adding the Pub/Sub layer increases it to ~5s.
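For reference, the deploy flow described above (try an in-place update, fall back to drain and restart) looks roughly like the sketch below. This is only an illustration: `deploy`, `UpdateRejected`, and the three callables are hypothetical names, not part of Beam or Dataflow, and the real steps would go through whatever tooling submits the job.

```python
from typing import Callable


class UpdateRejected(Exception):
    """Hypothetical error: the in-place update was not compatible with the running job."""


def deploy(
    try_update: Callable[[], None],
    drain: Callable[[], None],
    start_new: Callable[[], None],
) -> str:
    """Try an in-place update first; if it is rejected, drain and start a fresh job.

    All three callables are placeholders (assumptions) for the real deployment steps.
    """
    try:
        try_update()
        return "updated"
    except UpdateRejected:
        # The drain step is the part that is not available for the
        # Spanner change stream source today.
        drain()
        start_new()
        return "replaced"
```

With a drainable source, the fallback branch lets in-flight data finish before the replacement job starts, which is why the intermediate Pub/Sub job exists in our current setup.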
Issue Priority
Priority: 2 (default / most feature requests should be filed as P2)
Issue Components