apache / beam

Apache Beam is a unified programming model for Batch and Streaming data processing.
https://beam.apache.org/
Apache License 2.0

[Feature Request]: Golang support/documentation for unbounded DoFns #29213

Open pequalsnp opened 11 months ago

pequalsnp commented 11 months ago

What would you like to happen?

Unbounded DoFns seem to be supported and explained for Java and Python but not for Golang, for example in the SDF documentation.

In Java, you can use @UnboundedPerElement or @BoundedPerElement to annotate your DoFn. In Python, you can use @unbounded_per_element to annotate the DoFn.

I think the right thing to do in Golang is to implement IsBounded() bool, but it would be good to have that in the docs. It is also unclear how to make a regular, non-splittable DoFn unbounded, if that is even possible. The pubsubio library uses beam.External, which takes a bounded parameter, but I don't see any other examples of unbounded DoFns in Golang.

Issue Priority

Priority: 3 (nice-to-have improvement)

Issue Components

lostluck commented 11 months ago

The Go SDK automatically treats a Splittable DoFn that returns a ProcessContinuation as unbounded, but this applies only to Splittable DoFns. https://beam.apache.org/documentation/programming-guide/#user-initiated-checkpoint
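
To illustrate the rule described above, here is a minimal sketch in plain Go that does not use the Beam SDK. It only models the idea that boundedness can be inferred from the return signature of ProcessElement. The ProcessContinuation struct, the two DoFn types, and inferUnbounded are all hypothetical stand-ins invented for this example. They are not the SDK's actual types or its detection logic, and the sketch ignores the additional requirement that the DoFn be splittable.

```go
package main

import (
	"fmt"
	"reflect"
	"time"
)

// ProcessContinuation is a hypothetical stand-in for the SDK's
// continuation type; it is NOT the real Beam definition.
type ProcessContinuation struct {
	Resume bool
	Delay  time.Duration
}

// boundedFn models a DoFn whose ProcessElement returns only an error.
type boundedFn struct{}

func (boundedFn) ProcessElement(elem string) error { return nil }

// pollingFn models a DoFn that asks to be resumed later by returning
// a ProcessContinuation.
type pollingFn struct{}

func (pollingFn) ProcessElement(elem string) (ProcessContinuation, error) {
	return ProcessContinuation{Resume: true, Delay: 10 * time.Second}, nil
}

// inferUnbounded reports whether fn has a ProcessElement method with a
// ProcessContinuation among its return values. This is an assumed
// simplification of the inference, for illustration only.
func inferUnbounded(fn interface{}) bool {
	m, ok := reflect.TypeOf(fn).MethodByName("ProcessElement")
	if !ok {
		return false
	}
	want := reflect.TypeOf(ProcessContinuation{})
	for i := 0; i < m.Type.NumOut(); i++ {
		if m.Type.Out(i) == want {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(inferUnbounded(boundedFn{})) // false
	fmt.Println(inferUnbounded(pollingFn{})) // true
}
```

In this model, a DoFn opts into being unbounded purely through its method signature, so no annotation comparable to @UnboundedPerElement is involved.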

It's unclear whether it's a good idea to allow non-splittable DoFns to self-checkpoint. The SDK would need execution-side changes to enable it. At the Beam Portability FnAPI layer, though, the concepts are tightly related, so at best it would be a convenience layer over what already happens under the hood.