apache / beam

Apache Beam is a unified programming model for Batch and Streaming data processing.
https://beam.apache.org/
Apache License 2.0

Go Direct Runner Improvements #20647

Closed damccorm closed 1 year ago

damccorm commented 2 years ago

The Go SDK has a simple direct runner intended for basic batch framework testing. That is, it is only suitable for the barest tests, and it does not ensure that even the basics work for arbitrary pipelines.

The runner has the following features:

Further, the runner hasn't been validated for Beam semantics, nor have more complex features of the Beam Model been implemented or validated. This makes it unsuitable for anything beyond its current use: demoing the SDK in basic batch operation, and the light use it gets in testing the SDK itself.

However, implementing full Beam semantics for a runner, even without the distributed portion, is a project in itself. It's part of the Beam design that implementing the semantics is more complicated on the runner side than on the SDK side.

But there's no reason why we can't improve the Go Direct Runner to match all the semantics Beam requires in single-machine contexts.

In particular, the various improvements below could be made (and should probably be sharded into separate sub-task JIRAs as required):

 

A good place to start is being able to run pipelines on the Python Portable runner, which implements all Beam semantics correctly. Instructions for doing so are on the Go Tips page in the Dev Wiki.

 

Direct Runner Code: https://github.com/apache/beam/tree/master/sdks/go/pkg/beam/runners/direct

SDK Harness Code: https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/harness/harness.go 

 

Imported from Jira BEAM-11076. Original Jira may contain additional context. Reported by: lostluck.

lostluck commented 1 year ago

Overall, this issue has been supplanted entirely by Prism and the effort to replace the direct runner with it. Closing.

See https://github.com/apache/beam/issues/24789