This feature request proposes adding asynchronous computation to Apache Beam's Interactive Beam API, allowing long-running tasks to execute in the background without blocking the user interface.
Motivation
Interactive Beam is a powerful tool for iterative pipeline development and debugging. However, long-running collect operations can block the interactive environment, hindering productivity and exploration. Introducing asynchronous computation would significantly improve the user experience by allowing developers to continue building the pipeline while computations are executed in the background.
Proposed Solution
Interactive Beam offers a compute API which runs asynchronously in the background and does not produce any result to be displayed on the interactive interface, e.g. Colab. This API introduces two new options:
wait_for_inputs: Whether to wait until the asynchronous dependencies are computed. Setting this to False schedules the computation immediately, but may also result in running the same pipeline stages multiple times.
blocking: If False, the computation runs in a non-blocking fashion; in a Colab/IPython environment, this mode also provides controls for the running pipeline. If True, the computation blocks until the pipeline is done.
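The interaction between the two options can be modeled in plain Python. The sketch below is a hypothetical toy, not Interactive Beam code: AsyncComputeManager, define, run_counts and the string stage names are invented for illustration, and a thread pool stands in for the runner. With wait_for_inputs=True a stage waits on an upstream computation that is already scheduled; with False it starts immediately and re-runs its upstream stages itself; blocking=True makes compute return only once the result is ready. A real scheduler would also have to avoid exhausting its worker pool while stages wait on each other, which this toy ignores.

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional, Tuple


class AsyncComputeManager:
    """Hypothetical model of the proposed compute() options (not Beam code).

    Each named stage stands in for a PCollection: a function of the values
    of its upstream stages.
    """

    def __init__(self, max_workers: int = 4) -> None:
        self._executor = ThreadPoolExecutor(max_workers=max_workers)
        self._lock = threading.Lock()
        self._stages: Dict[str, Tuple[Callable, list]] = {}
        self._inflight: Dict[str, Future] = {}
        self.run_counts: Dict[str, int] = {}  # how often each stage executed

    def define(self, name: str, fn: Callable, deps: Iterable[str] = ()) -> None:
        self._stages[name] = (fn, list(deps))

    def _resolve(self, name: str, reuse_inflight: bool):
        # With wait_for_inputs=True, block on a dependency that is already
        # scheduled instead of running its stages a second time.
        fut: Optional[Future] = None
        if reuse_inflight:
            with self._lock:
                fut = self._inflight.get(name)
        if fut is not None:
            return fut.result()
        return self._execute(name, reuse_inflight)

    def _execute(self, name: str, reuse_inflight: bool):
        fn, deps = self._stages[name]
        inputs = [self._resolve(dep, reuse_inflight) for dep in deps]
        with self._lock:
            self.run_counts[name] = self.run_counts.get(name, 0) + 1
        return fn(*inputs)

    def compute(self, name: str, wait_for_inputs: bool = True,
                blocking: bool = False) -> Future:
        fut = self._executor.submit(self._execute, name, wait_for_inputs)
        with self._lock:
            self._inflight[name] = fut
        if blocking:
            fut.result()  # blocking=True: return only once the stage is done
        return fut


def slow_source():
    time.sleep(0.2)  # stand-in for an expensive read
    return list(range(5))


mgr = AsyncComputeManager()
mgr.define("source", slow_source)
mgr.define("squares", lambda xs: [x * x for x in xs], deps=["source"])

mgr.compute("source")                              # returns immediately
sq = mgr.compute("squares", wait_for_inputs=True)  # reuses the in-flight source
print(sq.result())     # [0, 1, 4, 9, 16]
print(mgr.run_counts)  # {'source': 1, 'squares': 1}
```

Calling compute("squares", wait_for_inputs=False) in the same situation would run the source stage a second time, which is the duplicated work the option description warns about.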
A compute operation can subsequently be followed by a collect operation on the same PCollection so that users can view the result.
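The compute-then-collect flow can likewise be sketched with a toy session object. HypotheticalSession and its producer callables are assumptions standing in for Interactive Beam's result cache and for the pipeline fragment behind a PCollection. The point is that compute returns nothing to display, while a later collect on the same name waits for the background work and reuses its result instead of recomputing it.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Dict, List


class HypotheticalSession:
    """Toy model: compute() fills a cache in the background, collect() reads it."""

    def __init__(self) -> None:
        self._executor = ThreadPoolExecutor(max_workers=2)
        self._lock = threading.Lock()
        self._pending: Dict[str, Future] = {}
        self._cache: Dict[str, List[Any]] = {}
        self.executions = 0  # number of times any producer actually ran

    def _run(self, name: str, producer: Callable[[], List[Any]]) -> None:
        result = producer()
        with self._lock:
            self._cache[name] = result
            self.executions += 1

    def compute(self, name: str, producer: Callable[[], List[Any]]) -> None:
        # Schedules the work and returns at once; nothing is shown to the user.
        future = self._executor.submit(self._run, name, producer)
        with self._lock:
            self._pending[name] = future

    def collect(self, name: str, producer: Callable[[], List[Any]]) -> List[Any]:
        with self._lock:
            future = self._pending.get(name)
        if future is not None:
            future.result()  # wait for the background compute (re-raises errors)
        with self._lock:
            cached = self._cache.get(name)
        if cached is not None:
            return cached  # reuse the computed result instead of re-running
        self._run(name, producer)  # nothing computed yet: run synchronously
        with self._lock:
            return self._cache[name]


def word_lengths() -> List[int]:
    return [len(w) for w in ["apache", "beam", "interactive"]]


session = HypotheticalSession()
session.compute("lengths", word_lengths)  # returns immediately
# ... the user keeps building the pipeline here ...
print(session.collect("lengths", word_lengths))  # [6, 4, 11]
print(session.executions)  # 1
```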
Benefits
The ability to compute time-consuming PCollections asynchronously
Sink operations that do not produce any meaningful output can use compute instead of collect
Issue Priority
Priority: 2 (default / most feature requests should be filed as P2)