apache / beam

Apache Beam is a unified programming model for Batch and Streaming data processing.
https://beam.apache.org/
Apache License 2.0

[Feature Request]: Interactive Beam supports asynchronous computations. #33103

Open ganesh4991 opened 1 week ago

ganesh4991 commented 1 week ago

What would you like to happen?

Summary

This feature request proposes adding asynchronous computation to Apache Beam's Interactive Beam API. This means allowing long-running tasks to execute in the background without blocking the user interface.

Motivation

Interactive Beam is a powerful tool for iterative pipeline development and debugging. However, long-running collect operations can block the interactive environment, hindering productivity and exploration. Introducing asynchronous computation would significantly improve the user experience by allowing developers to continue building the pipeline while computations are executed in the background.

Proposed Solution

Interactive Beam would offer a compute API that runs asynchronously in the background and does not produce any result to display in the interactive interface (e.g., Colab).


def compute(
    *pcolls,
    wait_for_inputs: bool = True,
    blocking: bool = False,
    runner=None,
    options=None,
    force_compute=False) -> None: ...

This API introduces two new options, wait_for_inputs and blocking.

A compute operation can subsequently be followed by a collect operation on the same PCollection so that users can view the result.
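
To illustrate the compute-then-collect flow described above, here is a minimal sketch in plain Python. It does not use Apache Beam: the compute and collect functions below, the name-keyed result cache, and slow_squares are all hypothetical stand-ins, and only the blocking flag is modeled (wait_for_inputs is left out because its semantics are not spelled out here). It is one possible reading of the proposal, not the actual implementation.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for Interactive Beam's cache of computed PCollections.
_executor = ThreadPoolExecutor(max_workers=2)
_results = {}  # maps a name to the Future holding its result
_lock = threading.Lock()


def compute(name, fn, blocking=False):
    """Start fn in the background (at most once per name) and return None.

    With blocking=True the call waits until the computation has finished.
    """
    with _lock:
        future = _results.get(name)
        if future is None:
            future = _executor.submit(fn)
            _results[name] = future
    if blocking:
        future.result()  # wait for completion; re-raises any error from fn
    return None


def collect(name, fn):
    """Return the result for name, reusing a computation started by compute()."""
    compute(name, fn, blocking=True)
    with _lock:
        future = _results[name]
    return future.result()


def slow_squares():
    """Simulated long-running job (assumption: stands in for a pipeline run)."""
    time.sleep(0.2)
    return [x * x for x in range(5)]


# compute() returns right away, so the user can keep working in the notebook.
compute("squares", slow_squares)
# ... continue building the pipeline here ...
print(collect("squares", slow_squares))  # prints [0, 1, 4, 9, 16]
```

In this sketch, collect reuses the work that compute already started instead of running the job a second time, which is the behavior the request seems to be aiming for.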

Benefits

Issue Priority

Priority: 2 (default / most feature requests should be filed as P2)

Issue Components

ganesh4991 commented 1 week ago

cc @robertwb

damondouglas commented 6 days ago

cc: @damondouglas