This PR adds a new way to run Arroyo, currently called "pipeline clusters" in lieu of a better name.
In this mode, the user pre-configures Arroyo with a query, which the system will run on startup. Other queries can be schedule via the API or UI, but the initial one is managed by the process—so it will be automatically stopped (with a checkpoint) when the process is stopped.
As a user, it looks like this:
$ arroyo run --help
Run a query as a local pipeline cluster
Usage: arroyo run [OPTIONS] [QUERY]
Arguments:
[QUERY] The query to run [default: -]
Options:
-n, --name <NAME> Name for this pipeline
--database <DATABASE> Path to a database file to save to or restore from
-p, --parallelism <PARALLELISM> Number of parallel subtasks to run [default: 1]
-h, --help Print help
$ arroyo run query.sql
2024-06-18T17:20:40.113741Z INFO arroyo::run: Job transitioned to Scheduling
2024-06-18T17:20:40.321154Z INFO arroyo::run: Job transitioned to Running
2024-06-18T17:20:40.423451Z INFO arroyo::run: Pipeline running... dashboard at http://localhost:53093/pipelines/pl_PmdK8u3ydK
{"AVG(price)":64651.984675480766}
{"AVG(price)":64641.925167410714}
{"AVG(price)":64640.880196049526}
^C2024-06-18T17:20:56.872535Z INFO arroyo::run: Stopping pipeline with a final checkpoint...
2024-06-18T17:20:57.295913Z INFO arroyo::run: Job transitioned to Stopped
Users can also provide the query as an environment variables (ARROYO__QUERY) which may be helpful when running as a docker container, for example in ECS.
This PR also adds a new sink—StdOut—which simply prints outputs to stdout. It's the default sink for pipeline clusters, and provides an interactive experience in the console.
This PR adds a new way to run Arroyo, currently called "pipeline clusters" in lieu of a better name.
In this mode, the user pre-configures Arroyo with a query, which the system will run on startup. Other queries can be schedule via the API or UI, but the initial one is managed by the process—so it will be automatically stopped (with a checkpoint) when the process is stopped.
As a user, it looks like this:
Users can also provide the query as an environment variables (
ARROYO__QUERY
) which may be helpful when running as a docker container, for example in ECS.This PR also adds a new sink—StdOut—which simply prints outputs to stdout. It's the default sink for pipeline clusters, and provides an interactive experience in the console.