ArroyoSystems / arroyo

Distributed stream processing engine in Rust
https://arroyo.dev
Apache License 2.0

Initial support for Python UDFs #736

Closed mwylde closed 2 months ago

mwylde commented 2 months ago

This PR lands the first support for Python UDFs in Arroyo! This initial version supports synchronous (i.e., quick-running) scalar UDFs that take Python-native arguments (as opposed to Arrow arrays). Support for Python UDFs that operate directly on Arrow arrays, and for long-running UDFs, will follow.

A Python UDF in Arroyo looks like this:

from arroyo_udf import udf

@udf
def my_py_add(x: int, y: int) -> int:
    return x + y

It can then be used in SQL like any other scalar function:

select my_py_add(x, y) from events
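
To make "Python-native arguments" concrete, here is a rough standalone sketch of what calling a scalar UDF once per row could look like. It is not how Arroyo implements this: the `udf` decorator below is a hypothetical stand-in for `arroyo_udf.udf` so the snippet runs without Arroyo installed, and `apply_scalar_udf` is an illustrative helper, not part of any Arroyo API.

```python
from typing import Callable, List, Sequence, get_type_hints


def udf(fn: Callable) -> Callable:
    """Hypothetical stand-in for arroyo_udf.udf (assumption, not the real decorator).

    It only records the annotated argument and return types on the function.
    """
    fn.signature = get_type_hints(fn)
    return fn


@udf
def my_py_add(x: int, y: int) -> int:
    return x + y


def apply_scalar_udf(fn: Callable, *columns: Sequence) -> List:
    """Illustrative helper: call fn once per row with plain Python values."""
    # zip(*columns) turns column-oriented data into row tuples.
    return [fn(*row) for row in zip(*columns)]


# Two "columns" of plain Python ints, standing in for x and y from events.
xs = [1, 2, 3]
ys = [10, 20, 30]
result = apply_scalar_udf(my_py_add, xs, ys)
print(result)
```

The point of the sketch is that the function body only ever sees ordinary `int` values, one row at a time, rather than whole Arrow arrays.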

Currently, Python UDFs will not work in our Docker containers, as the images do not include a Python environment.