fugue-project / fugue

A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.
https://fugue-tutorials.readthedocs.io/
Apache License 2.0
1.92k stars, 94 forks

[FEATURE] Support single `dict[str,Any]` as transformer input and output #548

Closed. goodwanghan closed this issue 1 week ago

goodwanghan commented 1 week ago

Is your feature request related to a problem? Please describe.

Fugue always requires dataframe-like input and output for transformers. This is sometimes inconvenient, because the most natural semantics for a transformation may be row to dataframe, dataframe to row, or row to row. Supporting single-row functions would further reduce the friction of expressing the logic in its most natural form. For example, a row-to-row function could be passed directly to `fa.transform`:

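from typing import Any, Dict

import fugue.api as fa
import pandas as pd

# Proposed: a transformer that takes a single row and returns a single row
def my_func(row: Dict[str, Any]) -> Dict[str, Any]:
    row["b"] = 2
    return row

df = pd.DataFrame(dict(a=[1, 2, 3]))
# Keep all input columns and append the new column b
fa.transform(df, my_func, schema="*,b:long")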
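
To make the requested row-to-row semantics concrete, here is a minimal plain-Python sketch that does not depend on Fugue. The helper `lift_row_func` and the function `add_b` are hypothetical names used only for illustration, not part of Fugue's API; the sketch only shows how a single-row function could be applied to every row of a partition.

```python
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]


def lift_row_func(func: Callable[[Row], Row]) -> Callable[[Iterable[Row]], List[Row]]:
    """Hypothetical helper (not a Fugue API): lift a row-to-row function
    into a function over a collection of rows."""

    def wrapper(rows: Iterable[Row]) -> List[Row]:
        # Copy each row first so the caller's input is not mutated
        return [func(dict(row)) for row in rows]

    return wrapper


def add_b(row: Row) -> Row:
    # Same logic as the row function in the request: add a constant column b
    row["b"] = 2
    return row


rows = [{"a": 1}, {"a": 2}, {"a": 3}]
result = lift_row_func(add_b)(rows)
print(result)
```

A wrapper of this shape is roughly what has to be written by hand when only collection-level signatures are accepted; the feature request is about making that step unnecessary.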