apache / datafusion-ballista

Apache DataFusion Ballista Distributed Query Engine
https://datafusion.apache.org/ballista
Apache License 2.0

df.write_xxx no longer working in ballista #894

Closed yarenty closed 6 months ago

yarenty commented 8 months ago

Describe the bug

Since the update to DataFusion 30 (rev: 7d774481aedc027b7f68226b2c3a4fc0db959fc2), DataFrame uses LogicalPlan::Copy when executing a write (CSV, Parquet, JSON). As a result, df.write_xxx no longer works in Ballista.

To Reproduce

Run any SQL query and then call one of the df.write_xxx methods. For example, in the standalone_sql.rs example, call write_csv() instead of show():

    let df = ctx.sql("select count(1) from test").await?;

    df.write_csv("output.csv", DataFrameWriteOptions::default(), None).await?;

cargo run --example standalone_sql

Output:

Error: DataFusionError(Internal("failed to serialize logical plan: Internal(\"LogicalPlan serde is not yet implemented for Copy\")"))

Expected behavior

The file is created.

Additional context

See PR: https://github.com/apache/arrow-datafusion/pull/7283

This looks like a TODO: there are no corresponding serde / proto changes in DataFusion.
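To illustrate why a missing serde implementation surfaces only on the write path, here is a minimal, self-contained sketch. It is a toy model, not DataFusion or Ballista code: `ToyPlan`, `SerdeError`, `to_bytes`, and `plan_write` are hypothetical names invented for this example. It assumes only what the error message suggests, namely that the plan has to be serialized before it can be executed remotely, and that the encoder has no case for the Copy node yet.

```rust
/// Toy stand-in for a logical plan (hypothetical; not DataFusion's `LogicalPlan`).
#[allow(dead_code)]
#[derive(Debug)]
enum ToyPlan {
    /// A read-only query node.
    Scan { table: String },
    /// A write node wrapping the query whose results are written out.
    Copy { input: Box<ToyPlan>, output_url: String },
}

/// Hypothetical error type for the toy encoder.
#[derive(Debug, PartialEq)]
struct SerdeError(String);

/// Hypothetical encoder: only variants with a defined wire format succeed.
fn to_bytes(plan: &ToyPlan) -> Result<Vec<u8>, SerdeError> {
    match plan {
        ToyPlan::Scan { table } => {
            // Tag byte followed by the table name.
            let mut buf = vec![0u8];
            buf.extend_from_slice(table.as_bytes());
            Ok(buf)
        }
        // No wire format defined for the write node, so encoding fails.
        ToyPlan::Copy { .. } => Err(SerdeError(
            "LogicalPlan serde is not yet implemented for Copy".to_string(),
        )),
    }
}

/// Models a write call: the query plan is wrapped in a Copy node.
fn plan_write(input: ToyPlan, output_url: &str) -> ToyPlan {
    ToyPlan::Copy {
        input: Box::new(input),
        output_url: output_url.to_string(),
    }
}

fn main() {
    // A plain query encodes without error.
    let query = ToyPlan::Scan { table: "test".to_string() };
    println!("query encodes: {}", to_bytes(&query).is_ok());

    // The same query wrapped for writing cannot be encoded.
    let write = plan_write(query, "output.csv");
    if let Err(SerdeError(msg)) = to_bytes(&write) {
        println!("write fails: {msg}");
    }
}
```

In this model a show()-style query still works because its plan never contains the write node; only the wrapped plan reaches the unimplemented branch.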

Question: should I ask about it in the DataFusion repo?

andygrove commented 6 months ago

Depends on the following DataFusion issues: