I looked into the example and the physical plan Substrait producer/consumer code. Unfortunately, for physical plans the Substrait consumer and producer are implemented only for ParquetExec, and even then only partially, so I do not believe any practical example will execute without further development.
Here is an example that gets further than the one above but panics on the round-trip assertion:
    use datafusion::error::Result;
    use datafusion::prelude::*;
    use datafusion_substrait::physical_plan;
    use std::collections::HashMap;

    #[tokio::main(flavor = "current_thread")]
    async fn main() -> Result<()> {
        // Register the test Parquet file as table 'alltypes_plain'
        let ctx = SessionContext::new();
        let testdata = datafusion::test_util::parquet_test_data();
        ctx.register_parquet(
            "alltypes_plain",
            &format!("{testdata}/alltypes_plain.parquet"),
            ParquetReadOptions::default(),
        )
        .await?;

        // Create a physical plan that scans the table
        let df = ctx.sql("SELECT * from alltypes_plain").await?;
        let physical_plan = df.create_physical_plan().await?;

        // Convert the physical plan into a Substrait (protobuf) Rel
        let mut extension_info = (vec![], HashMap::new());
        let substrait_plan =
            physical_plan::producer::to_substrait_rel(physical_plan.as_ref(), &mut extension_info)?;

        // Convert the Substrait Rel back into an ExecutionPlan
        let physical_round_trip =
            physical_plan::consumer::from_substrait_rel(&ctx, &substrait_plan, &HashMap::new())
                .await?;

        // Panics: the round-tripped plan does not match the original
        assert_eq!(format!("{:?}", physical_plan), format!("{:?}", physical_round_trip));
        Ok(())
    }
You can see that the round trip lost many details about the ParquetExec such as projected_schema and projected_statistics.
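To illustrate why the assertion fails, here is a minimal stdlib-only sketch of a lossy round trip. `ScanNode`, `WireScan`, `to_wire`, `from_wire`, and `lost_fields` are hypothetical names made up for this illustration; they are not DataFusion or Substrait types, and the real producer/consumer may drop a different set of fields.

```rust
// Toy model only: `ScanNode` and `WireScan` are hypothetical stand-ins,
// not DataFusion or Substrait types.
#[derive(Debug, Clone, PartialEq)]
struct ScanNode {
    path: String,
    projected_schema: Option<Vec<String>>,
    projected_statistics: Option<u64>,
}

// The "wire" form carries only the file path, mimicking a producer that
// serializes a subset of the node's fields.
#[derive(Debug, Clone, PartialEq)]
struct WireScan {
    path: String,
}

fn to_wire(node: &ScanNode) -> WireScan {
    WireScan { path: node.path.clone() }
}

fn from_wire(wire: &WireScan) -> ScanNode {
    // Fields that were never serialized can only be rebuilt as defaults.
    ScanNode {
        path: wire.path.clone(),
        projected_schema: None,
        projected_statistics: None,
    }
}

// Name the fields that differ between the original and round-tripped node.
fn lost_fields(before: &ScanNode, after: &ScanNode) -> Vec<&'static str> {
    let mut lost = Vec::new();
    if before.path != after.path {
        lost.push("path");
    }
    if before.projected_schema != after.projected_schema {
        lost.push("projected_schema");
    }
    if before.projected_statistics != after.projected_statistics {
        lost.push("projected_statistics");
    }
    lost
}

fn main() {
    let original = ScanNode {
        path: "alltypes_plain.parquet".to_string(),
        projected_schema: Some(vec!["id".to_string(), "bool_col".to_string()]),
        projected_statistics: Some(8),
    };
    let round_trip = from_wire(&to_wire(&original));

    // The Debug strings differ, so an assert_eq! on them would panic.
    assert_ne!(format!("{:?}", original), format!("{:?}", round_trip));
    println!("fields lost in round trip: {:?}", lost_fields(&original, &round_trip));
}
```

A per-field comparison like `lost_fields` may be easier to read than a failed assertion over two long Debug strings when tracking down what the round trip dropped.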
I think if we want to include a user-facing example of a physical plan Substrait round trip, we will first need to open a ticket to complete the ParquetExec-to-Substrait implementation.
It looks like #5176 built the initial framework for serializing physical plans, but it hasn't been picked up since then.
Originally posted by @devinjdangelo in https://github.com/apache/arrow-datafusion/issues/9299#issuecomment-1958303415