apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0

Implement physical plan serialization for COPY plans `CsvLogicalExtensionCodec` #11150

Open alamb opened 1 week ago

alamb commented 1 week ago

Is your feature request related to a problem or challenge?

As part of https://github.com/apache/datafusion/pull/11060, @devinjdangelo turned file format support into a trait, which is good!

However, the code to serialize these new (dynamic) structures is not yet implemented.

As @devinjdangelo says in https://github.com/apache/datafusion/pull/11060/files#r1650268578:

"Users depending on the ability to serialize COPY plans (e.g. ballista) will need this TODO to be completed before upgrading to any version of datafusion including this change."

It turns out there are no unit tests for serializing these plans either, so no tests failed.

Describe the solution you'd like

Implement the codec named in the title (`CsvLogicalExtensionCodec`) for serializing plans, and add a test for it.

Describe alternatives you've considered

The code is here: datafusion/proto/src/logical_plan/file_formats.rs

The test would go here: https://github.com/apache/datafusion/blob/main/datafusion/proto/tests/cases/roundtrip_physical_plan.rs

Note there is already coverage for LogicalPlans here: https://github.com/apache/datafusion/blob/d2ff2189dfb8b4624ae2c08846cd713871b37d8c/datafusion/proto/tests/cases/roundtrip_logical_plan.rs#L325-L346
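
For anyone picking this up, the property such a test checks is a plain round trip: encode the format options to bytes, decode them, and compare with the original. Below is a minimal std-only sketch of that shape. Everything in it (`CsvOptions`, `FileFormatCodec`, `CsvCodec`, `roundtrip`, and the three-byte layout) is a hypothetical stand-in for illustration, not the datafusion-proto API; a real implementation would presumably go through the existing protobuf messages instead.

```rust
// Hypothetical stand-ins for illustration only; none of these names are
// part of DataFusion or datafusion-proto.

/// A tiny subset of CSV format options that a COPY plan might carry.
#[derive(Debug, Clone, PartialEq)]
pub struct CsvOptions {
    pub has_header: bool,
    pub delimiter: u8,
    pub quote: u8,
}

/// Simplified analogue of an extension codec: `try_encode` appends bytes
/// to a buffer and `try_decode` rebuilds the value from those bytes.
pub trait FileFormatCodec {
    type Format;
    fn try_encode(&self, format: &Self::Format, buf: &mut Vec<u8>) -> Result<(), String>;
    fn try_decode(&self, buf: &[u8]) -> Result<Self::Format, String>;
}

pub struct CsvCodec;

impl FileFormatCodec for CsvCodec {
    type Format = CsvOptions;

    // Assumed wire layout: [header flag (0 or 1), delimiter, quote].
    fn try_encode(&self, format: &CsvOptions, buf: &mut Vec<u8>) -> Result<(), String> {
        buf.extend_from_slice(&[format.has_header as u8, format.delimiter, format.quote]);
        Ok(())
    }

    fn try_decode(&self, buf: &[u8]) -> Result<CsvOptions, String> {
        // Reject anything that does not match the assumed layout.
        if buf.len() != 3 || buf[0] > 1 {
            return Err(format!("expected 3 bytes with a 0/1 header flag, got {:?}", buf));
        }
        Ok(CsvOptions {
            has_header: buf[0] == 1,
            delimiter: buf[1],
            quote: buf[2],
        })
    }
}

/// Encode then decode; a roundtrip test asserts the result equals the input.
pub fn roundtrip<C: FileFormatCodec>(codec: &C, format: &C::Format) -> Result<C::Format, String> {
    let mut buf = Vec::new();
    codec.try_encode(format, &mut buf)?;
    codec.try_decode(&buf)
}
```

A test built on this would construct non-default options (for example a `|` delimiter), call `roundtrip(&CsvCodec, &opts)`, and assert that the result equals `opts`. Non-default values matter here because a codec that silently drops fields and falls back to defaults would otherwise still pass.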

Additional context

There are several other codecs needed:

However, I think we need to get one example done first, and then we can file tickets to fill out the others.

Maybe this is what @lewiszlw was getting at with https://github.com/apache/datafusion/pull/11095 / https://github.com/apache/datafusion/blob/main/datafusion-examples/examples/composed_extension_codec.rs

Lordworms commented 1 week ago

take