datafusion-contrib / datafusion-dft

A batteries-included data processing and DataFusion development app for the terminal
Apache License 2.0

Create / add duckdb metadata functions #148

Closed: matthewmturner closed this issue 1 month ago

matthewmturner commented 2 months ago

The idea is to implement these metadata capabilities from DuckDB:

  I suggest we file a second ticket for implementing parquet_metadata and other DuckDB metadata functions.

Originally posted by @alamb in https://github.com/datafusion-contrib/datafusion-dft/issues/125#issuecomment-2353468345

Here is the parquet_metadata implementation in datafusion-cli: https://github.com/apache/datafusion/blob/257e1409eca81cfff024ecc5e2567e9f67e6b5a3/datafusion-cli/src/functions.rs#L317-L459

I would like to suggest creating those functions in their own crate (perhaps datafusion-functions-parquet?). It could live in the datafusion-dft repo initially for convenience, but I think the eventual goal should be for dft to focus on integration rather than on implementing such features itself.

@matthewmturner says:

I agree with putting it in its own crate. Like @alamb said, I also think dft could serve as an incubator of sorts. For example, I have taken that approach in my WASM function factory PR: I have no intention of keeping that code in this repo, but it's quite convenient to have it here for the time being while it matures.

alamb commented 2 months ago

Updated the description a little.

devanbenz commented 2 months ago

take

matthewmturner commented 2 months ago

Cross-posting here as it relates to DuckDB functions: https://github.com/apache/datafusion/issues/12254

matthewmturner commented 1 month ago

Hi @devanbenz, just checking in on this: do you think you'll have time to work on it? If not, I can pick it up.

matthewmturner commented 1 month ago

I have started work on this.