Open Folyd opened 1 year ago
There are no plans that I know of yet.
In theory, we should be able to create an extension package, much like the duckdb model, rather than extending the core DataFusion engine.
I suspect there would be certain things that are not yet feasible (like adding a GEOMETRY type / alias, for example), but otherwise the existing extension points for DataFusion should be sufficient (ScalarUDFs, AggregateUDFs, etc.).
Perhaps we can do something similar for the JSON/BSON support we are discussing in #7845
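To make the extension-package idea above a little more concrete, here is a minimal sketch of the per-batch logic that a scalar geospatial function could wrap. It is plain Python, not DataFusion's UDF API. The names `wkb_point` and `st_x`, and the choice of storing points as WKB in a binary column, are assumptions made for illustration only.

```python
import struct
from typing import List, Optional


def wkb_point(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian WKB (byte order 1, geometry type 1)."""
    return struct.pack("<BIdd", 1, 1, x, y)


def st_x(column: List[Optional[bytes]]) -> List[Optional[float]]:
    """Hypothetical scalar function: X coordinate of each WKB point, preserving NULLs."""
    out: List[Optional[float]] = []
    for value in column:
        if value is None:
            out.append(None)
            continue
        # The first byte selects the byte order: 1 = little endian, 0 = big endian.
        order = "<" if value[0] == 1 else ">"
        geom_type, x, _y = struct.unpack(order + "Idd", value[1:21])
        if geom_type != 1:
            raise ValueError(f"expected a WKB Point, got geometry type {geom_type}")
        out.append(x)
    return out


batch = [wkb_point(1.5, 2.0), None, wkb_point(-3.0, 4.0)]
print(st_x(batch))  # [1.5, None, -3.0]
```

Note that the function only ever sees a binary column; nothing in the type tells the engine that the bytes are geometries, which is the limitation the comment above hints at with the GEOMETRY type.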
Thanks @alamb.
Here is my example of handling geometry Parquet data: https://github.com/apache/arrow-rs/issues/4945
There is also the GeoParquet format in the community: https://geoparquet.org/
Also see: https://getindata.com/blog/introducing-geoparquet-data-format/
Since it hasn't been mentioned yet, I'd add there is already a project for Arrow extension types for geospatial data: https://github.com/geoarrow/geoarrow/blob/main/extension-types.md
This is related to the GeoParquet project.
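For context, Arrow signals an extension type through field-level metadata (the `ARROW:extension:name` and `ARROW:extension:metadata` keys), while the storage type stays an ordinary Arrow type. The sketch below models a field as a plain dataclass to show why a geometry column becomes indistinguishable from ordinary binary data if an engine drops that metadata. The `Field` class and both helper functions are hypothetical stand-ins, not part of any Arrow or DataFusion library.

```python
from dataclasses import dataclass, field
from typing import Dict

EXT_NAME_KEY = "ARROW:extension:name"


@dataclass
class Field:
    """Hypothetical stand-in for an Arrow field: name, storage type, metadata."""
    name: str
    storage_type: str
    metadata: Dict[str, str] = field(default_factory=dict)


def is_geometry(f: Field) -> bool:
    """Treat a field as geometry only if its extension name is in the geoarrow namespace."""
    return f.metadata.get(EXT_NAME_KEY, "").startswith("geoarrow.")


def project_dropping_metadata(f: Field) -> Field:
    """Model an engine step that keeps the storage type but loses field metadata."""
    return Field(f.name, f.storage_type)


geom = Field("geometry", "binary", {EXT_NAME_KEY: "geoarrow.wkb"})
print(is_geometry(geom))                             # True
print(is_geometry(project_dropping_metadata(geom)))  # False
```

After the metadata is dropped, the column is still valid binary data, but a downstream consumer can no longer tell that it holds geometries.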
Thanks @wjones127 -- I had forgotten about extension types.
Maybe we could add support for extension types in DataFusion's core and use that extension point to implement a geospatial package on top of DataFusion 🤔
Having a good first use case (Geospatial and possibly JSON) to drive the requirements seems like a good idea.
If you agree, I can try to write up a larger project description.
@alamb I have the same requirement as well and hope to initiate it as soon as possible. If possible, I can also contribute code for this.
That is great news @yukkit -- I don't think I have the bandwidth to try and organize an effort to add Geospatial support to DataFusion in the near term. I wonder if anyone has the bandwidth to help organize an effort to add extension type support? I don't know enough about how this works to do so without additional research, and sadly I don't have the time to devote to it at the moment.
@alamb OK, if possible, I plan to support UDTs (user-defined types) in DataFusion. I will post my ideas in the next few days for anyone to discuss.
I would love to see a design proposal for user defined types. ❤️ -- thank you!
Of course, it's absolutely essential!
My goal is to enable spatial support in projects such as datafusion via https://github.com/geoarrow/geoarrow-rs
I'd argue that spatial data support is pretty much blocked until datafusion has support for user-defined types, since it's pretty crucial to pass along GeoArrow metadata, so it's really exciting to see https://github.com/apache/datafusion/issues/11513 / https://github.com/apache/datafusion/pull/11160 !
Is your feature request related to a problem or challenge?
Currently, datafusion does not support spatial data, any plan to support this?
Describe the solution you'd like
Similar to duckdb: https://duckdb.org/docs/extensions/spatial.html
Describe alternatives you've considered
DuckDB
Additional context
https://cloud.google.com/bigquery/docs/geospatial-data