datafusion-contrib / arrow-zarr

Implementation of Zarr file format in Rust
Apache License 2.0

TODOs to get zarr arrow in a reasonable, usable state #21

Open maximedion2 opened 5 months ago

maximedion2 commented 5 months ago

This will be a list of TODOs for the overall project of writing a query engine for Zarr files (and eventually other raster formats... maybe). I'm going to split the overall project into 3 phases, numbered 0, 1 and 2. Each TODO on the list will eventually be assigned an issue with more details and a PR for the implementation.

maximedion2 commented 5 months ago

Phase 0: This phase is about implementing the foundation for a query engine that shamelessly leverages two facts: 1) Zarr is a heavily chunked storage format, and 2) raster data typically includes arrays that represent some sort of coordinates, with most queries filtering on those coordinates. As I'm making this list, I already have the basics implemented; what's left is
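To illustrate the idea behind Phase 0 (chunked storage plus filters on coordinates), here is a minimal sketch of chunk pruning. It is not the crate's actual code: `ChunkRange`, `chunk_ranges` and `chunks_to_read` are hypothetical names, and it assumes a single 1-D coordinate array split into fixed-size chunks.

```rust
/// Value range of a 1-D coordinate array within one chunk.
/// Hypothetical type for illustration, not part of arrow-zarr.
#[derive(Debug, Clone, Copy, PartialEq)]
struct ChunkRange {
    chunk_index: usize,
    min: f64,
    max: f64,
}

/// Split a coordinate array into fixed-size chunks and record the
/// min/max of each chunk. `chunk_size` must be greater than zero,
/// otherwise `slice::chunks` panics.
fn chunk_ranges(coords: &[f64], chunk_size: usize) -> Vec<ChunkRange> {
    coords
        .chunks(chunk_size)
        .enumerate()
        .map(|(chunk_index, chunk)| ChunkRange {
            chunk_index,
            min: chunk.iter().copied().fold(f64::INFINITY, f64::min),
            max: chunk.iter().copied().fold(f64::NEG_INFINITY, f64::max),
        })
        .collect()
}

/// Return the indices of the chunks whose range overlaps [lo, hi].
/// Every other chunk can be skipped without reading its data.
fn chunks_to_read(ranges: &[ChunkRange], lo: f64, hi: f64) -> Vec<usize> {
    ranges
        .iter()
        .filter(|r| r.max >= lo && r.min <= hi)
        .map(|r| r.chunk_index)
        .collect()
}
```

For example, with latitudes 0, 10, ..., 110 stored in chunks of 4, a filter on the range 45 to 65 only overlaps the second chunk (values 40 to 70), so only that chunk would need to be read for the remaining variables.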

Phase 1: This phase will be about implementing a more generic version of the query engine that can be reused across various raster formats. The broad steps will be

Phase 2: This phase will be about implementing efficient geospatial queries that will work off of WKT strings. Realistically, I'm not going to implement a completely new type of data in DataFusion, so I will have to rely on passing strings to geospatial functions, or on transforming data (like 2 floats for a point) into a string that can then be passed to geospatial functions. The steps would be
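As a rough sketch of the "2 floats for a point into a string" transformation mentioned for Phase 2, assuming plain `f64` coordinates and the standard WKT `POINT(x y)` form. The function names `point_to_wkt` and `points_to_wkt` are hypothetical and not part of arrow-zarr or DataFusion.

```rust
/// Format one 2-D point as a WKT string.
/// Uses Rust's default `f64` formatting, so 1.0 is written as `1`.
fn point_to_wkt(x: f64, y: f64) -> String {
    format!("POINT({} {})", x, y)
}

/// Convert two coordinate columns into a column of WKT strings.
/// If the slices differ in length, `zip` stops at the shorter one.
fn points_to_wkt(xs: &[f64], ys: &[f64]) -> Vec<String> {
    xs.iter()
        .zip(ys)
        .map(|(&x, &y)| point_to_wkt(x, y))
        .collect()
}
```

The resulting strings could then be handed to any geospatial function that accepts WKT input, at the cost of formatting and re-parsing each point.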

maximedion2 commented 5 months ago

@tshauck feel free to add anything here of course.