datafusion-contrib / arrow-zarr

Implementation of Zarr file format in Rust
Apache License 2.0

Hive style partitions #20

Closed by maximedion2 4 months ago

maximedion2 commented 4 months ago

This mostly follows the parquet implementation, minus schema merging and with a handful of small differences to account for the way zarr data is stored.
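For readers unfamiliar with the convention: Hive-style partitioning encodes column values in directory names of the form `key=value`. Below is a minimal sketch of how partition values might be pulled out of a store's path. It assumes each partition column maps to one directory level, in the declared order, above the Zarr store. The function name `parse_hive_partitions` is hypothetical and is not taken from arrow-zarr or DataFusion.

```rust
use std::path::Path;

/// Hypothetical helper (not part of arrow-zarr or DataFusion): given the path of a
/// Zarr store relative to the table root, extract Hive-style `key=value` partition
/// values. Returns `None` if the directory keys don't match `partition_cols`
/// exactly and in order.
fn parse_hive_partitions<'a>(relative_path: &'a Path, partition_cols: &[&str]) -> Option<Vec<&'a str>> {
    // Only the directory components are partition levels; the last component
    // is assumed to be the Zarr store itself.
    let dirs: Vec<&'a str> = relative_path
        .parent()?
        .components()
        .map(|c| c.as_os_str().to_str())
        .collect::<Option<Vec<_>>>()?;

    // Every partition column must correspond to exactly one directory level.
    if dirs.len() != partition_cols.len() {
        return None;
    }

    // Each level must look like `<expected column>=<value>`; keep only the values.
    dirs.into_iter()
        .zip(partition_cols.iter())
        .map(|(dir, col)| match dir.split_once('=') {
            Some((key, value)) if key == *col => Some(value),
            _ => None,
        })
        .collect()
}
```

For example, `year=2023/month=01/store.zarr` with partition columns `["year", "month"]` yields `Some(vec!["2023", "01"])`, while the same path with the columns declared in the opposite order, or a directory level without an `=`, yields `None`.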

maximedion2 commented 4 months ago

Generally LGTM, nice work! It's unfortunate that so much is copied from datafusion. I almost wonder whether opening a ticket / PR that just makes some of this public would be worthwhile (though I'm not sure exactly what you changed).

My only big comment is that it'd be nice to make sure the tests can run on a fresh checkout, rather than relying on some of your file paths being hardcoded.
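One common way to make Rust tests independent of a particular machine is to resolve fixture paths relative to the crate root, using the `CARGO_MANIFEST_DIR` variable that Cargo sets. The sketch below is an illustration, not what this PR necessarily adopted. The helper name `test_data_path` and the `test-data` directory are hypothetical.

```rust
use std::path::PathBuf;

/// Hypothetical helper: resolve a test fixture relative to the crate root instead
/// of a hardcoded absolute path. `test-data` is an assumed directory name.
fn test_data_path(relative: &str) -> PathBuf {
    // Cargo sets CARGO_MANIFEST_DIR when compiling the crate; `option_env!` keeps
    // this compiling outside Cargo too, falling back to the current directory.
    let root = option_env!("CARGO_MANIFEST_DIR")
        .map(PathBuf::from)
        .unwrap_or_else(|| {
            std::env::current_dir().expect("current directory should be accessible")
        });
    root.join("test-data").join(relative)
}
```

In code that is always built through Cargo, `env!("CARGO_MANIFEST_DIR")` is the more usual spelling. Another option is to generate the fixtures into a temporary directory at test time, so nothing needs to be checked in.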

Yeah, at some point, when I realized I had to copy yet another thing, I considered just opening a ticket to ask for some of this to be made public. I'm not sure what the turnaround time is for a request like that, though; there are a lot of open tickets for datafusion, and I don't know when they would get to it. And yes, I did have to make some minor modifications to a few functions too. In the end I decided against it for now, but I thought I could come back to this later, once I can make a definitive(ish) list of the functions and structs I need (instead of, e.g., opening a ticket, then realizing I need even more functions and having to open another one).