Open melbourne2991 opened 3 months ago
this all looks really good to me.
- I'd love to see some tests on "what happens when you partition by something other than a date"
- we should definitely open another ticket for "reading from hive-partitioned files." Right now you can use our glob function, and that helps, but there is a projection push-down that this might not be able to do. Clearly out of scope for this ticket, but it'd be a killer feature either way.
- I'd love to see a test with another file format (json or bson?) just to make sure that it's generic enough and doesn't rely on something parquet specific.
- I think it'd be good to be explicit about whether the partitioned field is expected to remain in the output data or be elided because it's already in the partition path, so a test there would be good.
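To make the last few points concrete, here is a minimal sketch of the behavior in question, in plain Python rather than anything from this PR. It partitions by a non-date column, writes newline-delimited JSON instead of parquet, and makes the keep-or-elide choice an explicit flag. The function names, the `keep_partition_column` flag, and the `part-0.json` file name are all hypothetical and only there for illustration; the one thing taken from the Hive convention is the `column=value` directory naming.

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path


def write_hive_partitioned(rows, root, partition_by, keep_partition_column=False):
    """Write rows as newline-delimited JSON under root/<col>=<value>/part-0.json.

    Hypothetical helper: keep_partition_column decides whether the partition
    field stays in the file contents or is elided because the path carries it.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[partition_by]].append(row)

    written = []
    for value, group in groups.items():
        part_dir = Path(root) / f"{partition_by}={value}"
        part_dir.mkdir(parents=True, exist_ok=True)
        path = part_dir / "part-0.json"
        with path.open("w") as f:
            for row in group:
                if not keep_partition_column:
                    row = {k: v for k, v in row.items() if k != partition_by}
                f.write(json.dumps(row) + "\n")
        written.append(path)
    return sorted(written)


def read_hive_partitioned(root, partition_by):
    """Read the files back, restoring an elided partition column from the path.

    The restored value is always a string, since it comes from a directory name.
    """
    rows = []
    for path in sorted(Path(root).glob(f"{partition_by}=*/*.json")):
        value = path.parent.name.split("=", 1)[1]
        for line in path.read_text().splitlines():
            row = json.loads(line)
            row.setdefault(partition_by, value)
            rows.append(row)
    return rows


if __name__ == "__main__":
    rows = [
        {"id": 1, "region": "eu"},
        {"id": 2, "region": "us"},
        {"id": 3, "region": "eu"},
    ]
    with tempfile.TemporaryDirectory() as root:
        for path in write_hive_partitioned(rows, root, "region"):
            print(path.relative_to(root), path.read_text().strip())
        print(read_hive_partitioned(root, "region"))
```

A test along these lines would pin down both outcomes: with the flag off, the files contain only `id` and the reader has to rebuild `region` from the directory name; with it on, the value appears in both places.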
Agree on all these points - thanks for the feedback. (Just a note: the PR isn't in its final form yet. The current test was primarily for ease of development - more comprehensive tests are on the way!)
marking as draft as it's not actively waiting on review.
@melbourne2991 please feel free to ping us when it is ready.
@melbourne2991 wanted to check in on this. Is there anything I can do to help you on this?
hey @tychoish, apologies, I've been swamped lately. I'm not sure I'll have time to get around to this in any reasonable time frame, so I'm happy for someone else to pick it up. Hopefully there isn't too much effort left on it.
Addresses (https://github.com/GlareDB/glaredb/issues/2462)
Provides hive partitioning support for Parquet & JSON.
Missing from this PR: