Arbol-Project / dcdf

dClimate Data Format

Unify Superchunks/Subchunks #25

Open chrisrossi opened 1 year ago

chrisrossi commented 1 year ago

In this first pass, superchunks and subchunks use quite different data structures. Subchunks, of course, use the K-squared raster algorithm. Superchunks instead use a structure that stores min/max values only at leaf nodes and does not reproduce the K-squared style quadtree structure. The original thinking was that higher-level nodes were less likely to collapse anyway, and this was a bit easier for the first draft. Now that we're contemplating encoding datasets large enough to warrant multiple layers of superchunks, it probably makes sense to have the whole structure use the K-squared algorithm. We could potentially refactor so that superchunks and subchunks share largely the same code.
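To make the "same code at every level" idea concrete, here is a rough sketch in Python of a min/max quadtree (k = 2) that collapses any quadrant whose values are all equal. It is only an illustration of the tree shape, not dcdf's actual implementation: `Node`, `build`, and `count_nodes` are hypothetical names, the grid is assumed to be square with a power-of-two side, and the bitmaps and differential encoding a real K-squared raster uses are left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One quadtree node holding the min and max of the region it covers."""
    lo: int
    hi: int
    # Empty when the region is uniform (collapsed) or is a single cell.
    children: List["Node"] = field(default_factory=list)


def build(grid, row=0, col=0, size=None):
    """Recursively build a min/max quadtree over a square, power-of-two grid.

    The same function applies whether the region is a "superchunk" or a
    "subchunk"; the level only changes the size of the region.
    """
    if size is None:
        size = len(grid)
    if size == 1:
        value = grid[row][col]
        return Node(value, value)

    half = size // 2
    kids = [
        build(grid, row + dr, col + dc, half)
        for dr in (0, half)
        for dc in (0, half)
    ]
    lo = min(k.lo for k in kids)
    hi = max(k.hi for k in kids)
    if lo == hi:
        # Every cell in this region has the same value: collapse the subtree.
        return Node(lo, hi)
    return Node(lo, hi, kids)


def count_nodes(node):
    """Count the nodes that would actually need to be stored."""
    return 1 + sum(count_nodes(child) for child in node.children)


if __name__ == "__main__":
    grid = [
        [5, 5, 1, 2],
        [5, 5, 3, 4],
        [7, 7, 7, 7],
        [7, 7, 7, 7],
    ]
    root = build(grid)
    print(root.lo, root.hi, count_nodes(root))
```

In this toy grid, three of the four quadrants are uniform and collapse to a single node each, so only the top-right quadrant is expanded down to individual cells.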

Additionally, even a subchunk, if its data is of high enough cardinality, might be stored more efficiently using the leaf-node-only structure of superchunks. So we could also implement that structure for both subchunks and superchunks and choose one or the other at encoding time, based on which is smaller when encoded.
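Choosing at encoding time could look something like the following Python sketch: encode the chunk both ways, keep the smaller result, and prefix it with a one-byte tag so a decoder knows which layout follows. Everything here is an assumption for illustration: `encode_tree` and `encode_flat` are simplified stand-ins for the two layouts, not dcdf's real encoders, and the tag values and 32-bit little-endian integers are arbitrary choices.

```python
import struct

# Hypothetical one-byte tags identifying the layout that follows.
TAG_TREE = 0
TAG_FLAT = 1


def encode_flat(grid):
    """Stand-in for the leaf-only layout: every cell value, row by row."""
    cells = [value for row in grid for value in row]
    return struct.pack(f"<{len(cells)}i", *cells)


def encode_tree(grid, row=0, col=0, size=None):
    """Stand-in for the quadtree layout.

    Writes (min, max) for each region in preorder and stops descending as
    soon as a region is uniform (min == max), which also covers single cells.
    Assumes a square grid with a power-of-two side.
    """
    if size is None:
        size = len(grid)
    cells = [
        grid[r][c]
        for r in range(row, row + size)
        for c in range(col, col + size)
    ]
    lo, hi = min(cells), max(cells)
    out = struct.pack("<ii", lo, hi)
    if lo == hi:
        return out
    half = size // 2
    for dr in (0, half):
        for dc in (0, half):
            out += encode_tree(grid, row + dr, col + dc, half)
    return out


def encode_smallest(grid):
    """Encode both ways and keep whichever is smaller, prefixed by its tag."""
    candidates = [
        (TAG_TREE, encode_tree(grid)),
        (TAG_FLAT, encode_flat(grid)),
    ]
    tag, body = min(candidates, key=lambda pair: len(pair[1]))
    return bytes([tag]) + body


if __name__ == "__main__":
    uniform = [[7] * 4 for _ in range(4)]
    distinct = [[r * 4 + c for c in range(4)] for r in range(4)]
    print(len(encode_smallest(uniform)), len(encode_smallest(distinct)))
```

With these stand-ins, a uniform chunk collapses to a single (min, max) pair under the tree layout, while a chunk in which every cell differs is smaller in the flat layout, since the tree adds a pair per node without collapsing anything. The cost of this approach is encoding each chunk twice.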