Hilbert indexing on parquet compressed files

davidmoten / sparse-hilbert-index

Java library to create and search random access files (including in S3) using the space-filling hilbert index (sparse)

Apache License 2.0


Hilbert indexing on parquet compressed files #1

Open matzhaugenOI opened 5 years ago

matzhaugenOI commented 5 years ago

This is very interesting. I'm wondering if you could expand this to index compressed files such as Parquet, which is inherently columnar storage. My use case is about 100 GB of geospatial data per day (uncompressed) over multiple years. I could see having chunks of Parquet files that are Hilbert indexed, and perhaps even sub-indexed within each file chunk. Thoughts?

davidmoten commented 4 years ago

Hi, sorry I missed this. You'd have to use a compression technique that supports random access. I don't know much about the Parquet format. Happy for you to outline further how Parquet indexing support could be enabled in this library.
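To make "compression that supports random access" concrete, here is a minimal Python sketch of one way it could work: sort records by Hilbert index, compress fixed-size chunks independently, and keep a sparse index of each chunk's key range and byte offset. It does not use this library's API or any Parquet reader. The function names (`hilbert_index`, `build_chunks`, `query`), the text record layout, and the use of `zlib` are all assumptions made for illustration. In a Parquet setting, independently compressed row groups might play the role of the chunks, but that is untested here.

```python
import bisect
import zlib


def hilbert_index(order, x, y):
    """Distance along the Hilbert curve of cell (x, y) on a 2**order square grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the next level is in canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def build_chunks(points, order, chunk_size):
    """Sort points by Hilbert key and compress each chunk on its own.

    Returns the concatenated compressed bytes and a sparse index with one
    entry per chunk: (first_key, last_key, byte_offset, byte_length).
    """
    keyed = sorted((hilbert_index(order, x, y), x, y) for x, y in points)
    blob = bytearray()
    index = []
    for i in range(0, len(keyed), chunk_size):
        chunk = keyed[i:i + chunk_size]
        raw = "".join(f"{k},{x},{y}\n" for k, x, y in chunk).encode()
        packed = zlib.compress(raw)
        index.append((chunk[0][0], chunk[-1][0], len(blob), len(packed)))
        blob += packed
    return bytes(blob), index


def query(blob, index, lo, hi):
    """Return points with lo <= key <= hi and the number of chunks decompressed."""
    last_keys = [entry[1] for entry in index]
    start = bisect.bisect_left(last_keys, lo)  # first chunk whose last key >= lo
    hits, touched = [], 0
    for first, last, offset, length in index[start:]:
        if first > hi:
            break
        touched += 1
        # Only this chunk's bytes are read and decompressed.
        raw = zlib.decompress(blob[offset:offset + length])
        for line in raw.decode().splitlines():
            k, x, y = map(int, line.split(","))
            if lo <= k <= hi:
                hits.append((x, y))
    return hits, touched


if __name__ == "__main__":
    points = [(x, y) for x in range(16) for y in range(16)]
    blob, index = build_chunks(points, order=4, chunk_size=32)
    hits, touched = query(blob, index, 40, 50)
    print(len(index), "chunks;", touched, "decompressed;", len(hits), "hits")
```

The sketch queries a single Hilbert key range. A bounding-box search would first have to be translated into one or more such ranges, which is omitted here. Chunk size is the main trade-off: smaller chunks mean less wasted decompression per query but a larger index and a worse compression ratio.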