-
From the extra TODOs in #2004, ordered roughly from simplest to most complicated.
- [x] Var List offsets can be bitpacked
- [x] InternalIDs can be bitpacked
- [x] INT16 and INT8 bitpacking
- [x]…
-
Hi,
I would like to explore enhancing the Lance v2 format's dictionary encoding by integrating bitpacking to further reduce storage space. For instance, in a string array with 50 unique values, the…
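A minimal sketch of the idea, assuming a simple LSB-first bit layout. With 50 unique values, each dictionary index fits in 6 bits (2^6 = 64 ≥ 50) rather than a full-width integer. The function names and bit layout here are hypothetical and illustrative only; they are not Lance's actual encoding.

```python
def bit_width(cardinality: int) -> int:
    """Smallest number of bits that can hold every index 0..cardinality-1 (at least 1)."""
    return max(1, (cardinality - 1).bit_length())

def dictionary_encode(values):
    """Return (dictionary, indices) where dictionary[indices[i]] == values[i]."""
    dictionary, index_of, indices = [], {}, []
    for v in values:
        if v not in index_of:
            index_of[v] = len(dictionary)
            dictionary.append(v)
        indices.append(index_of[v])
    return dictionary, indices

def bitpack(indices, width):
    """Pack each index into `width` bits, least significant bit first."""
    out = bytearray((len(indices) * width + 7) // 8)
    bit_pos = 0
    for idx in indices:
        for b in range(width):
            if (idx >> b) & 1:
                out[bit_pos // 8] |= 1 << (bit_pos % 8)
            bit_pos += 1
    return bytes(out)

def bitunpack(packed, width, count):
    """Inverse of bitpack: read `count` values of `width` bits each."""
    result, bit_pos = [], 0
    for _ in range(count):
        value = 0
        for b in range(width):
            if (packed[bit_pos // 8] >> (bit_pos % 8)) & 1:
                value |= 1 << b
            bit_pos += 1
        result.append(value)
    return result
```

For 1,000 strings drawn from 50 unique values, the packed indices take (1000 × 6 + 7) // 8 = 750 bytes, compared with 4,000 bytes for 32-bit indices.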
-
I am trying to install BPCells on an M1 Mac. I figured out how to fix the HDF5 issue, but then another error popped up.
Has anyone encountered the same issue?
bitpacking_io.cpp:259:51: error: no ma…
-
Hey! Fantastic work here!
I am trying to introduce this crate to [lance](https://github.com/lancedb/lance), but lance builds on stable Rust. Is there any plan to support building this awesome crate on …
-
We currently implement naive serde using Rust serde + flexbuffers by default.
Many arrays can pack their metadata much more tightly.
This is an overview issue to track auditing each one:
- [ ] Bo…
-
There is currently a specific problem: the default writer preserves the chunking of its input; however, the default reader forces a 64Ki batch size (this is configurable but defaults to 64Ki; see `vor…
-
Currently, all ULE values are aligned to a byte boundary. We should consider making them aligned to a bit boundary instead.
This would solve two problems:
1. Inefficient storage: for example, `S…
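To make the storage point concrete, here is a small sketch that compares byte-aligned and bit-aligned layouts for fixed-width values. It is generic arithmetic, not tied to any actual ULE type, and the helper names are hypothetical.

```python
def byte_aligned_size(count: int, bits_per_value: int) -> int:
    """Bytes needed when each value is padded up to a whole number of bytes."""
    return count * ((bits_per_value + 7) // 8)

def bit_aligned_size(count: int, bits_per_value: int) -> int:
    """Bytes needed when values are packed back to back in one bit stream."""
    return (count * bits_per_value + 7) // 8

# 3-bit values: 1000 bytes byte-aligned vs 375 bytes bit-aligned
print(byte_aligned_size(1000, 3), bit_aligned_size(1000, 3))
```

The savings are largest for narrow widths. A value that is already a whole number of bytes wide gains nothing from bit alignment.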
-
I am using Avro and Crunch 0.11 to write data into Hadoop CDH 4.6 in Parquet format. This works fine for a few gigabytes but blows up in the RunLengthBitPackingHybridDecoder when reading a few thousan…
-
## Feature request
**Is your feature request related to a problem? Please describe.**
It's not very clear from the documentation what sort of string compression algorithms apply to low cardinali…
-
We should add a bitpacking numcodecs filter for biallelic diploid data since it makes for a substantial improvement over zarr's default compression. Here's an example: https://nbviewer.jupyter.org/gi…
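A minimal sketch of why bitpacking suits this data. It assumes each biallelic diploid call has already been reduced to an alternate-allele count (0, 1 or 2), with missing calls recoded to 3, so every call fits in 2 bits and four calls share one byte. Note that the alt-count form discards phase. The function names are hypothetical. This is plain Python, so a real filter would still need to wrap this logic in a numcodecs codec class.

```python
def pack_genotypes(calls):
    """Pack 2-bit genotype codes (0, 1, 2 = alt count; 3 = missing), four per byte."""
    out = bytearray((len(calls) + 3) // 4)
    for i, g in enumerate(calls):
        if not 0 <= g <= 3:
            raise ValueError(f"genotype code out of range: {g}")
        # Call i occupies bits 2*(i % 4) .. 2*(i % 4)+1 of byte i // 4.
        out[i // 4] |= g << (2 * (i % 4))
    return bytes(out)

def unpack_genotypes(packed, count):
    """Recover `count` 2-bit genotype codes from the packed bytes."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(count)]
```

Compared with one byte per call, this alone gives a 4x reduction before any general-purpose compressor runs.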