That3Percent / tree-buf

An experimental serialization system written in Rust
MIT License
256 stars 8 forks source link

Floating point compression #1

Closed That3Percent closed 4 years ago

That3Percent commented 4 years ago

Tree-buf requires the integration of one or more high-performance floating-point compression libraries. The currently integrated Gorilla compressor panics at runtime, and the initial investigation indicates that both faster and higher compression ratio techniques are available.

The task is to research algorithms and libraries, evaluate their trade-offs using multiple real-world data sets, and integrate one or more of the best into Tree-Buf.

This is a huge undertaking. It is likely that this issue will be split into multiple issues but the first step is probably to understand the space and make a realistic evaluation of what algorithms are candidates.

Mentoring

I'll be glad to answer any questions about Tree-Buf, it's architecture, internal APIs, and benchmarks. For SIMD and compression, you're on your own.

Candidate algorithms and resources

In the event that some algorithms perform better on different data sets, it is not necessary that just one floating-point compression algorithm be selected. Tree-Buf does a pass on a sample of the data set when selecting what algorithm to use and may support options that allow the user to select between speed or high compression ratios. There are also options to enable lossy compression within a given tolerance.

Requirements

Much of the work need not exist in the Tree-Buf repository. If you want to put the actual compression algorithms in a separate crate that is not only ok but encouraged.

What's been tried already?

That3Percent commented 4 years ago

The Gorilla situation has significantly improved, closing for now.