G-Research / tdigest-rs

Simple Python package to compute TDigests, implemented in Rust
Apache License 2.0

TDigest-rs



Introduction

TDigest-rs is a Python library with a Rust backend that implements the T-Digest algorithm for estimating quantiles of streaming data. For an in-depth treatment of the algorithm, see Ted Dunning and Otmar Ertl's paper and the G-Research blog post.
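
To give a rough feel for the idea, the sketch below summarises a dataset as a short list of (mean, weight) centroids and estimates a quantile by interpolating between them. It is a simplified illustration in pure Python, not the library's Rust implementation: the names `build_centroids` and `estimate_quantile` are hypothetical, and the sketch uses equal-sized clusters, whereas the T-Digest algorithm keeps clusters smaller near the tails to improve accuracy at extreme quantiles.

```python
def build_centroids(values, max_centroids):
    """Summarise values as at most max_centroids (mean, weight) pairs.

    Simplification: every cluster holds roughly the same number of points.
    """
    ordered = sorted(values)
    if not ordered:
        return []
    size = -(-len(ordered) // max_centroids)  # ceiling division
    centroids = []
    for start in range(0, len(ordered), size):
        chunk = ordered[start:start + size]
        centroids.append((sum(chunk) / len(chunk), len(chunk)))
    return centroids


def estimate_quantile(centroids, q):
    """Estimate the q-quantile from a non-empty, mean-sorted centroid list."""
    total = sum(weight for _, weight in centroids)
    target = q * total
    cumulative = 0.0
    prev_pos, prev_mean = None, None
    for mean, weight in centroids:
        # Place each centroid at the midpoint of the weight it covers.
        pos = cumulative + weight / 2
        if target <= pos:
            if prev_pos is None:
                return mean  # target lies before the first centroid
            frac = (target - prev_pos) / (pos - prev_pos)
            return prev_mean + frac * (mean - prev_mean)
        prev_pos, prev_mean = pos, mean
        cumulative += weight
    return prev_mean  # target lies beyond the last centroid


centroids = build_centroids(range(1, 1001), max_centroids=20)
print(len(centroids), estimate_quantile(centroids, 0.5))  # 20 500.5
```

The 1000 input values are reduced to 20 pairs, yet the median estimate matches the exact median of 500.5 for this evenly spaced input. On real data the estimate is approximate.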

Usage

pip install tdigest-rs

The library contains a single TDigest class.

Creating a TDigest object


import numpy as np
from tdigest_rs import TDigest

# Fit a TDigest from a numpy array (float32 or float64)
arr = np.random.randn(1000)
tdigest = TDigest.from_array(arr=arr, delta=100.0)  # delta is optional and defaults to 300.0
print(tdigest.means, tdigest.weights)

# Create directly from means and weights arrays
vals = np.random.randn(1000).astype(np.float32)
weights = np.ones(1000).astype(np.uint32)
tdigest = TDigest.from_means_weights(arr=vals, weights=weights)

Computing quantiles


# Compute a quantile
tdigest.quantile(0.1)

# Compute median
tdigest.median()

# Compute trimmed mean
tdigest.trimmed_mean(lower=0.05, upper=0.95)
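
For intuition, or for checking an estimate on a small sample, a trimmed mean can also be computed exactly from sorted data. The helper below is a hypothetical pure-Python reference and is not part of tdigest-rs. It assumes that `lower` and `upper` are quantile fractions and that values ranked below `lower` or above `upper` are discarded, which may differ in detail from how the library treats the boundaries.

```python
def exact_trimmed_mean(values, lower, upper):
    """Mean of the values whose rank lies between the lower and upper fractions."""
    ordered = sorted(values)
    n = len(ordered)
    start = round(n * lower)  # number of points dropped from the low end
    stop = round(n * upper)   # index one past the last point kept
    kept = ordered[start:stop]
    if not kept:
        raise ValueError("no values left after trimming")
    return sum(kept) / len(kept)


# The outlier 1000 is dropped along with the smallest value.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]
print(exact_trimmed_mean(sample, lower=0.1, upper=0.9))  # 5.5
```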

Merging TDigests


arr1 = np.random.randn(1000)
arr2 = np.ones(1000)
digest1 = TDigest.from_array(arr=arr1)
digest2 = TDigest.from_array(arr=arr2)

merged_digest = digest1.merge(digest2, delta=100.0)  # delta again defaults to 300.0
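
Conceptually, merging works because each digest is just a list of weighted means: the two lists can be pooled, sorted, and compressed again without revisiting the raw data. The sketch below illustrates that idea in pure Python under simplifying assumptions. The function `merge_centroids` and its `max_weight` parameter are hypothetical; the library's `merge` is controlled by `delta`, and its compression rule is not reproduced here.

```python
def merge_centroids(a, b, max_weight):
    """Pool two (mean, weight) lists and greedily re-cluster neighbours.

    A cluster is closed as soon as adding the next centroid would push
    its weight above max_weight (a simplified, fixed weight budget).
    """
    combined = sorted(a + b)  # order centroids by mean
    merged = []
    cur_sum, cur_weight = 0.0, 0
    for mean, weight in combined:
        if cur_weight > 0 and cur_weight + weight > max_weight:
            merged.append((cur_sum / cur_weight, cur_weight))
            cur_sum, cur_weight = 0.0, 0
        cur_sum += mean * weight
        cur_weight += weight
    if cur_weight > 0:
        merged.append((cur_sum / cur_weight, cur_weight))
    return merged


left = [(1.0, 2), (3.0, 2)]
right = [(2.0, 2), (4.0, 2)]
print(merge_centroids(left, right, max_weight=4))  # [(1.5, 4), (3.5, 4)]
```

The total weight (8) and the overall weighted mean (2.5) are the same before and after merging, so the merged summary still describes the combined data.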

Serialising TDigests

A TDigest object can be converted to a dictionary, which can then be JSON-serialised. It can also be pickled.


# Convert and load to/from a python dict
d = tdigest.to_dict()
loaded_digest = TDigest.from_dict(d)

# Pickle a digest
import pickle

data = pickle.dumps(tdigest)
restored = pickle.loads(data)
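
The JSON route mentioned above can be sketched with the standard `json` module. The helper `roundtrip_json` is hypothetical, and the example dictionary is only a stand-in: the actual keys and value types returned by `to_dict()` are not documented here and may differ.

```python
import json


def roundtrip_json(digest_dict):
    """Serialise a dict to a JSON string and parse it back."""
    text = json.dumps(digest_dict)
    return json.loads(text)


# Hypothetical stand-in for the output of tdigest.to_dict();
# the real keys and value types may differ.
example = {"means": [-1.2, 0.1, 1.3], "weights": [10, 80, 10]}
assert roundtrip_json(example) == example
```

With the library installed, the same pattern would presumably apply to `tdigest.to_dict()`, with the parsed dictionary passed to `TDigest.from_dict`.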

Development workflow

pip install hatch

cd bindings/python

# Run linters
hatch run dev:lint

# Run tests
hatch run dev:test

# Run benchmark
hatch run dev:benchmark

# Format code
hatch run dev:format

Contributing

Please read our contributing guide and code of conduct if you'd like to contribute to the project.

Community Guidelines

Please read our code of conduct before participating in or contributing to this project.

Security

Please see our security policy for details on reporting security vulnerabilities.

License

TDigest-rs is licensed under the Apache Software License 2.0 (Apache-2.0).