Simple Python package to compute TDigests, implemented in Rust.
TDigest-rs is a Python library with a Rust backend that implements the T-Digest algorithm, a compact summary for estimating quantiles of streaming data that is most accurate near the extreme quantiles. For an in-depth exploration of the T-Digest algorithm, refer to Ted Dunning and Otmar Ertl's paper and the G-Research blog post.
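The tail accuracy comes from a scale function that limits how much data a centroid may absorb depending on where it sits in the distribution. The sketch below uses the arcsine scale function k1 from the Dunning and Ertl paper. It is an illustration, not the tdigest-rs implementation: which scale function the library uses, and whether its `delta` parameter corresponds exactly to the paper's compression parameter δ, are assumptions here. The helper `max_centroid_fraction` is hypothetical and gives only a first-order estimate.

```python
import math


def k1(q: float, delta: float) -> float:
    """Arcsine scale function: k(q) = delta / (2*pi) * asin(2q - 1)."""
    return delta / (2 * math.pi) * math.asin(2 * q - 1)


def max_centroid_fraction(q: float, delta: float) -> float:
    """First-order estimate of the largest fraction of all points that a
    centroid centred at quantile q may hold, given the merging rule
    k(q_right) - k(q_left) <= 1.

    Since dk/dq = delta / (2*pi*sqrt(q*(1-q))), a span of 1 in k-space
    corresponds to roughly 2*pi*sqrt(q*(1-q)) / delta in q-space.
    """
    return 2 * math.pi * math.sqrt(q * (1 - q)) / delta
```

With `delta = 100`, a centroid near the median may hold roughly π/100 ≈ 3.1% of the points, while one near the 0.1% quantile is limited to about 0.2%, so extreme quantiles are summarised at a much finer resolution. A larger `delta` gives smaller centroids, better accuracy, and a larger digest.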
```shell
pip install tdigest-rs
```
The library provides a single class, `TDigest`.
```python
import numpy as np
from tdigest_rs import TDigest

# Fit a TDigest from a numpy array (float32 or float64)
arr = np.random.randn(1000)
tdigest = TDigest.from_array(arr=arr, delta=100.0)  # delta is optional and defaults to 300.0
print(tdigest.means, tdigest.weights)

# Create directly from means and weights arrays
vals = np.random.randn(1000).astype(np.float32)
weights = np.ones(1000).astype(np.uint32)
tdigest = TDigest.from_means_weights(arr=vals, weights=weights)

# Compute a quantile
tdigest.quantile(0.1)

# Compute the median
tdigest.median()

# Compute a trimmed mean
tdigest.trimmed_mean(lower=0.05, upper=0.95)
```
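Quantile queries only need the centroids exposed as `tdigest.means` and `tdigest.weights`. The following stdlib-only sketch shows one common way to answer them: place each centroid at the centre of its cumulative weight and interpolate linearly between neighbouring centroids. The functions `quantile` and `trimmed_mean` here are hypothetical stand-ins, not the library's methods, and it is an assumption that tdigest-rs interpolates this way; real implementations typically also handle the minimum, maximum and singleton centroids specially.

```python
from bisect import bisect_right
from itertools import accumulate


def quantile(means, weights, q):
    """Estimate the q-th quantile by interpolating between centroid centres."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    total = sum(weights)
    # Cumulative weight at the centre of each centroid.
    ends = list(accumulate(weights))
    centres = [end - w / 2 for end, w in zip(ends, weights)]
    target = q * total
    if target <= centres[0]:
        return means[0]
    if target >= centres[-1]:
        return means[-1]
    # Find i with centres[i] <= target < centres[i + 1], then interpolate.
    i = bisect_right(centres, target) - 1
    frac = (target - centres[i]) / (centres[i + 1] - centres[i])
    return means[i] + frac * (means[i + 1] - means[i])


def trimmed_mean(means, weights, lower, upper):
    """Mean of the weight lying between the lower and upper quantiles.

    A centroid straddling a cut-off contributes only the overlapping part
    of its weight.
    """
    if not 0.0 <= lower < upper <= 1.0:
        raise ValueError("need 0 <= lower < upper <= 1")
    total = sum(weights)
    lo, hi = lower * total, upper * total
    acc = kept = 0.0
    start = 0.0
    for m, w in zip(means, weights):
        end = start + w
        overlap = max(0.0, min(end, hi) - max(start, lo))
        acc += m * overlap
        kept += overlap
        start = end
    return acc / kept
```

For the centroids `[1.0, 2.0, 3.0, 4.0]` with unit weights, `quantile(means, weights, 0.5)` (the median) and `trimmed_mean(means, weights, 0.25, 0.75)` both return `2.5`.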
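Two digests can be merged into a single digest:

```python
import numpy as np
from tdigest_rs import TDigest

arr1 = np.random.randn(1000)
arr2 = np.ones(1000)
digest1 = TDigest.from_array(arr=arr1)
digest2 = TDigest.from_array(arr=arr2)
merged_digest = digest1.merge(digest2, delta=100.0)  # delta is optional and again defaults to 300.0
```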
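Conceptually, merging pools the centroids of both digests, sorts them by mean, and re-compresses them under the same size rule used when building a digest. The sketch below follows the merging approach described by Dunning and Ertl with the k1 scale function; the `merge` function and its list-of-`(mean, weight)` representation are hypothetical, and it is an assumption that tdigest-rs merges in this way.

```python
import math


def k1(q, delta):
    """Arcsine scale function: k(q) = delta / (2*pi) * asin(2q - 1)."""
    return delta / (2 * math.pi) * math.asin(2 * q - 1)


def merge(a, b, delta=100.0):
    """Merge two lists of (mean, weight) centroids into one compressed list."""
    pooled = sorted(a + b)  # sort the pooled centroids by mean
    if not pooled:
        return []
    total = sum(w for _, w in pooled)
    out = []
    mean, weight = pooled[0]
    q_left = 0.0   # quantile at the left edge of the current centroid
    seen = weight  # cumulative weight up to and including the current centroid
    for m, w in pooled[1:]:
        q_right = (seen + w) / total
        # Absorb the next centroid only while the span in k-space stays <= 1.
        if k1(q_right, delta) - k1(q_left, delta) <= 1.0:
            weight += w
            mean += (m - mean) * w / weight  # weighted running mean
        else:
            out.append((mean, weight))
            q_left = seen / total
            mean, weight = m, w
        seen += w
    out.append((mean, weight))
    return out
```

Because the pooled centroids are re-compressed rather than simply concatenated, the merged digest stays roughly the same size as its inputs while the total weight is preserved.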
A `TDigest` object can be converted to and from a Python dictionary (which can then be serialised to JSON), and it can also be pickled.
```python
import pickle

# Convert to a Python dict and load it back
d = tdigest.to_dict()
loaded_digest = TDigest.from_dict(d)

# Pickle a digest and restore it
data = pickle.dumps(tdigest)
restored_digest = pickle.loads(data)
```
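For development, install Hatch and run the project tasks from the `bindings/python` directory:

```shell
pip install hatch
cd bindings/python

# Run linters
hatch run dev:lint

# Run tests
hatch run dev:test

# Run benchmark
hatch run dev:benchmark

# Format code
hatch run dev:format
```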
Please read our contributing guide if you'd like to contribute to the project, and our code of conduct before participating in or contributing to it.
Please see our security policy for details on reporting security vulnerabilities.
TDigest-rs is licensed under the Apache Software License 2.0 (Apache-2.0).