OpenMined / PipelineDP

PipelineDP is a Python framework for applying differentially private aggregations to large datasets using batch processing systems such as Apache Spark, Apache Beam, and more.
https://pipelinedp.io/
Apache License 2.0
272 stars 77 forks

Speedup of utility combiners #388

Closed: dvadym closed this 1 year ago

dvadym commented 1 year ago

This PR speeds up per-partition utility analysis over multiple input configurations by vectorizing some operations with NumPy.

Before this PR, each combiner received a List[(count: int, sum: float, n_partitions: int)], and all processing was done in pure Python.

Now each combiner receives

  1. (counts: np.ndarray, sums: np.ndarray, n_partitions: np.ndarray), and all processing is done in NumPy, which is much faster.
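The switch from a list of tuples to three column arrays can be illustrated with a small sketch. The metric here is a hypothetical stand-in, not PipelineDP's actual combiner logic: an expected count after bounding each privacy unit to at most `l0` partitions, assuming a unit survives with probability min(1, l0 / n_partitions). The function names (`expected_counts_python`, `to_arrays`, `expected_counts_numpy`) are likewise illustrative and not part of the PipelineDP API.

```python
from typing import List, Tuple

import numpy as np

# Hypothetical per-partition input: one (count, sum, n_partitions) tuple per
# privacy unit, where n_partitions is how many partitions that unit touches.
Row = Tuple[int, float, int]


def expected_counts_python(data: List[Row], max_partitions: List[int]) -> List[float]:
    """Pure-Python version: loops over every row once per configuration."""
    results = []
    for l0 in max_partitions:
        total = 0.0
        for count, _sum, n_partitions in data:
            # Assumed model: probability the unit's contribution survives l0 bounding.
            keep_prob = min(1.0, l0 / n_partitions)
            total += count * keep_prob
        results.append(total)
    return results


def to_arrays(data: List[Row]) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Converts the list of tuples into three column arrays (done once)."""
    counts = np.array([row[0] for row in data], dtype=np.int64)
    sums = np.array([row[1] for row in data], dtype=np.float64)
    n_partitions = np.array([row[2] for row in data], dtype=np.int64)
    return counts, sums, n_partitions


def expected_counts_numpy(arrays: Tuple[np.ndarray, np.ndarray, np.ndarray],
                          max_partitions: List[int]) -> np.ndarray:
    """Vectorized version: all configurations and rows in a few array ops."""
    counts, _sums, n_partitions = arrays
    l0 = np.asarray(max_partitions, dtype=np.float64)
    # Broadcast to shape (n_configs, n_rows): one keep probability per pair.
    keep_prob = np.minimum(1.0, l0[:, np.newaxis] / n_partitions[np.newaxis, :])
    # Sum over rows, leaving one expected count per configuration.
    return (counts * keep_prob).sum(axis=1)


data = [(2, 10.0, 1), (3, 6.0, 4), (1, 5.0, 2)]
configs = [1, 2, 4]
print(expected_counts_python(data, configs))  # [3.25, 4.5, 6.0]
print(expected_counts_numpy(to_arrays(data), configs))
```

Both versions compute the same values; the NumPy one replaces the nested Python loops with a single broadcast over an (n_configs × n_rows) array, trading some extra memory for far fewer interpreter-level operations.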