OpenMined / PipelineDP

PipelineDP is a Python framework for applying differentially private aggregations to large datasets using batch processing systems such as Apache Spark, Apache Beam, and more.
https://pipelinedp.io/
Apache License 2.0

Refactoring: Using tuples instead of dataclasses for Utility Analysis accumulators. #350

Closed dvadym closed 2 years ago

dvadym commented 2 years ago

This is done for performance reasons: tuples are much faster to work with, and the serialized size of a tuple is ~10 times smaller than that of the equivalent dataclass.
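The size difference can be checked with a small experiment. The sketch below is only an illustration: `SumAccumulator` and its fields are hypothetical (not PipelineDP's actual accumulator), and it uses `pickle`, which may differ from the serializer used by a given Beam or Spark backend. The exact ratio depends on the serializer and on the number and names of the fields.

```python
import pickle
from dataclasses import dataclass


@dataclass
class SumAccumulator:  # hypothetical accumulator, for illustration only
    count: int
    total: float
    total_squares: float


as_dataclass = SumAccumulator(count=3, total=6.0, total_squares=14.0)
as_tuple = (3, 6.0, 14.0)  # the same values, stored positionally

# The pickled dataclass carries the class reference and the field names;
# the pickled tuple carries only the values.
dataclass_size = len(pickle.dumps(as_dataclass))
tuple_size = len(pickle.dumps(as_tuple))
print(f"dataclass: {dataclass_size} bytes, tuple: {tuple_size} bytes")
```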

It turned out that the changes are pretty small and the resulting code structure is nice:

1. Before:

       def create_accumulators():
           ...
           return Accumulator(...)

   After:

       def create_accumulators():
           ...
           return (...)  # the same code, just the Accumulator name dropped

2. Before:

       def compute_metrics(acc):
           # uses acc.field1, acc.field2, etc.

   After:

       def compute_metrics(acc):
           field1, field2, ... = acc
           # acc.field1 is replaced with field1, and so on
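The two steps above can be sketched end to end as follows. This is a minimal, hypothetical example: the `Accumulator` fields (`count`, `total`) and the `_before`/`_after` function names are assumptions made for illustration and are not taken from the PipelineDP code.

```python
from dataclasses import dataclass


@dataclass
class Accumulator:  # hypothetical fields, for illustration only
    count: int
    total: float


# Before: the accumulator is a dataclass and fields are read by name.
def create_accumulator_before(values):
    return Accumulator(len(values), sum(values))


def compute_metrics_before(acc):
    return acc.total / acc.count


# After: the same code, with the Accumulator name dropped.
def create_accumulator_after(values):
    return (len(values), sum(values))


def compute_metrics_after(acc):
    count, total = acc  # unpack once, then use plain local names
    return total / count


values = [1.0, 2.0, 3.0]
mean_before = compute_metrics_before(create_accumulator_before(values))
mean_after = compute_metrics_after(create_accumulator_after(values))
```

The trade-off is that field order in the tuple becomes an implicit contract between the function that creates the accumulator and the one that unpacks it.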



This PR also introduces a base class for UtilityAnalysisCombiners.
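One thing a shared base class can do once all accumulators are tuples is hold the logic that is identical across combiners. The sketch below assumes that accumulators are merged element-wise; that behavior, and the names `TupleCombinerBase` and `CountSumCombiner`, are hypothetical and not taken from the PipelineDP code.

```python
import abc
from typing import Tuple


class TupleCombinerBase(abc.ABC):  # hypothetical name, for illustration only
    """Base for combiners whose accumulators are tuples of numbers."""

    @abc.abstractmethod
    def create_accumulator(self, values) -> Tuple:
        """Builds a tuple accumulator from the input values."""

    @abc.abstractmethod
    def compute_metrics(self, acc: Tuple):
        """Computes the final result from a tuple accumulator."""

    def merge_accumulators(self, acc1: Tuple, acc2: Tuple) -> Tuple:
        # Assumption: every field is additive, so merging is element-wise.
        return tuple(a + b for a, b in zip(acc1, acc2))


class CountSumCombiner(TupleCombinerBase):  # hypothetical subclass

    def create_accumulator(self, values) -> Tuple:
        return (len(values), sum(values))

    def compute_metrics(self, acc: Tuple):
        count, total = acc
        return total / count


combiner = CountSumCombiner()
merged = combiner.merge_accumulators(
    combiner.create_accumulator([1.0, 2.0]),
    combiner.create_accumulator([3.0]))
mean = combiner.compute_metrics(merged)
```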