OpenMined / PipelineDP

PipelineDP is a Python framework for applying differentially private aggregations to large datasets using batch processing systems such as Apache Spark, Apache Beam, and more.
https://pipelinedp.io/
Apache License 2.0

Spark Dataframe API #495

Closed dvadym closed 11 months ago

dvadym commented 11 months ago

This PR implements running DataFrame queries on Spark DataFrames. It contains:

  1. SparkConverter, for converting between RDD and DataFrame.
  2. An implementation of Query.run_query(), which maps a Query to a DPEngine.aggregate call.
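
To illustrate the shape of the two pieces listed above, here is a minimal, dependency-free sketch. It is not the code from this PR: `ToyConverter`, `ToyQuery`, `Columns`, and `toy_aggregate` are hypothetical names, plain Python lists stand in for both the Spark DataFrame and the RDD, and the aggregate step is a plain per-partition sum with no contribution bounding and no noise, so it is not differentially private.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple

Row = Dict[str, object]  # stand-in for one DataFrame row


@dataclass
class Columns:
    """Names of the columns a query reads (hypothetical helper)."""
    privacy_key: str
    partition_key: str
    value: str


class ToyConverter:
    """Stand-in for a DataFrame <-> RDD converter; lists play both roles."""

    def __init__(self, columns: Columns):
        self._columns = columns

    def dataframe_to_collection(self, rows: Iterable[Row]) -> List[Tuple]:
        # Keep only the columns the engine needs, as plain tuples.
        c = self._columns
        return [(r[c.privacy_key], r[c.partition_key], r[c.value])
                for r in rows]

    def collection_to_dataframe(self, col: Iterable[Tuple],
                                output_column: str) -> List[Row]:
        # Turn (partition_key, result) pairs back into named rows.
        c = self._columns
        return [{c.partition_key: pk, output_column: v} for pk, v in col]


def toy_aggregate(col: Iterable[Tuple]) -> List[Tuple]:
    """Placeholder for the engine's aggregate step.

    Assumption: a real DP engine would bound contributions and add noise;
    this only sums values per partition so the data flow is visible.
    """
    totals: Dict[object, float] = {}
    for _privacy_key, partition_key, value in col:
        totals[partition_key] = totals.get(partition_key, 0) + value
    return sorted(totals.items())


@dataclass
class ToyQuery:
    rows: List[Row]
    columns: Columns
    output_column: str

    def run_query(self, aggregate: Callable = toy_aggregate) -> List[Row]:
        # 1. DataFrame -> collection, 2. aggregate, 3. collection -> DataFrame.
        converter = ToyConverter(self.columns)
        col = converter.dataframe_to_collection(self.rows)
        return converter.collection_to_dataframe(aggregate(col),
                                                 self.output_column)


rows = [
    {"user": "u1", "day": "Mon", "spent": 3},
    {"user": "u2", "day": "Mon", "spent": 4},
    {"user": "u1", "day": "Tue", "spent": 5},
]
query = ToyQuery(rows, Columns("user", "day", "spent"), "total_spent")
result = query.run_query()
```

The point of the split is that the query object only describes which columns matter, while the converter isolates everything specific to the DataFrame representation, so the aggregation step can stay unaware of DataFrames.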