A comprehensive set of fairness metrics for datasets and machine learning models, explanations for these metrics, and algorithms to mitigate bias in datasets and models.
Hi,
I am already using BinaryLabelDataset to generate fairness metrics, and it works fine with average-sized dataframes. Now, because of some preprocessing steps in one of my pipelines, I need much more memory and have to support large CSV files (e.g. 10 GB+), so I switched to pyspark.
My question is: does BinaryLabelDataset also work with a pyspark dataframe, or do I need to convert the pyspark dataframe to a pandas dataframe first (which basically means losing the distributed property of pyspark and still risking running out of memory)?
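To make the concern concrete: as far as I understand, a group metric such as disparate impact depends only on per-group label counts, so in principle it could be computed from an aggregation (for example a `groupBy(...).count()` in pyspark) without holding the full table in memory. Below is a minimal plain-Python sketch of that idea. It is not AIF360 code; the column names `sex` and `label`, the helper functions and the sample data are all hypothetical.

```python
import csv
import io
from collections import Counter


def group_label_counts(rows, protected_col, label_col):
    """Count rows per (protected value, label value) in one streaming pass."""
    counts = Counter()
    for row in rows:
        counts[(row[protected_col], row[label_col])] += 1
    return counts


def disparate_impact(counts, privileged, unprivileged, favorable):
    """Ratio of favorable-outcome rates: unprivileged group over privileged group."""

    def favorable_rate(group):
        total = sum(n for (g, _), n in counts.items() if g == group)
        return counts[(group, favorable)] / total

    return favorable_rate(unprivileged) / favorable_rate(privileged)


# Tiny in-memory stand-in for a large CSV file (hypothetical columns and values).
sample_csv = io.StringIO(
    "sex,label\n"
    "1,1\n1,1\n1,1\n1,0\n"
    "0,1\n0,0\n0,0\n0,0\n"
)

# csv.DictReader yields one row at a time, so memory use does not grow with file size.
counts = group_label_counts(csv.DictReader(sample_csv), "sex", "label")
di = disparate_impact(counts, privileged="1", unprivileged="0", favorable="1")
print(f"disparate impact: {di:.3f}")
```

Only the small table of counts has to reach a single machine here, which is the property I would like to keep when using BinaryLabelDataset.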
Thanks in advance