ferryjul / fairCORELS

Algorithm for learning fair rule lists
GNU General Public License v3.0

Faircorels

Welcome to FairCORELS, a Python library for learning fair and interpretable models. Python 3 is strongly recommended. Feel free to report any issue you encounter while using our package, or to suggest new features!

Email contact: jferry@laas.fr

Note that we released a new version of this module, named FairCORELSV2, integrating advanced pruning techniques to efficiently explore the search space of fair rule lists.

References

This repository contains the implementation of the method introduced in the paper Learning fair rule lists:

[1] Ulrich Aïvodji, Julien Ferry, Sébastien Gambs, Marie-José Huguet, and Mohamed Siala. 2019. "Learning fair rule lists." arXiv preprint arXiv:1909.03977.

We presented our package in a demo paper, "FairCORELS, an Open-Source Library for Learning Fair Rule Lists", at the 30th ACM International Conference on Information & Knowledge Management (CIKM '21):

[2] Ulrich Aïvodji, Julien Ferry, Sébastien Gambs, Marie-José Huguet, and Mohamed Siala. 2021. FairCORELS, an Open-Source Library for Learning Fair Rule Lists. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management (CIKM '21). Association for Computing Machinery, New York, NY, USA, 4665–4669. DOI:https://doi.org/10.1145/3459637.3481965

Overview

FairCORELS is a bi-objective extension of the CORELS algorithm that jointly handles accuracy and fairness. The main classifier object in our module is FairCorelsClassifier, which implements the FairCORELS method. The currently supported fairness notions are statistical parity, predictive parity, predictive equality, equal opportunity, equalized odds, and conditional use accuracy equality (see Table 1 of [2] for details of the computations). However, the core algorithm is metric-agnostic, and any function of the confusion matrix of a classifier could be integrated.
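To illustrate what "a function of the confusion matrix" can look like, here is a minimal sketch that computes a statistical parity gap as the absolute difference in positive prediction rates between two groups. It is plain Python and does not call the faircorels API; the names GroupCounts, group_counts and statistical_parity_gap are hypothetical, and Table 1 of [2] remains the reference for the exact definitions used by the library.

```python
from dataclasses import dataclass


@dataclass
class GroupCounts:
    """Confusion-matrix counts for one group (hypothetical helper)."""
    tp: int = 0
    fp: int = 0
    tn: int = 0
    fn: int = 0

    @property
    def size(self):
        return self.tp + self.fp + self.tn + self.fn

    @property
    def positive_rate(self):
        # Fraction of the group that receives a positive prediction
        return (self.tp + self.fp) / self.size if self.size else 0.0


def group_counts(membership, y_pred, y_true):
    """Count TP/FP/TN/FN over the examples whose membership flag is 1."""
    counts = GroupCounts()
    for member, pred, true in zip(membership, y_pred, y_true):
        if not member:
            continue
        if pred == 1 and true == 1:
            counts.tp += 1
        elif pred == 1 and true == 0:
            counts.fp += 1
        elif pred == 0 and true == 0:
            counts.tn += 1
        else:
            counts.fn += 1
    return counts


def statistical_parity_gap(minority, majority):
    """Absolute difference between the groups' positive prediction rates."""
    return abs(minority.positive_rate - majority.positive_rate)


# Toy data: four protected and four unprotected examples
min_vect = [1, 1, 1, 1, 0, 0, 0, 0]
maj_vect = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
y_true = [1, 0, 1, 0, 1, 0, 1, 0]

cm_min = group_counts(min_vect, y_pred, y_true)
cm_maj = group_counts(maj_vect, y_pred, y_true)
gap = statistical_parity_gap(cm_min, cm_maj)
print(gap)  # 0.5 (positive rates 0.25 vs 0.75)
```

Other notions from the list above would be different functions of the same per-group counts, for example comparing false positive rates instead of positive prediction rates.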

Our module also includes FairCorelsBagging, a wrapper that performs the Bagging ensemble method using FairCorelsClassifier as a base learner. Note that FairCorelsBagging is not maintained.

Examples

Basic example

from faircorels import *

# Load the dataset
X, y, features, prediction = load_from_csv("data/compas_rules_full.csv")

# Define protected and unprotected groups
# Here, we want them to correspond to features 0 and 1 (which we display)
# However, they can be any binary vector
sensitive_attr_column = 0
unsensitive_attr_column = 1
print("Sensitive attribute is ", features[sensitive_attr_column])
print("Unsensitive attribute is ", features[unsensitive_attr_column])
sensVect =  X[:,sensitive_attr_column]
unSensVect =  X[:,unsensitive_attr_column] 

# Define the desired fairness level, i.e. 1.0 minus the unfairness tolerance (see Table 1 of [2] for details of the computation)
epsilon = 0.98 # max. unfairness tolerance of 0.02 (fairness level of 98%)

# Create the model, exploring at most 1000000 nodes of the prefix tree
c = FairCorelsClassifier(n_iter=1000000, # maximum number of nodes in the prefix tree
                        c=0.001, # regularization parameter for sparsity
                        max_card=1, # each antecedent will have cardinality one (recommended if rule mining is done as preprocessing)
                        min_support = 0.01, # each rule antecedent must capture at least 1% of the training instances
                        policy="bfs", # exploration heuristic
                        bfs_mode=2, # exploration heuristic
                        mode=3, # epsilon-constrained mode
                        fairness=1, # statistical fairness metric to be used, 1 stands for statistical parity
                        epsilon=epsilon,  # epsilon is the fairness level (1.0 minus the unfairness tolerance)
                        maj_vect=unSensVect, # unSensVect is a binary vector indicating unprotected group membership for all examples of X
                        min_vect=sensVect # sensVect is a binary vector indicating protected group membership for all examples of X
                        )

# Fit the classifier
c.fit(X, y, features=features, prediction_name=prediction)

# Score the model on the training set
a = c.score(X, y)

# Compute its unfairness
cm = ConfusionMatrix(sensVect, unSensVect, c.predict(X), y)
cm_minority, cm_majority = cm.get_matrix()
fm = Metric(cm_minority, cm_majority)
unf = fm.statistical_parity()

# Print the model's performances
print("Training accuracy = %f, training unfairness = %f" %(a, unf))

# Print the model itself
print(c.rl_)
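Since epsilon in the example above is a fairness level, the measured unfairness is expected to stay within 1.0 - epsilon on the training set. The following is a minimal sketch of that check; the helper name within_tolerance and the small numerical slack are assumptions made for illustration, and whether the library treats the bound as strict or inclusive is not stated here.

```python
def within_tolerance(unfairness, epsilon, slack=1e-9):
    """Return True when the unfairness does not exceed 1.0 - epsilon.

    `slack` absorbs floating-point rounding (hypothetical choice).
    """
    return unfairness <= (1.0 - epsilon) + slack


print(within_tolerance(0.015, 0.98))  # True: 0.015 is below the 0.02 tolerance
print(within_tolerance(0.05, 0.98))   # False: 0.05 exceeds the 0.02 tolerance
```

In the basic example, the same check would be applied to the value stored in unf.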

Complete examples

A step-by-step example notebook Demo-fairCORELS.ipynb can be found under the example folder.

Detailed example files, using 5-fold cross-validation on the COMPAS dataset, are also provided in the example directory.

All files show how to load data, how to train our classifiers, how to evaluate them, and how to store results in a clear and easily exploitable manner.
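As a rough illustration of the 5-fold protocol, the sketch below generates train/test index splits in plain Python. The helper k_fold_indices is hypothetical and the provided example scripts may split the data differently; the training step is only indicated by a comment.

```python
import random


def k_fold_indices(n_samples, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into n_folds groups of near-equal size
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test_idx = folds[k]
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train_idx, test_idx


splits = list(k_fold_indices(n_samples=23, n_folds=5))
for fold_id, (train_idx, test_idx) in enumerate(splits):
    # A classifier would be fitted on the train_idx rows and scored on the test_idx rows
    print(fold_id, len(train_idx), len(test_idx))
```

When splitting this way, the group membership vectors have to be sliced with the same indices as X and y, since they describe the same examples.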

Installation

Ubuntu

sudo apt install libgmp-dev
pip install faircorels

Note that running the provided example scripts after installing faircorels might raise errors about NumPy versions. In this case, uninstalling the most recently installed NumPy (pip uninstall numpy) should solve the issue.

Mac

# Install Homebrew, then a C++ compiler (gcc) and gmp
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
brew install gcc gmp

pip install faircorels

Windows

Note: Python 2 is currently NOT supported on Windows.

pip install faircorels

Details of the classifiers' parameters:

FairCorelsClassifier :

Constructor arguments :

Methods :

.fit(X, y, features=[], prediction_name="prediction", performRestarts=0, initNBNodes=1000, geomRReason=1.5, max_evals=1000000000, time_limit = None):

Method for training the classifier.

.predict(X):

Method for predicting using the trained classifier.

=> Returns : p : array of shape = [n_samples] -> The classifications of the input samples.

.predict_with_scores(X):

Method for predicting using the trained classifier.

=> Returns : p : array of shape = [[n_samples],[n_samples]]. The first array contains the classifications of the input samples. The second array contains the associated confidence scores.

.score(X, y):

Method that scores the algorithm on the input samples X with the labels y. Alternatively, score the predictions X against the labels y (where X has been generated by predict or something similar).

=> Returns : a : float The accuracy, from 0.0 to 1.0, of the rulelist predictions

.get_params():

Method to get a dictionary of all the model's parameters.

=> Returns : params : dict Dictionary of all parameters, with the names of the parameters as the keys

.set_params(params):

Method to set some of the model's parameters.

.save(fname):

Method to save the model to a file, using python's pickle module.

.load(fname):

Method to load a model from a file, using python's pickle module.
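Because save and load rely on Python's pickle module, the file holds a serialized Python object. The sketch below shows the underlying round trip with the standard library only; the dictionary is a hypothetical stand-in for a trained model, not the library's actual internal state.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a trained model's state
state = {"rules": [("age<25", 1), ("priors>3", 1)], "default": 0}

path = os.path.join(tempfile.mkdtemp(), "model.pkl")

# Serialize to disk (roughly what a pickle-based save would do)
with open(path, "wb") as f:
    pickle.dump(state, f)

# Read it back (roughly what a pickle-based load would do)
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == state)  # True
```

As with any pickle file, a saved model should only be loaded from a trusted source, since unpickling can execute arbitrary code.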

.rl(set_val=None):

Method to return or set the learned rulelist

=> Returns : rl : obj The model's rulelist

.__str__():

Method to get a string representation of the rule list

=> Returns : rl : str The rule list

.__repr__():

Same behavior as __str__().

.explain(anEx):

Method to explain a prediction (by providing the matching rule).

=> Returns : list l where l[0] is the instance's prediction and l[1] is the implicant(s) that led to that decision (both are user-friendly strings)

.explain_api(anEx):

Method to explain a prediction (by providing the matching rule) (shorter output).

=> Returns : list l where l[0] is the instance's prediction and l[1] is the implicant(s) that led to that decision (both are API-oriented, i.e. easy for a program to use)

.explain_long(anEx):

Method to explain a prediction (by providing the matching rule and all the previous unmatched implicants).

=> Returns : list l where l[0] is the instance's prediction and l[1] is the implicant(s) that led to that decision (both are user-friendly strings)

.explain_long_api(anEx):

Method to explain a prediction (by providing the matching rule and all the previous unmatched implicants) (shorter output).

=> Returns : list l where l[0] is the instance's prediction and l[1] is the implicant(s) that led to that decision (both are API-oriented, i.e. easy for a program to use)
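The idea behind these explanation methods can be sketched with a toy rule list: rules are tried in order, the first one that matches gives the prediction, and the rules tried before it are the unmatched implicants. The rule names, the default prediction and the explain function below are all hypothetical; this is not the library's implementation or output format.

```python
# Hypothetical rule list: ordered (antecedent, prediction) pairs and a default
RULES = [
    ("priors>3", 1),
    ("age<25", 1),
]
DEFAULT_PREDICTION = 0


def explain(instance, rules=RULES, default=DEFAULT_PREDICTION):
    """Return (prediction, matching_rule, unmatched_rules).

    `instance` maps binary feature names to 0 or 1.
    """
    unmatched = []
    for antecedent, prediction in rules:
        if instance.get(antecedent, 0) == 1:
            return prediction, antecedent, unmatched
        unmatched.append(antecedent)
    # No rule matched: fall through to the default prediction
    return default, "default", unmatched


example = {"priors>3": 0, "age<25": 1}
prediction, matched, skipped = explain(example)
print(prediction, matched, skipped)  # 1 age<25 ['priors>3']
```

Reporting only the matching rule corresponds to the short explanations, while also reporting the skipped rules corresponds to the "long" variants.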

FairCorelsBagging :

This class provides an easy wrapper for the Bagging method that uses the FairCorelsClassifier class as the underlying base learner. Hence, arguments directly passed to the FairCorelsClassifier object will not be detailed again.

Constructor arguments :

The training sets for the different base learners are automatically computed from the entire provided training set, using the provided parameters.

Methods :

.fit(performRestarts=0, initNBNodes=1000, geomRReason=1.5, max_evals=1000000000, time_limit = None, n_workers=-1):

Method to train the base learners.

.predict(X):

Predict classifications of the input samples X. Uses majority vote as aggregation function.

=> Returns : p : array of shape = [n_samples]. The classifications of the input samples.
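Majority-vote aggregation can be sketched as follows, assuming each base learner returns one label per sample. The function majority_vote is hypothetical, and the tie-breaking shown (first label seen wins) is an assumption; the behavior of FairCorelsBagging on ties is not documented here.

```python
from collections import Counter


def majority_vote(predictions_per_learner):
    """Aggregate per-learner prediction lists into one label per sample."""
    aggregated = []
    for votes in zip(*predictions_per_learner):
        # most_common(1) gives the most frequent label;
        # on a tie, the label encountered first is returned
        aggregated.append(Counter(votes).most_common(1)[0][0])
    return aggregated


votes = [
    [1, 0, 1, 0],  # predictions of learner 1
    [1, 1, 0, 0],  # predictions of learner 2
    [0, 1, 1, 0],  # predictions of learner 3
]
print(majority_vote(votes))  # [1, 1, 1, 0]
```

Using an odd number of base learners avoids ties altogether for binary labels.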

.score(X, y):

Score the algorithm on the input samples X with the labels y. Alternatively, score the predictions X against the labels y (where X has been generated by predict or something similar).

=> Returns : a : float The accuracy, from 0.0 to 1.0, of the rulelist predictions

.explain(anInst):

Explains a prediction (by its matching rules among majority base learners).

=> Returns : d : dictionary {'implicants':i, 'prediction':p} where i is the list of implicants that led majority voters to their prediction and p is the associated prediction

.explain_complete(anInst):

Explains a prediction (adds complete implications, including antecedents negation for all learners).

=> Returns : d : dictionary {'implicants':i, 'prediction':p} where i is the list of implicants (and unmatched rules) that led majority voters to their prediction and p is the associated prediction