This repository provides the official implementation of the following paper:
Mitigating Source Bias for Fairer Weak Supervision (NeurIPS 2023)
Changho Shin, Sonia Cromp, Dyah Adila, Frederic Sala
We recommend you create a conda environment as follows.
conda env create -f environment.yml
and activate it with
conda activate fair-ws
Preprocessed datasets and cached SBM mappings can be downloaded here. Dataset folders should be located under data
folder, i.e.
fair-ws/ data/ adult/ bank_marketing/ CelebA/ census/ CivilComments hateXplain imdb tennis UTKFace fairws notebook
It can be done by the following command lines. It may take ~30 minutes (6.24GB).
cd fair-ws
mkdir data
cd data
wget https://www.dropbox.com/sh/7hwqrigdn5oehn7/AACBq7q-eKuOkKQi5nxM01IKa -O datasets.zip
unzip datasets.zip
from fairws.data_util import load_dataset
from fairws.sbm import find_sbm_mapping, correct_bias
# load data and LF; a_train and a_test are group variables
x_train, y_train, a_train, x_test, y_test, a_test = load_dataset(...)
L = load_LF(...))
# Source bias mitigation (SBM)
ot_type = "linear"
sbm_diff_threshold = 0.05
sbm_mapping = find_sbm_mapping(x_train, a_train, ot_type)
L = correct_bias(L, a_train, sbm_mapping, sbm_diff_threshold)
# Standard weak supervision pipeline
label_model = LabelModel(...)
label_model.fit(L_train=L, ...)
y_train_pseudo = label_model.predict(L, ...)
If you find our repository useful for your research, please consider citing our paper:
@article{shin2023mitigating,
title={Mitigating Source Bias for Fairer Weak Supervision},
author={Shin, Changho and Cromp, Sonia and Adila, Dyah and Sala, Frederic},
journal={arXiv preprint arXiv:2303.17713},
year={2023}
}