UPDATE 06/17/20: Code re-factored, with two new features:
- A new solve method for fitting the label model, selected with label_model.fit(..., solve_method='triplet_mean'). By default, the code now uses triplet_mean.
- The estimated accuracy of each labeling function, P(lambda_i == Y), is available with label_model.estimated_accuracies().

FlyingSquid is a new framework for automatically building models from multiple noisy label sources. Users write functions that generate noisy labels for data, and FlyingSquid uses the agreements and disagreements between them to learn a label model of how accurate the labeling functions are. The label model can be used directly for downstream applications, or it can be used to train a powerful end model.
FlyingSquid can be used to build models for all sorts of tasks, including text applications, video analysis, and online learning. Check out our blog post and paper on arXiv for more details!
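To make the idea of learning accuracies from agreements and disagreements concrete, here is a minimal NumPy sketch of a triplet-style estimate. It is a simplified illustration under stated assumptions, not FlyingSquid's implementation: it assumes exactly three labeling functions that vote in {-1, +1}, never abstain, are better than random, and are conditionally independent given the true label. The function name triplet_accuracies and the synthetic data are hypothetical.

```python
import numpy as np


def triplet_accuracies(L):
    """Estimate P(lambda_i == Y) for three labeling functions.

    Assumes votes in {-1, +1}, no abstains, and conditional
    independence of the labeling functions given the true label Y.
    """
    L = np.asarray(L, dtype=float)
    # Observable second moments E[lambda_i * lambda_j], i.e. agreement
    # rates rescaled to the range [-1, 1]. No true labels are needed.
    M = (L.T @ L) / L.shape[0]
    acc = np.empty(3)
    for i in range(3):
        j, k = [x for x in range(3) if x != i]
        # With a_i = E[lambda_i * Y], conditional independence gives
        # E[lambda_i * lambda_j] = a_i * a_j, hence
        # a_i^2 = E[l_i l_j] * E[l_i l_k] / E[l_j l_k].
        a_i = np.sqrt(abs(M[i, j] * M[i, k] / M[j, k]))
        # Taking the positive root assumes better-than-random accuracy.
        # For +/-1 votes, P(lambda_i == Y) = (1 + a_i) / 2.
        acc[i] = (1 + a_i) / 2
    return acc


# Synthetic check: hidden labels Y and three noisy labeling functions.
rng = np.random.default_rng(0)
n = 100_000
Y = rng.choice([-1, 1], size=n)
true_acc = np.array([0.9, 0.8, 0.7])
# Each labeling function copies Y with probability true_acc[i], else flips it.
correct = rng.random((n, 3)) < true_acc
L = np.where(correct, Y[:, None], -Y[:, None])

est_acc = triplet_accuracies(L)
print("true accuracies:     ", true_acc)
print("estimated accuracies:", np.round(est_acc, 3))
```

The estimate only looks at how often pairs of labeling functions agree, so the hidden labels Y are used here purely to generate the synthetic votes.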
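from flyingsquid.label_model import LabelModel
import numpy as np

# Matrix of labeling-function votes: one row per data point, one column per labeling function
L_train = np.load('...')

m = L_train.shape[1]  # number of labeling functions
label_model = LabelModel(m)
label_model.fit(L_train)  # learn the label model from the votes alone

preds = label_model.predict(L_train)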
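The sample above loads L_train from disk. As a rough sketch of how such a matrix might be produced, the snippet below applies a few toy labeling functions to text and stacks their votes into an array with one column per labeling function. Everything here is hypothetical: the labeling functions, the helper build_label_matrix, and the vote encoding (+1 and -1 for the two classes, 0 for abstain) are assumptions for illustration, so check the encoding against the FlyingSquid examples before relying on it.

```python
import numpy as np

# Assumed vote encoding (verify against the FlyingSquid examples).
ABSTAIN, NEGATIVE, POSITIVE = 0, -1, 1


def lf_contains_great(text):
    """Vote positive when the text mentions 'great'."""
    return POSITIVE if "great" in text.lower() else ABSTAIN


def lf_contains_terrible(text):
    """Vote negative when the text mentions 'terrible'."""
    return NEGATIVE if "terrible" in text.lower() else ABSTAIN


def lf_exclamation(text):
    """Weak heuristic: a trailing exclamation mark suggests positive."""
    return POSITIVE if text.endswith("!") else ABSTAIN


labeling_functions = [lf_contains_great, lf_contains_terrible, lf_exclamation]


def build_label_matrix(texts, lfs):
    """Return an (n_texts, n_lfs) integer array of votes."""
    return np.array([[lf(text) for lf in lfs] for text in texts])


texts = ["Great movie!", "Terrible plot.", "It was fine."]
L_example = build_label_matrix(texts, labeling_functions)
print(L_example)
print("number of labeling functions:", L_example.shape[1])
```

A matrix built this way has the same layout the sample code expects, where L_train.shape[1] is the number of labeling functions.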
We recommend using conda to install FlyingSquid:
git clone https://github.com/HazyResearch/flyingsquid.git
cd flyingsquid
conda env create -f environment.yml
conda activate flyingsquid
Alternatively, you can install the dependencies yourself (see environment.yml), and then install the actual package:
pip install flyingsquid
To install from source:
git clone https://github.com/HazyResearch/flyingsquid.git
cd flyingsquid
conda env create -f environment.yml
conda activate flyingsquid
pip install -e .
If you use our work or find it useful, please cite our paper at ICML 2020:
@inproceedings{fu2020fast,
author = {Daniel Y. Fu and Mayee F. Chen and Frederic Sala and Sarah M. Hooper and Kayvon Fatahalian and Christopher R\'e},
title = {Fast and Three-rious: Speeding Up Weak Supervision with Triplet Methods},
booktitle = {Proceedings of the 37th International Conference on Machine Learning (ICML 2020)},
year = {2020},
}