RubixML / ML

A high-level machine learning and deep learning library for the PHP language.
https://rubixml.com
MIT License

Semi-supervised Learning Research (SASSC) #152

Status: Closed (closed by andrewdalpino 3 years ago)

andrewdalpino commented 3 years ago

This ticket is for research into semi-supervised learning, i.e. training with a mix of labeled and unlabeled data. The research is motivated by the desire to greatly lower the cost of model development in domains where labeled data is expensive to obtain, such as medicine. Currently under consideration is a Self-annotating Semi-supervised Classifier (SASSC, pronounced "sassy") that decorates a base classifier and incrementally self-annotates the unlabeled portion of the training set until the entire dataset is labeled. Although the name is original (as far as I know), the technique is based on a paper by David Yarowsky on unsupervised word sense disambiguation. The goal is to devise a novel thresholding mechanism and combine it with a generalized Yarowsky method to produce an efficient algorithm for label imputation.
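The self-annotation loop described above is a form of Yarowsky-style self-training. Below is a minimal Python sketch of that general idea. It is not the SASSC implementation and it is not Rubix ML API. It assumes a plain fixed confidence threshold in place of the novel thresholding mechanism the research set out to devise. It also uses a hypothetical toy `NearestCentroid` base classifier. `self_train`, `NearestCentroid`, and `threshold` are illustrative names only. When no unlabeled sample clears the threshold, the sketch annotates the single most confident one, so every round makes progress and the whole dataset ends up labeled.

```python
import math


class NearestCentroid:
    """Hypothetical toy base classifier: predicts the closest class centroid and
    reports a softmax over negative squared distances as its confidence."""

    def fit(self, X, y):
        sums, counts = {}, {}
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
            counts[label] = counts.get(label, 0) + 1
        self.centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}
        return self

    def predict_proba(self, x):
        dists = {c: sum((a - b) ** 2 for a, b in zip(x, m))
                 for c, m in self.centroids.items()}
        d_min = min(dists.values())  # shift for numerical stability
        exps = {c: math.exp(-(d - d_min)) for c, d in dists.items()}
        total = sum(exps.values())
        return {c: e / total for c, e in exps.items()}


def self_train(model, X_labeled, y_labeled, X_unlabeled, threshold=0.9):
    """Generic self-training: repeatedly fit on the labeled set, then move
    unlabeled samples whose top-class confidence >= threshold into it."""
    X, y = list(X_labeled), list(y_labeled)
    pool = list(X_unlabeled)
    while pool:
        model.fit(X, y)
        scored = []  # (confidence, index into pool, predicted label)
        for i, x in enumerate(pool):
            proba = model.predict_proba(x)
            label = max(proba, key=proba.get)
            scored.append((proba[label], i, label))
        accepted = [s for s in scored if s[0] >= threshold]
        if not accepted:
            # Assumption: fall back to the single most confident sample so
            # the loop always terminates with every sample labeled.
            accepted = [max(scored)]
        taken = {i for _, i, _ in accepted}
        for _, i, label in accepted:
            X.append(pool[i])
            y.append(label)
        pool = [x for i, x in enumerate(pool) if i not in taken]
    model.fit(X, y)
    return model, X, y


# Usage: two labeled points, three unlabeled ones (the last is ambiguous at first).
X_lab = [[0.0, 0.0], [5.0, 5.0]]
y_lab = ["a", "b"]
X_unl = [[0.2, 0.1], [4.8, 5.1], [2.4, 2.6]]
model, X, y = self_train(NearestCentroid(), X_lab, y_lab, X_unl, threshold=0.9)
print(y)
```

In the first round the two points near the centroids are annotated with high confidence. The midpoint `[2.4, 2.6]` is equidistant from both initial centroids and only gets a label in a later round, after the centroids have shifted, via the fallback. A smarter threshold would decide how aggressively to accept such borderline samples.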

If viable, the implementation can be put in the Extras repository for a test period to obtain data on real usage and performance.

andrewdalpino commented 3 years ago

I've completed this research, but, alas, a lack of resources makes this feature infeasible at this time.