semi-supervised noisy label augmentation - Githubissues

dmarx / SuicideWatch

Experiment to build a model that detects suicidality in social media activity.


semi-supervised noisy label augmentation #8

Open dmarx opened 6 years ago

dmarx commented 6 years ago

PU Learning: positive + unlabeled data

https://www.cs.uic.edu/~liub/NSF/PSC-IIS-0307239.html

PU learning: cosine-Rocchio PU algorithm, Li et al. 2010 EMNLP

Use cosine similarity and Rocchio (nearest-centroid) classification to filter likely false negatives out of the negative (unlabeled) class and construct a "reliable negatives" dataset. Train an SVM to further filter the data. Rinse and repeat.
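
Rough sketch of the reliable-negative extraction step, stdlib only. This is a simplification and not the exact procedure from the paper: the function names, the whitespace tokenizer, the threshold choice (lowest similarity of any positive doc to the positive centroid), and the `alpha`/`beta` defaults are all assumptions for illustration. The SVM filtering loop is left out.

```python
import math
from collections import Counter

def tf_vector(text):
    """Bag-of-words term-frequency vector (naive whitespace tokenizer)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    """Mean of the unit-normalised vectors: a class prototype."""
    total = Counter()
    for v in vectors:
        norm = math.sqrt(sum(w * w for w in v.values())) or 1.0
        for t, w in v.items():
            total[t] += w / norm
    return {t: w / len(vectors) for t, w in total.items()}

def rocchio_prototype(own, other, alpha, beta):
    """alpha * own centroid - beta * other centroid."""
    terms = set(own) | set(other)
    return {t: alpha * own.get(t, 0.0) - beta * other.get(t, 0.0) for t in terms}

def reliable_negatives(positive_docs, unlabeled_docs, alpha=16.0, beta=4.0):
    """Return indices of unlabeled docs that look like reliable negatives."""
    pos = [tf_vector(d) for d in positive_docs]
    unl = [tf_vector(d) for d in unlabeled_docs]
    pos_centroid = centroid(pos)

    # Step 1: unlabeled docs less similar to the positive centroid than
    # every known positive become "potential negatives" (assumed threshold).
    threshold = min(cosine(pos_centroid, v) for v in pos)
    potential = [v for v in unl if cosine(pos_centroid, v) < threshold]
    if not potential:
        return []

    # Step 2: Rocchio prototypes for both classes, then keep the unlabeled
    # docs that sit closer to the negative prototype.
    neg_centroid = centroid(potential)
    p_proto = rocchio_prototype(pos_centroid, neg_centroid, alpha, beta)
    n_proto = rocchio_prototype(neg_centroid, pos_centroid, alpha, beta)
    return [i for i, v in enumerate(unl)
            if cosine(n_proto, v) > cosine(p_proto, v)]
```

The returned indices plus the known positives would be the training set for the first SVM pass; anything the SVM then labels negative gets added to the negatives and the loop repeats.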

Perplexity-based PU learning (Stephen Wan)

Use perplexity to find the |pos_class| unlabeled instances most dissimilar to the positive class.
Train a classifier to filter.
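
Quick sketch of the selection step, assuming perplexity is measured under a language model trained on the positive class. The add-one-smoothed unigram model and the function names are placeholders of mine, not anything from the referenced work; a real version would likely use a stronger LM.

```python
import math
from collections import Counter

def train_unigram(docs):
    """Unigram counts and total token count from the positive class."""
    counts = Counter(tok for d in docs for tok in d.lower().split())
    return counts, sum(counts.values())

def perplexity(doc, counts, total, vocab_size):
    """Per-token perplexity of doc under an add-one-smoothed unigram model."""
    tokens = doc.lower().split()
    if not tokens:
        return float("inf")  # empty docs sort as maximally dissimilar
    log_prob = sum(math.log((counts[t] + 1) / (total + vocab_size))
                   for t in tokens)
    return math.exp(-log_prob / len(tokens))

def most_dissimilar(positive_docs, unlabeled_docs):
    """Indices of the |positive_docs| unlabeled docs with highest perplexity."""
    counts, total = train_unigram(positive_docs)
    vocab = set(counts) | {t for d in unlabeled_docs for t in d.lower().split()}
    ranked = sorted(
        range(len(unlabeled_docs)),
        key=lambda i: perplexity(unlabeled_docs[i], counts, total, len(vocab)),
        reverse=True,
    )
    return ranked[:min(len(positive_docs), len(unlabeled_docs))]
```

The selected instances act as the initial negative set, balanced in size against the positives, for training the filtering classifier.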

dmarx commented 6 years ago

relevance models - ~2001/2002

LM ranking, top N -> another model to extract salient terms -> add to query, repeat
use KL divergence instead of perplexity
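
Sketch of the rank -> extract -> expand loop with KL divergence as the ranking score. This is simplified pseudo-relevance feedback, not the actual relevance-model estimation from the literature. The interpolation smoothing, the salience score (feedback probability times log ratio against the collection), and all names and defaults are assumptions.

```python
import math
from collections import Counter

def smoothed_lm(tokens, background, lam=0.5):
    """Document unigram model interpolated with the collection model."""
    counts = Counter(tokens)
    n = len(tokens) or 1
    return {w: (1 - lam) * counts[w] / n + lam * bg
            for w, bg in background.items()}

def kl_divergence(p, q):
    """KL(p || q); q must be non-zero wherever p is."""
    return sum(pw * math.log(pw / q[w]) for w, pw in p.items() if pw > 0)

def expand_query(query, docs, top_n=2, n_terms=2, rounds=1, lam=0.5):
    """Return (expanded query terms, doc indices ranked by the last round)."""
    tokenized = [d.lower().split() for d in docs]
    coll = Counter(t for d in tokenized for t in d)
    total = sum(coll.values())
    background = {w: c / total for w, c in coll.items()}
    doc_models = [smoothed_lm(d, background, lam) for d in tokenized]

    # Drop query terms the collection has never seen.
    terms = [t for t in query.lower().split() if t in background]
    ranked = list(range(len(docs)))
    for _ in range(rounds):
        q_counts = Counter(terms)
        q_model = {w: c / len(terms) for w, c in q_counts.items()}

        # LM ranking: lower KL(query || doc) means a better match.
        ranked = sorted(range(len(docs)),
                        key=lambda i: kl_divergence(q_model, doc_models[i]))

        # Feedback model over the top N docs, scored against the collection.
        fb = Counter(t for i in ranked[:top_n] for t in tokenized[i])
        fb_total = sum(fb.values()) or 1
        salience = {w: (c / fb_total) * math.log((c / fb_total) / background[w])
                    for w, c in fb.items() if w not in q_counts}

        # Add the most salient new terms to the query and go again.
        terms += sorted(salience, key=salience.get, reverse=True)[:n_terms]
    return terms, ranked
```

Example: with the query "hopeless" over a small mixed collection, the top-ranked docs are the ones containing the query term, and the terms that co-occur with it get pulled into the query for the next round.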