ClimateMisinformation / Scrapers

Web scrapers

Compare Snorkel to Doccano #34

Status: open (opened by ebrucucen 3 years ago)

ebrucucen commented 3 years ago

We want to use the best tool for labelling, and we have been having issues with Doccano. An alternative is Snorkel: https://github.com/snorkel-team/snorkel

ninjalu commented 3 years ago

I have looked into Snorkel.

It is essentially a weak-supervision labelling tool that generalises from rough rules (a bit like an ensemble such as a random forest, except that the "prediction" is how Snorkel will label your data). Suppose we have some rules that do not have to be exact, e.g. "if an article quotes a certain fake news influencer, then it is misinformation". Such a rule will sometimes be wrong: the article could be debunking this influencer. These rules can be written as functions that act as simple, crude classifiers (Snorkel calls them labelling functions), and Snorkel learns from how these functions agree and disagree to label your data.
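To make the idea concrete, here is a minimal pure-Python sketch of three crude labelling functions and one simple way to combine their votes. It does not use Snorkel's API: the influencer names, the keyword rules and the IPCC rule are hypothetical, and plain majority vote stands in for Snorkel's `LabelModel`, which instead learns how much to trust each function. Following Snorkel's convention, `-1` means a function abstains.

```python
from collections import Counter

# Label values; -1 means "abstain" (no opinion), as in Snorkel.
ABSTAIN = -1
RELIABLE = 0
MISINFO = 1

# Hypothetical influencer list; a real one would come from the team.
FAKE_NEWS_INFLUENCERS = {"influencer_a", "influencer_b"}


def lf_quotes_influencer(article: str) -> int:
    """Crude rule: quoting a known influencer suggests misinformation."""
    text = article.lower()
    if any(name in text for name in FAKE_NEWS_INFLUENCERS):
        return MISINFO
    return ABSTAIN


def lf_mentions_debunk(article: str) -> int:
    """Crude rule (hypothetical): rebuttal keywords suggest a reliable article."""
    text = article.lower()
    if "debunk" in text or "fact check" in text:
        return RELIABLE
    return ABSTAIN


def lf_cites_ipcc(article: str) -> int:
    """Crude rule (hypothetical): citing the IPCC suggests a reliable article."""
    return RELIABLE if "ipcc" in article.lower() else ABSTAIN


LFS = [lf_quotes_influencer, lf_mentions_debunk, lf_cites_ipcc]


def label_matrix(articles):
    """Apply every labelling function to every article (one row per article)."""
    return [[lf(article) for lf in LFS] for article in articles]


def majority_vote(row):
    """Combine one row of votes; all-abstain rows and ties stay unlabelled."""
    votes = Counter(v for v in row if v != ABSTAIN)
    if not votes:
        return ABSTAIN
    (top, top_count), *rest = votes.most_common()
    if rest and rest[0][1] == top_count:
        return ABSTAIN  # conflicting votes with equal support
    return top


# Made-up articles for illustration only.
EXAMPLE_ARTICLES = [
    "Influencer_A says warming is a hoax.",
    "We debunk the claims made by influencer_a.",
    "The IPCC report projects further warming.",
    "Local weather was mild this week.",
]

if __name__ == "__main__":
    for article, row in zip(EXAMPLE_ARTICLES, label_matrix(EXAMPLE_ARTICLES)):
        print(row, majority_vote(row), article)
```

On these four made-up articles the combined labels are `[1, -1, 0, -1]`. The second article, which debunks the influencer, gets conflicting votes and stays unlabelled: exactly the kind of case where a learned label model, or a human, has to decide.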

My take is that if we can write a lot of rule-based functions to crudely classify the training data, Snorkel could be very useful. However, I do wonder whether we would actually save much time this way, because whoever writes those functions needs to study the training data closely.
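One way to judge early whether the effort pays off is to check how much of the data the rules actually label and how often they contradict each other. The sketch below computes per-function coverage, overlap and conflict rates from a label matrix (rows are articles, columns are labelling functions, `-1` is abstain); Snorkel's `LFAnalysis` reports similar columns. The matrix in the example is made up for illustration.

```python
ABSTAIN = -1  # Snorkel's convention for "no vote"


def lf_summary(L):
    """Per-function coverage, overlap and conflict rates for label matrix L.

    coverage:  fraction of articles the function labels
    overlaps:  fraction it labels together with at least one other function
    conflicts: fraction where at least one other function votes differently
    """
    n_rows = len(L)
    n_lfs = len(L[0])
    stats = []
    for j in range(n_lfs):
        covered = overlapped = conflicted = 0
        for row in L:
            vote = row[j]
            if vote == ABSTAIN:
                continue
            covered += 1
            others = [v for k, v in enumerate(row) if k != j and v != ABSTAIN]
            if others:
                overlapped += 1
            if any(v != vote for v in others):
                conflicted += 1
        stats.append({
            "coverage": covered / n_rows,
            "overlaps": overlapped / n_rows,
            "conflicts": conflicted / n_rows,
        })
    return stats


def total_coverage(L):
    """Fraction of articles that receive at least one non-abstain vote."""
    return sum(any(v != ABSTAIN for v in row) for row in L) / len(L)


# Hypothetical votes from three labelling functions on four articles.
EXAMPLE_L = [
    [1, -1, -1],
    [1, 0, -1],
    [-1, -1, 0],
    [-1, -1, -1],
]

if __name__ == "__main__":
    for j, s in enumerate(lf_summary(EXAMPLE_L)):
        print(f"LF {j}: {s}")
    print("total coverage:", total_coverage(EXAMPLE_L))
```

Low total coverage means many articles still need manual labels; high conflict rates mean the rules need more study before Snorkel can add much.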