How can I use some manually annotated data as validation data to guide the aggregation step?

NorskRegnesentral / skweak

skweak: A software toolkit for weak supervision applied to NLP tasks

MIT License


How can I use some manually annotated data as validation data to guide the aggregation step? #14

Closed. sujoysarkarcs closed this issue 3 years ago

sujoysarkarcs commented 3 years ago

Hi, can the aggregation step make use of some validation data, as Snorkel does?

plison commented 3 years ago

Do you mean using validation data to help the algorithm find good parameter values for the aggregation model?

I think the easiest approach would be to add the gold standard labels from the validation data as one more labelling function, and then run the aggregation with a very high initial weight for that labelling function (see the initial_weights argument when initialising the aggregator).
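To make the idea concrete, here is a minimal, library-free sketch of what a very high weight on a gold-standard source does. It is not skweak's API and not how skweak aggregates: skweak's aggregators estimate model parameters, whereas this toy uses a plain weighted vote. The function `weighted_vote`, the source names (`lf_gazetteer`, `lf_heuristic`, `gold`) and the weight value 100.0 are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def weighted_vote(votes, weights, default_weight=1.0):
    """Toy aggregation: pick the label with the highest total weight.

    votes   -- dict mapping a source name to its label, or None if it abstains
    weights -- dict mapping a source name to its weight (hypothetical values)
    """
    scores = defaultdict(float)
    for source, label in votes.items():
        if label is None:  # the source abstains on this token
            continue
        scores[label] += weights.get(source, default_weight)
    if not scores:  # every source abstained
        return None
    return max(scores, key=scores.get)


# Assumption: the gold annotations are exposed as one more source named "gold",
# with a weight far above the default of 1.0 used for the other sources.
weights = {"gold": 100.0}

tokens = [
    # Manually annotated token: the gold label overrides the two weak sources.
    {"lf_gazetteer": "ORG", "lf_heuristic": "ORG", "gold": "PERSON"},
    # Token without gold annotation: the weak sources decide.
    {"lf_gazetteer": "ORG", "lf_heuristic": "ORG", "gold": None},
    # No source fires: no label is produced.
    {"lf_gazetteer": None, "lf_heuristic": None, "gold": None},
]

aggregated = [weighted_vote(v, weights) for v in tokens]
print(aggregated)
```

In skweak itself, the corresponding step would presumably be to write the gold annotations into the documents under their own source name and give that name a large value in initial_weights. The exact constructor signature is best checked against the installed version.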