NorskRegnesentral / skweak

skweak: A software toolkit for weak supervision applied to NLP tasks
MIT License

LF Analysis Tooling (Token-Level) #15

Closed (schopra8 closed 3 years ago)

schopra8 commented 3 years ago

Introduce LF Analysis Functionality

Here, we introduce LF Analysis functionality (similar to what's found in Snorkel) to assess the strength of users' labeling functions. Note that the analysis tooling works at the token level (e.g., accuracy is computed over individual tokens rather than entire spans), but it can be extended in the future to assess labeling functions' ability to capture entire spans.

Specifically, we implement and test the following functions:

LF analyses can be run in two modes, controlled by the strict_match parameter. If strict_match = True, LFAnalysis conducts its analyses against BIOLU labels (e.g., B-PERSON, I-PERSON, etc.). If strict_match = False, LFAnalysis conducts its analyses on labels without prefixes (e.g., PERSON, NORP, etc.).
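To make the difference between the two modes concrete, here is a minimal standalone sketch. It does not use the code in this PR: `normalize_label` and `token_accuracy` are hypothetical helpers written only for illustration, and the prefix set (B, I, L, U) is an assumption based on the label examples above.

```python
def normalize_label(label: str, strict_match: bool) -> str:
    """Return the label form that is compared in the given mode (hypothetical helper)."""
    if strict_match or label == "O":
        return label  # strict mode keeps the positional prefix
    prefix, sep, rest = label.partition("-")
    # Assumed prefix set; drop it so that B-PERSON and I-PERSON both become PERSON.
    if sep and prefix in {"B", "I", "L", "U"}:
        return rest
    return label


def token_accuracy(gold, pred, strict_match: bool) -> float:
    """Fraction of tokens whose (normalized) predicted label equals the gold label."""
    matches = [
        normalize_label(g, strict_match) == normalize_label(p, strict_match)
        for g, p in zip(gold, pred)
    ]
    return sum(matches) / len(matches)


gold = ["B-PERSON", "L-PERSON", "O", "U-NORP"]
pred = ["B-PERSON", "I-PERSON", "O", "U-NORP"]

print(token_accuracy(gold, pred, strict_match=True))   # prefix mismatch on token 2 counts as an error
print(token_accuracy(gold, pred, strict_match=False))  # same entity type, so token 2 counts as correct
```

In this toy example the second token has the right entity type but the wrong position prefix, so it is penalized only when strict_match = True.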

Each LF-specific function (e.g., lf_coverages) can be run in two modes, controlled by the agg parameter. If agg = True, the function aggregates results across individual labels. If agg = False, the function is run at the LF + label level (e.g., lf_empirical_scores(agg=False) returns individual precision, recall, and F1 scores for each target label for each LF).
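The effect of agg can be sketched for a single LF as follows. This is an illustration only, not the PR's implementation: `token_scores` is a hypothetical function, and pooling counts across labels (micro-averaging) is an assumption, since the description above does not say how results are aggregated.

```python
from collections import Counter


def token_scores(gold, pred, agg: bool):
    """Token-level precision, recall, and F1 for one LF (hypothetical helper).

    agg=False returns {label: (precision, recall, f1)};
    agg=True pools counts over all labels (assumed micro-average).
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if p != "O":
            if p == g:
                tp[p] += 1  # LF fired with the correct label
            else:
                fp[p] += 1  # LF fired with a wrong label
        if g != "O" and g != p:
            fn[g] += 1      # gold label was missed

    def prf(t, f_pos, f_neg):
        precision = t / (t + f_pos) if t + f_pos else 0.0
        recall = t / (t + f_neg) if t + f_neg else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        return precision, recall, f1

    if agg:
        return prf(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    labels = sorted(set(tp) | set(fp) | set(fn))
    return {label: prf(tp[label], fp[label], fn[label]) for label in labels}


gold = ["PERSON", "PERSON", "O", "NORP"]
pred = ["PERSON", "O", "NORP", "NORP"]

print(token_scores(gold, pred, agg=False))  # one (precision, recall, f1) tuple per label
print(token_scores(gold, pred, agg=True))   # a single pooled (precision, recall, f1) tuple
```

The per-label view shows where an LF is weak (here, low recall on PERSON and low precision on NORP), while the aggregated view gives a single summary per LF.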

plison commented 3 years ago

Brilliant, thanks a lot!