Here, we introduce LF Analysis functionality (similar to what's found in Snorkel) to assess the strength of users' labeling functions (LFs). Note that the analysis tooling works at the token level (e.g., accuracy is computed for individual tokens rather than entire spans), but it can be extended in the future to assess labeling functions' ability to capture entire spans.
Specifically, we implement and test the following functions:
label_overlap: For each label, compute the fraction of tokens with at least 2 LFs providing a non-null annotation.
label_conflict: For each label, compute the fraction of tokens with conflicting non-null labels.
lf_target_labels: Infer the target labels of each LF based on evidence in the label matrix. Excludes the null token label.
lf_coverages: Compute LF coverages (i.e., tokens labeled by an LF that are also labeled by another LF).
lf_overlaps: Compute LF overlaps (i.e., tokens labeled by 2+ LFs).
lf_conflicts: Compute LF conflicts (i.e., instances where 2 LFs assign different non-null labels to a token).
lf_empirical_accuracies: Compute empirical accuracies, setting any out-of-domain labels from the ground truth dataset to null (0) when assessing the accuracy of any individual LF.
lf_empirical_scores: Compute precision, recall, and F1 scores for the LFs.
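To make the token-level overlap and conflict definitions above concrete, here is a minimal NumPy sketch. It is not the implementation in this PR: the function names `lf_overlaps_sketch` and `lf_conflicts_sketch` are hypothetical, the label matrix layout (one row per token, one column per LF, 0 for null) is an assumption, and results are normalized by the total number of tokens, which may differ from how LFAnalysis normalizes.

```python
import numpy as np

NULL = 0  # null label id, matching the "null (0)" convention described above


def lf_overlaps_sketch(L):
    """Per-LF fraction of tokens that the LF and at least one other LF both label."""
    L = np.asarray(L)
    labeled = L != NULL                               # (n_tokens, n_lfs) boolean mask
    n_labeling = labeled.sum(axis=1, keepdims=True)   # number of LFs voting on each token
    return (labeled & (n_labeling >= 2)).mean(axis=0)


def lf_conflicts_sketch(L):
    """Per-LF fraction of tokens where another LF gives a different non-null label."""
    L = np.asarray(L)
    labeled = L != NULL
    conflicts = np.zeros(L.shape, dtype=bool)
    for j in range(L.shape[1]):
        # Other LFs that are non-null and disagree with LF j on the same token.
        differs = labeled & (L != L[:, [j]])
        differs[:, j] = False
        conflicts[:, j] = labeled[:, j] & differs.any(axis=1)
    return conflicts.mean(axis=0)
```

For a toy matrix with four tokens and three LFs, `[[1, 1, 0], [1, 2, 0], [0, 0, 0], [2, 0, 0]]`, the first two LFs overlap on two of four tokens and conflict on one, while the third LF never fires.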
LF analyses can be run in two modes according to the strict_match parameter. If strict_match = True, LFAnalysis conducts its analyses against BIOLU labels (e.g., B-PERSON, I-PERSON, etc.). If strict_match = False, LFAnalysis conducts its analyses on labels without prefixes (e.g., PERSON, NORP, etc.).
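The difference between the two strict_match modes can be illustrated with a small sketch. The helpers `strip_prefix` and `normalize_labels` are hypothetical names used only for illustration, and the prefix set (B, I, L, U, with O as the outside tag) is an assumption about the tagging scheme; the PR's actual normalization may be implemented differently.

```python
PREFIXES = {"B", "I", "L", "U"}  # assumed positional prefixes; "O" carries no entity type


def strip_prefix(label):
    """Map a prefixed tag such as 'B-PERSON' to 'PERSON'; leave other labels unchanged."""
    prefix, sep, rest = label.partition("-")
    if sep and prefix in PREFIXES:
        return rest
    return label


def normalize_labels(labels, strict_match):
    """Keep prefixed tags when strict_match is True, otherwise compare bare entity types."""
    if strict_match:
        return list(labels)
    return [strip_prefix(label) for label in labels]
```

Under this sketch, two LFs that tag the same token as B-PERSON and I-PERSON disagree when strict_match = True but agree when strict_match = False, since both labels reduce to PERSON.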
Each LF-specific function (e.g., lf_coverages) can be run in two modes according to the agg parameter. If agg = True, the function aggregates results across individual labels. If agg = False, the function is run at the LF + label level (e.g., lf_empirical_scores(agg=False) returns individual precision, recall, and F1 scores for each target label for each LF).
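As a rough illustration of the two agg modes, the sketch below scores a single LF against gold token labels. The function `lf_scores_sketch` is hypothetical, and aggregating by pooling true-positive, false-positive, and false-negative counts across the LF's target labels (micro-averaging) is an assumption; the PR may aggregate differently.

```python
import numpy as np

NULL = 0  # null label id


def prf(tp, fp, fn):
    """Precision, recall, and F1 from raw counts, returning 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def lf_scores_sketch(pred, gold, agg):
    """Score one LF per target label (agg=False) or pooled over its target labels (agg=True)."""
    pred, gold = np.asarray(pred), np.asarray(gold)
    # Target labels are inferred as the non-null labels this LF actually emits.
    targets = [int(label) for label in np.unique(pred) if label != NULL]
    counts = {}
    for label in targets:
        tp = int(np.sum((pred == label) & (gold == label)))
        fp = int(np.sum((pred == label) & (gold != label)))
        fn = int(np.sum((pred != label) & (gold == label)))
        counts[label] = (tp, fp, fn)
    if agg:
        tp, fp, fn = (sum(c[i] for c in counts.values()) for i in range(3))
        return prf(tp, fp, fn)
    return {label: prf(*c) for label, c in counts.items()}
```

With `pred = [1, 1, 0, 2, 0, 2]` and `gold = [1, 0, 1, 2, 2, 2]`, the agg=False call returns one score dictionary per target label (1 and 2), while agg=True returns a single dictionary computed from the pooled counts.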
Introduce LF Analysis Functionality