Deactivate weak supervision labels below threshold

Is your feature request related to a problem? Please describe.
We've just implemented the confidence distribution chart, which shows how weak supervision confidence is distributed across the records. What I'd now like to enable is deactivating weakly supervised labels that fall below a user-defined threshold. For example, if an entity has a confidence of only 20%, I don't want to use it in NER.

Describe the solution you'd like
I'm not sure whether this threshold should be applied during computation (i.e., when you execute the weak supervision) or be something you can adjust afterwards.

The latter leaves more room for experimenting with scores, whereas discarding weakly supervised labels during computation based on a threshold is much easier to implement for now.
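
To make the trade-off concrete, here is a minimal sketch of both options. The `WeakLabel` class, its fields, and both function names are hypothetical and only for illustration; they are not taken from the refinery codebase.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class WeakLabel:
    # Hypothetical structure, not refinery's actual data model.
    record_id: int
    label: str
    confidence: float  # assumed to be in [0, 1]
    active: bool = True


def drop_below(labels, threshold):
    """Option 1: discard low-confidence labels during computation.

    Simple, but the dropped labels cannot be recovered without
    re-running weak supervision.
    """
    return [l for l in labels if l.confidence >= threshold]


def deactivate_below(labels, threshold):
    """Option 2: keep every label and only set the `active` flag.

    Can be re-applied later with a different threshold.
    """
    return [replace(l, active=l.confidence >= threshold) for l in labels]
```

With option 2, calling `deactivate_below` again with a lower threshold reactivates labels that were previously switched off, which is what makes it suitable for playing around with scores.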

Describe alternatives you've considered
To be clear, simply filtering in the data browser is not a real alternative. It works for classification, but it isn't really an option for NER: a single text can contain multiple entities, so the same record can have one highly confident entity and another with only about 10% confidence.
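
A small sketch of why the threshold has to act on individual entities rather than on whole records. The record layout, the example text, and the `filter_spans` helper are assumptions made for illustration, not refinery's actual format.

```python
# Hypothetical NER record: one text, two weakly supervised entities.
record = {
    "text": "Alice moved to Paris",
    "spans": [
        {"start": 0, "end": 5, "label": "PERSON", "confidence": 0.95},
        {"start": 15, "end": 20, "label": "LOCATION", "confidence": 0.10},
    ],
}


def filter_spans(record, threshold):
    """Return a copy of the record keeping only spans at or above the threshold."""
    kept = [s for s in record["spans"] if s["confidence"] >= threshold]
    return {**record, "spans": kept}


filtered = filter_spans(record, threshold=0.2)
# The record itself survives; only the low-confidence entity is removed.
```

A record-level filter would have to either keep both entities or drop the whole text, which is exactly the limitation described above.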

Additional context
In general, I strongly believe that we should let users choose their own strategy for weak supervision synthesis. For instance, in NER I might prefer to exclude overlapping spans (i.e., find the intersection of heuristic labels), while in other span-labeling cases I want to find as much overlap as possible. If we provide an execution environment for weak supervision, we could easily integrate an option to filter out labels below a certain threshold.
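
One possible reading of user-selectable synthesis, sketched for two heuristics that each propose a character span as a `(start, end)` tuple. The strategy names and the `synthesize` function are hypothetical; "intersection" keeps only the part both heuristics agree on, "union" keeps the full extent covered by either.

```python
def synthesize(span_a, span_b, strategy):
    """Combine two (start, end) spans; return None if they do not overlap."""
    start = max(span_a[0], span_b[0])
    end = min(span_a[1], span_b[1])
    if start >= end:
        return None  # no overlap, nothing to combine
    if strategy == "intersection":
        return (start, end)
    if strategy == "union":
        return (min(span_a[0], span_b[0]), max(span_a[1], span_b[1]))
    raise ValueError(f"unknown strategy: {strategy}")
```

A confidence threshold could then be applied as one more step after whichever strategy the user picks.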

Could be combined with issue #57.

code-kern-ai / refinery, issue #98