Closed dmartin closed 2 years ago
Hi there,
Thank you for this list of false negatives! I've determined that all of them (or rather, the root profanity in each) belong on the profanity list in the (now released) version 0.3.12.
:+1:
We are using rustrict version 0.3.11 (via .is_inappropriate()) as part of our profanity filtering in picoCTF.
That's really cool! The is_inappropriate() API should now catch all of these, except hator, which I classified as Type::MODERATE & Type::MEAN, which currently is not considered Type::INAPPROPRIATE. You may use is(Type::INAPPROPRIATE | Type::MEAN & Type::MODERATE_OR_HIGHER) instead of is_inappropriate() if you wish to flag this and other similar cases.
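To illustrate why widening the threshold catches a case like this, here is a minimal, self-contained sketch of flag-style matching. It is a toy model only: the bit layout, the definition of INAPPROPRIATE, and the is() helper below are assumptions made for illustration, not rustrict's actual internals. With the real crate, the call would be the is(...) expression quoted above.

```rust
// Toy model, NOT rustrict's real bit layout: one bit per (category, severity) pair.
const PROFANE_MILD: u32 = 1 << 0;
const PROFANE_MODERATE: u32 = 1 << 1;
const PROFANE_SEVERE: u32 = 1 << 2;
const MEAN_MILD: u32 = 1 << 3;
const MEAN_MODERATE: u32 = 1 << 4;
const MEAN_SEVERE: u32 = 1 << 5;

// Category masks and severity masks built from the individual bits.
const PROFANE: u32 = PROFANE_MILD | PROFANE_MODERATE | PROFANE_SEVERE;
const MEAN: u32 = MEAN_MILD | MEAN_MODERATE | MEAN_SEVERE;
const MODERATE: u32 = PROFANE_MODERATE | MEAN_MODERATE;
const SEVERE: u32 = PROFANE_SEVERE | MEAN_SEVERE;
const MODERATE_OR_HIGHER: u32 = MODERATE | SEVERE;

// Assumed stand-in for the default threshold. The only property taken from the
// thread is that moderate meanness is not part of it.
const INAPPROPRIATE: u32 = PROFANE & MODERATE_OR_HIGHER;

/// Hypothetical helper: a detection matches a threshold if they share any bit.
fn is(detected: u32, threshold: u32) -> bool {
    (detected & threshold) != 0
}

fn main() {
    // The classification given in the reply: Type::MODERATE & Type::MEAN.
    let hator = MEAN & MODERATE;

    // In Rust, `&` binds tighter than `|`, so this parses as
    // INAPPROPRIATE | (MEAN & MODERATE_OR_HIGHER).
    let widened = INAPPROPRIATE | MEAN & MODERATE_OR_HIGHER;
    assert_eq!(widened, INAPPROPRIATE | (MEAN & MODERATE_OR_HIGHER));

    // Prints "default: false, widened: true".
    println!(
        "default: {}, widened: {}",
        is(hator, INAPPROPRIATE),
        is(hator, widened)
    );
}
```

In this model, the moderate-mean detection shares no bit with the default threshold, but it does overlap the widened one, which is why the second check succeeds.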
Thank you for creating this library! Even with the false negatives listed here, we still achieved about 90% effectiveness out of the box for our sample dataset.
You're welcome! I'm really glad it is helpful! :smiley:
I use a Wikipedia comment dataset to keep track of accuracy, and this change corresponds to a 0.08% improvement in the positive accuracy :tada:, while negative accuracy decreases by 0.06% (although, in many of these cases, the filter is right in my opinion).
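For readers unfamiliar with the two figures, the sketch below shows one common way to compute them. It assumes "positive accuracy" means the share of inappropriate samples that the filter flags, and "negative accuracy" means the share of appropriate samples that it leaves alone. The Sample type, the accuracies function, and the four-item dataset are hypothetical and are not taken from rustrict or its test data.

```rust
/// One labeled sample: the human label and the filter's verdict.
struct Sample {
    labeled_inappropriate: bool,
    flagged: bool,
}

/// Returns (positive accuracy, negative accuracy) as fractions in [0, 1].
/// A side is `None` when the dataset has no samples of that class.
fn accuracies(samples: &[Sample]) -> (Option<f64>, Option<f64>) {
    let (mut pos, mut pos_hit, mut neg, mut neg_hit) = (0u32, 0u32, 0u32, 0u32);
    for s in samples {
        if s.labeled_inappropriate {
            pos += 1;
            if s.flagged {
                pos_hit += 1; // correctly flagged
            }
        } else {
            neg += 1;
            if !s.flagged {
                neg_hit += 1; // correctly left alone
            }
        }
    }
    // Avoid dividing by zero when a class is absent.
    let ratio = |hit: u32, total: u32| {
        if total == 0 {
            None
        } else {
            Some(f64::from(hit) / f64::from(total))
        }
    };
    (ratio(pos_hit, pos), ratio(neg_hit, neg))
}

fn main() {
    // Made-up illustrative data: one false negative, no false positives.
    let samples = [
        Sample { labeled_inappropriate: true, flagged: true },
        Sample { labeled_inappropriate: true, flagged: false },
        Sample { labeled_inappropriate: false, flagged: false },
        Sample { labeled_inappropriate: false, flagged: false },
    ];
    let (positive, negative) = accuracies(&samples);
    // Prints "positive: Some(0.5), negative: Some(1.0)".
    println!("positive: {:?}, negative: {:?}", positive, negative);
}
```

Under these definitions, adding words to the list tends to raise the first number and can lower the second, which matches the trade-off described above.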
In practice, we use the customization feature to work around most of these false negatives, but I thought that some may be of interest upstream.
Thanks for the feedback. The customization feature, which is one of the main things preventing this crate from being stable, needs a bit more work (removing unsafe without adding runtime overhead). But since everyone seems to need it, I'll prioritize it more :ok_hand:
Awesome! Thanks again.
False Positives
The following shouldn't have been detected, but was:
False Negatives
The following should have been detected, but wasn't:
Context
We are using rustrict version 0.3.11 (via .is_inappropriate()) as part of our profanity filtering in picoCTF.
Thank you for creating this library! Even with the false negatives listed here, we still achieved about 90% effectiveness out of the box for our sample dataset.
In practice, we use the customization feature to work around most of these false negatives, but I thought that some may be of interest upstream.