This is a fast and robust library that detects offensive language in text strings. It currently supports English only; more languages will be added soon.
This library uses a logistic regression ML.NET model trained on thousands of human-labeled words. The trained model is embedded in the library as a resource and is consulted on every prediction.
Until now, all .NET profanity detection libraries have relied on hard-coded lists of bad words. ProfanityDetector, for instance, keeps such a list in memory. This approach has obvious issues: although list-based libraries can be performant, they are not comprehensive. They are easily defeated by misspellings and by human creativity in replacing letters with other characters, which produces new words that are still perceived as curse words (e.g. house and h0us3).
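The limitation described above can be illustrated with a minimal sketch of an exact-match word list. This is not the code of any particular library; the `blockedWords` set and the `ContainsBlockedWord` helper are hypothetical, and "house" stands in for a blocked word, as in the example above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical hard-coded list; "house" is a stand-in for a blocked word.
var blockedWords = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "house" };

// Hypothetical helper: flags text only when a token matches the list exactly.
bool ContainsBlockedWord(string text) =>
    text.Split(' ', StringSplitOptions.RemoveEmptyEntries)
        .Any(word => blockedWords.Contains(word));

Console.WriteLine(ContainsBlockedWord("my house")); // True
Console.WriteLine(ContainsBlockedWord("my h0us3")); // False: the substitution slips through
```

An exact-match lookup only recognizes spellings that someone has already added to the list, so every new variant needs a new entry.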
For a single prediction, this library was about 618 times faster than the most downloaded .NET package for detecting profanity. For 100 successive predictions, it was around 24 times faster.
Package | 1 prediction | 10 predictions | 100 predictions |
---|---|---|---|
.Net Bad Word Detector | 0.0462 ms | 1.5508 ms | 4.1887 ms |
ProfanityDetector | 28.5823 ms | 42.4606 ms | 102.0750 ms |
PC specs: Dell Inspiron 13, Intel Core i7 (8th gen), 16 GB RAM.
```
dotnet add package DotnetBadWordDetector
```

```csharp
var detector = new ProfanityDetector();
if (detector.IsProfane("foo bar"))
{
    // do something
}
```
It is strongly recommended to keep the library loaded in memory at all times to improve performance. It uses very little memory (less than 100 KB).
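One possible way to follow this advice is to create the detector once and reuse it for every call. The sketch below is only an illustration: the `SharedDetector` wrapper is hypothetical, and the `DotnetBadWordDetector` namespace is assumed from the package name. Only `ProfanityDetector` and `IsProfane` come from the usage example above.

```csharp
using System;
using DotnetBadWordDetector; // assumed namespace, inferred from the package name

// Hypothetical wrapper that holds one detector for the lifetime of the process.
public static class SharedDetector
{
    // Lazy<T> creates the detector on first use; its default mode makes
    // the initialization itself thread-safe.
    private static readonly Lazy<ProfanityDetector> Instance =
        new Lazy<ProfanityDetector>(() => new ProfanityDetector());

    // Reuses the same detector instead of constructing a new one per call.
    public static bool IsProfane(string text) => Instance.Value.IsProfane(text);
}
```

Callers would then use `SharedDetector.IsProfane("foo bar")`. Whether a single instance can safely be called from several threads at once is not stated here, so it is worth verifying before sharing the detector across concurrent requests.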
Model quality metrics evaluation
--------------------------------
Accuracy: 98.43%
Auc: 99.49%
F1Score: 97.25%
This library is not perfect and is not 100% precise. It is also context-free: for example, it cannot detect profane phrases made up of decent words. In addition, people disagree on what is considered profane.