finnbear / rustrict

rustrict is a profanity filter for Rust
https://crates.io/crates/rustrict
MIT License

Filtering error (false positive and/or false negative) #2

Closed vtvz closed 2 years ago

vtvz commented 2 years ago

False Positives

The following shouldn't have been detected, but was:

Bridge: Caleb Shomo

Context

I am using the latest version of rustrict.

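    use rustrict::{Censor, Type};

    // Set both thresholds to Type::ANY so that every detection gets censored.
    let (censored, typ) = Censor::from_str("Bridge: Caleb Shomo")
        .with_censor_first_character_threshold(Type::ANY)
        .with_censor_threshold(Type::ANY)
        .censor_and_analyze();

    // Print the censored text and the detected type bits.
    dbg!(censored);
    println!("{:#b}", typ);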

Output:

    [src/main.rs:35] censored = "Bridge: Caleb S****"
    0b1010000

finnbear commented 2 years ago

Thanks for the report! I've fixed it in 0.3.11 and added a test case.

Note: Rather than simply adding this particular case (and the other 30+ names ending in homo) to the list of false positives, I improved the filter by adding a feature that lets certain words (like homo) be marked as prefix-only matches, meaning they won't be detected as the suffix of a word. This increases the filter's negative accuracy by 0.01% on the Wikipedia dataset :tada:, although marking certain other words as such would probably increase it even more.
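
For readers wondering what a prefix-only match might look like, here is a minimal sketch. It is not rustrict's actual implementation: the `Entry` struct, its `prefix_only` field, and the `is_flagged` function are hypothetical names made up for illustration, and the real filter does far more (leetspeak, spacing, severity types, and so on). The sketch only shows the core idea that a prefix-only word is ignored unless it starts at a word boundary.

```rust
/// Hypothetical word-list entry (an assumption for illustration,
/// not rustrict's real data structure).
struct Entry {
    word: &'static str,
    /// When true, the word only counts if it starts at a word boundary,
    /// so it is never detected as the suffix of a longer word.
    prefix_only: bool,
}

/// Returns true if any entry matches `text` under the rule above.
fn is_flagged(text: &str, entries: &[Entry]) -> bool {
    let lower = text.to_lowercase();
    entries.iter().any(|entry| {
        lower.match_indices(entry.word).any(|(start, _)| {
            if !entry.prefix_only {
                // Ordinary entries match anywhere in the text.
                return true;
            }
            // Prefix-only entries must begin the text or follow a
            // non-alphanumeric character (space, punctuation, ...).
            lower[..start]
                .chars()
                .next_back()
                .map_or(true, |c| !c.is_alphanumeric())
        })
    })
}
```

With `prefix_only: true`, the "homo" inside "Shomo" is preceded by a letter and is skipped, while the same entry with `prefix_only: false` would still flag the reported input.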