finnbear / rustrict

rustrict is a profanity filter for Rust
https://crates.io/crates/rustrict
MIT License

Filtering error (false positive and/or false negative) #28

Closed: vtvz closed this issue 1 month ago

vtvz commented 1 month ago

False Positives

The following shouldn't have been detected, but were:

"'Cause there's a distance now" "'Cause there'********ance now" 🟡 mildly sexual
"(It's alright, it's alright, it's alright to start from the bottom)" "(It's alrigh******* alrigh******* alright to start from the bottom)" 🟠 moderately sexual, mildly evasive
"And I'll do my duty" "An********* my duty" ⛔️ severely sexual
"Caught in a loop where my mind's expired" "Caught in a loop where my mind'****pired" 🟠 moderately sexual
"I wish I take it back to me" "I wi******ake it back to me" 🟠 moderately profane
"I'm wishing you could see me now" "I'm wishing you could ********ow" ⛔️ severely sexual
"It's gonna hurt, it's gonna hurt! Oh" "It's gonna hur******* gonna hurt! Oh" 🟠 moderately sexual, mildly evasive
"Was it meant to last? It fell apart so fast" "Was it meant to las***** fell apart so fast" 🟠 moderately sexual, mildly evasive
"You don't get it, it's my life, yeah" "You don't get i******* my life, yeah" 🟠 moderately sexual, mildly evasive

Lines formatted with:

println!("{:?} {:?} {}", line.line, line.censored, line.typ);

I have a LOT of cases where the match spans space-separated words. It feels like I need to check whether any spaces within a line were censored, or else split the line into words, check them one by one, and then merge them back together. Do you have any suggestions for working around these types of issues?
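
The first check described above (did the filter censor any space in the line?) could be sketched roughly as follows. This is only an illustration: `censors_whitespace` is a hypothetical helper, not part of rustrict, and it assumes the censored text has exactly one output character per input character, which holds for the examples listed in this report but is not guaranteed in general.

```rust
/// Hypothetical helper (not part of rustrict).
/// Returns true if any whitespace character in `original` lines up with
/// `mask` in `censored`, i.e. a censored run covered a word boundary.
/// Assumption: both strings have the same number of chars, in the same order.
fn censors_whitespace(original: &str, censored: &str, mask: char) -> bool {
    original
        .chars()
        .zip(censored.chars())
        .any(|(o, c)| o.is_whitespace() && c == mask)
}
```

A line for which this returns true is a candidate for re-checking word by word, since the match it triggered was spread over more than one word.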

finnbear commented 1 month ago

Thanks, I've fixed the examples provided! For song lyrics, which don't usually contain evasive profanity, you could try the word-by-word approach. Also consider replacing all the punctuation with spaces.
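
A minimal sketch of the two suggestions above (word-by-word filtering, with punctuation replaced by spaces first) might look like this. The helper names are hypothetical, and the per-word filter is passed in as a closure so the snippet depends only on the standard library; with rustrict, that closure could presumably be something like `|w| w.censor()` via the `CensorStr` trait, but that wiring is an assumption and not something confirmed in this thread.

```rust
/// Hypothetical helper: replaces ASCII punctuation with spaces so that text
/// such as "it, it's" is split into separate words before filtering.
/// Note that apostrophes are replaced too, so "it's" becomes "it s".
fn strip_punctuation(line: &str) -> String {
    line.chars()
        .map(|c| if c.is_ascii_punctuation() { ' ' } else { c })
        .collect()
}

/// Hypothetical helper: runs `censor_word` on each whitespace-separated word
/// on its own, then joins the results with single spaces. Because each word
/// is filtered in isolation, a match can never span two words.
fn censor_word_by_word<F>(line: &str, censor_word: F) -> String
where
    F: Fn(&str) -> String,
{
    strip_punctuation(line)
        .split_whitespace()
        .map(|w| censor_word(w))
        .collect::<Vec<_>>()
        .join(" ")
}
```

The trade-off is that the original punctuation and spacing are lost in the output, and any profanity deliberately split across spaces would no longer be caught, which matches the caveat that this suits text that rarely contains evasive profanity.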

(I have some other reports to process before I release a new version)