dimitrismistriotis / alt-profanity-check

A fast, robust library to check for offensive language in strings; a drop-in replacement for "profanity-check".
https://pypi.org/project/alt-profanity-check/
MIT License
69 stars, 16 forks

Concatenated profane words yield a "false negative" prediction #53

Open ciapecki opened 1 month ago

ciapecki commented 1 month ago

from profanity_check import predict_prob
predict_prob(['fuck', 'shit', 'fuckshit'])  # [1. 0.99999982 0.03636672]

Is there a way to have the last element of the array treated as profane?

dimitrismistriotis commented 1 month ago

Thanks for the issue.

We have had similar discussions in the past, including when the code was "living" on GitLab, so it is nice to have this here for reference.

To do so, we would need to update the dataset with more sentences in which "fuckshit" is annotated as profane. Currently we are using the original dataset and have not discussed updating it, although I am open to the possibility if someone can point to a good corpus.
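
Until the dataset is extended, one possible stopgap is to pre-process the input rather than change the model. The sketch below is not part of alt-profanity-check: `split_concatenated` and `max_probability` are hypothetical helpers, and the word list is an assumption the caller has to supply. It tries to break a token into known words and then scores the pieces alongside the original token, keeping the highest probability. The scoring function is passed in as a parameter, so `predict_prob` can be plugged in as shown in the trailing comment.

```python
from typing import Callable, Iterable, List, Optional, Sequence


def split_concatenated(token: str, vocabulary: Iterable[str]) -> Optional[List[str]]:
    """Split `token` into words from `vocabulary`; return None if no full split exists."""
    words = {w.lower() for w in vocabulary if w}
    text = token.lower()
    # best[i] holds one segmentation of text[:i], or None if there is none.
    best: List[Optional[List[str]]] = [None] * (len(text) + 1)
    best[0] = []
    for end in range(1, len(text) + 1):
        for start in range(end):
            prefix = best[start]
            if prefix is not None and text[start:end] in words:
                best[end] = prefix + [text[start:end]]
                break
    return best[len(text)]


def max_probability(
    token: str,
    vocabulary: Iterable[str],
    score: Callable[[List[str]], Sequence[float]],
) -> float:
    """Score the token and, if it splits into several known words, its parts too."""
    parts = split_concatenated(token, vocabulary)
    candidates = [token]
    if parts is not None and len(parts) > 1:
        candidates.extend(parts)
    return max(float(p) for p in score(candidates))


# Hypothetical usage with the library (the word list is an assumption):
# from profanity_check import predict_prob
# max_probability("fuckshit", ["fuck", "shit"], predict_prob)
```

Because a token is only split when it can be covered entirely by words from the list, ordinary words that merely contain a listed word as a substring are left alone. The quality of this workaround depends entirely on the supplied word list, so it complements rather than replaces a better-annotated corpus.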