concatenated profane words make "false positive" prediction

dimitrismistriotis / alt-profanity-check

A fast, robust library to check for offensive language in strings, drop-in replacement for "profanity-check".

https://pypi.org/project/alt-profanity-check/

MIT License

68 stars, 15 forks

concatenated profane words make "false positive" prediction #53

Open ciapecki opened 1 week ago

ciapecki commented 1 week ago

from profanity_check import predict_prob
predict_prob(['fuck', 'shit', 'fuckshit'])  # [1. 0.99999982 0.03636672]

Is there a way to have the last element of the array treated as profane?

dimitrismistriotis commented 1 week ago

Thanks for the issue.

We have had similar discussions in the past, including when the code was "living" on GitLab, so it is nice to have it here for reference.

To do so, we would need to update the dataset with more sentences in which "fuckshit" is annotated as profane. Currently we are using the original dataset and have not discussed updating it, although I am open to the possibility if someone can point to a good corpus.
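
Until the dataset changes, one possible caller-side workaround is to split a token into known words before scoring and keep the highest score. The following is only a minimal sketch and not part of the library: `split_into_known_words`, `predict_prob_with_split`, the `vocabulary` word list, and the `toy_score` stand-in are all hypothetical names introduced for illustration. With the real model, the scorer could be `lambda s: predict_prob([s])[0]`.

```python
from typing import Callable, Iterable, List, Optional


def split_into_known_words(token: str, vocabulary: Iterable[str]) -> Optional[List[str]]:
    """Split token into a sequence of vocabulary words, or return None if it cannot be fully split."""
    vocab = set(vocabulary)
    # segmentations[i] holds one way to split token[:i], or None if there is none.
    segmentations: List[Optional[List[str]]] = [None] * (len(token) + 1)
    segmentations[0] = []
    for end in range(1, len(token) + 1):
        for start in range(end):
            prefix = segmentations[start]
            if prefix is not None and token[start:end] in vocab:
                segmentations[end] = prefix + [token[start:end]]
                break
    return segmentations[len(token)]


def predict_prob_with_split(
    texts: Iterable[str],
    score: Callable[[str], float],
    vocabulary: Iterable[str],
) -> List[float]:
    """Score each text as-is and also by its split parts, keeping the maximum."""
    vocab = set(vocabulary)
    results = []
    for text in texts:
        candidates = [text]
        for token in text.lower().split():
            parts = split_into_known_words(token, vocab)
            # Only add parts when the token is a concatenation of two or more known words.
            if parts is not None and len(parts) > 1:
                candidates.extend(parts)
        results.append(max(score(candidate) for candidate in candidates))
    return results


# Stand-in scorer (an assumption for this sketch): it only recognizes exact known words.
KNOWN_WORDS = {"fuck", "shit"}


def toy_score(text: str) -> float:
    return 1.0 if text in KNOWN_WORDS else 0.0


scores = predict_prob_with_split(["fuckshit", "hello"], toy_score, KNOWN_WORDS)
```

Because a token is split only when it can be covered completely by known words, ordinary words that merely contain a profane substring are left alone. The word list still has to be maintained by hand, so this does not replace retraining on a better corpus.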