
dimitrismistriotis / alt-profanity-check

A fast, robust library to check for offensive language in strings, drop-in replacement of "profanity-check".

https://pypi.org/project/alt-profanity-check/

MIT License

69 stars, 16 forks

New language #19

Closed drispal closed 1 year ago

drispal commented 1 year ago

Hi, I wonder how I can add a new language to this package? It would be really helpful for my project!

Thanks,

dimitrismistriotis commented 1 year ago

Extremely tough. One idea would be to translate all the content and then train the model. The problem with this: tons of work with no guaranteed result.

I would try to find a corpus in the language I am interested in and train on that, creating a new model. Then use the model that corresponds to each language.
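To make the "one model per language" idea concrete, here is a minimal sketch using only the Python standard library. It is not this package's training pipeline: the tiny naive Bayes classifier, the `MODELS` registry, the `predict_for_language` helper and the four-sentence French toy corpus are all hypothetical stand-ins, and a real model would need a large labelled corpus in the target language.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercased word tokens; \w is Unicode-aware in Python 3.
    return re.findall(r"\w+", text.lower())


class NaiveBayesProfanityModel:
    """Tiny bag-of-words naive Bayes classifier (1 = offensive, 0 = clean)."""

    def fit(self, texts, labels):
        self.word_counts = {0: Counter(), 1: Counter()}
        self.doc_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def _log_score(self, tokens, label):
        # Log prior plus Laplace-smoothed log likelihood of each token.
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        total_words = sum(self.word_counts[label].values())
        denominator = total_words + len(self.vocab)
        for token in tokens:
            score += math.log((self.word_counts[label][token] + 1) / denominator)
        return score

    def predict(self, texts):
        results = []
        for text in texts:
            tokens = tokenize(text)
            offensive = self._log_score(tokens, 1) > self._log_score(tokens, 0)
            results.append(int(offensive))
        return results


# Hypothetical toy corpus, far too small for real use.
FRENCH_TEXTS = [
    "tu es un idiot",
    "espèce d'imbécile",
    "bonjour tout le monde",
    "merci pour ton aide",
]
FRENCH_LABELS = [1, 1, 0, 0]

# One trained model per language code (hypothetical registry).
MODELS = {"fr": NaiveBayesProfanityModel().fit(FRENCH_TEXTS, FRENCH_LABELS)}


def predict_for_language(texts, language):
    # Pick the model that corresponds to the requested language.
    if language not in MODELS:
        raise ValueError(f"no model trained for language {language!r}")
    return MODELS[language].predict(texts)


print(predict_for_language(["quel idiot", "merci tout le monde"], "fr"))
```

The quality of such a model depends entirely on the corpus, which is why nothing here guarantees a good result for a new language.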

A quick solution would be to translate the text to English before checking it, but the issue with this is that profanity does not translate directly.
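A rough sketch of the translate-then-check route, again under assumptions: `check_via_translation` is a hypothetical helper, and both the translator and the English classifier are passed in as callables. In practice the classifier could be the `predict` function from the `profanity_check` module, while the translator would be whatever service you have access to. The stubs below are made up purely to show the failure mode.

```python
def check_via_translation(texts, translate_to_english, predict_english):
    # Translate each text first, then run the English-only classifier.
    english_texts = [translate_to_english(text) for text in texts]
    return [int(label) for label in predict_english(english_texts)]


def fake_translate(text):
    # Hypothetical stand-in for a real translator: the "idiom" comes out
    # as a harmless literal phrase, so its offensive sense is lost.
    literal = {"<offensive idiom>": "harmless literal phrase"}
    return literal.get(text, text)


def fake_predict(texts):
    # Hypothetical stand-in for an English profanity classifier.
    return [1 if "damn" in text else 0 for text in texts]


print(check_via_translation(["damn it", "<offensive idiom>"], fake_translate, fake_predict))
```

With these stubs the second input is reported as clean, which is the problem described above: the check only sees what survives translation.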

menkotoglou commented 1 year ago

As @dimitrismistriotis said, this is extremely difficult with the way the project is constructed.

What he proposes above is the most straightforward solution for this. Unfortunately, nobody can guarantee that we would get a good result out of it.

Closing for now; open to discussing another issue or reviewing any PR that would help resolve this one.