R2NorthstarTools / Reaper

Open source GitHub repo for the Northstar discord server bot "Reaper"
MIT License

Chat sentiment analyser #13

Closed by GeckoEidechse 1 month ago

GeckoEidechse commented 1 month ago

This feature would be a bit more involved. Basically, it would monitor chat sentiment to detect toxicity and similar behaviour, and ping moderators when high toxicity is suspected.

From there moderators can then decide whether to take action or not.

Getting this right will be quite difficult, and it will never be completely correct. In general, any natural language processing tool struggles with sarcasm and similar linguistic devices, and it can probably be bypassed easily with tricks such as changing the meaning of words.

So in the end this would be an additional tool for moderation rather than a replacement for manual moderation.

And just to point out for anyone concerned about privacy: I'm not planning to store any message history, simply because of storage limitations. Besides, Dyno (the most common Discord bot, which Northstar also uses) already stores message history and deleted messages, so if privacy is a concern, don't post stuff in public channels and servers.

Bobbyperson commented 1 month ago

From testing, all of the models linked below seem like they could work for relatively little performance cost. Probably we'd scan every 5th message or something and then warn if the majority of the last 10 scanned messages are negative? Unfortunately, all of these models completely fail if the message is past a certain length. They also work surprisingly well with Tenor links, assuming they are tagged correctly.

https://huggingface.co/finiteautomata/bertweet-base-sentiment-analysis
https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest
https://huggingface.co/avichr/heBERT_sentiment_analysis
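The sampling idea above (scan every 5th message, warn when most of the last 10 scanned messages are negative) could look roughly like the sketch below. This is only an illustration, not the bot's actual code: `SentimentMonitor`, `observe`, and the `is_negative` callable are hypothetical names, and the classifier is passed in as a plain function so that any of the linked models could be wrapped behind it.

```python
from collections import deque
from typing import Callable, Deque


class SentimentMonitor:
    """Hypothetical helper: samples every Nth message and checks a sliding window."""

    def __init__(
        self,
        is_negative: Callable[[str], bool],  # assumed wrapper around a sentiment model
        sample_every: int = 5,               # scan every 5th message
        window: int = 10,                    # look at the last 10 scanned messages
    ) -> None:
        self.is_negative = is_negative
        self.sample_every = sample_every
        self.recent: Deque[bool] = deque(maxlen=window)
        self.seen = 0

    def observe(self, text: str) -> bool:
        """Return True if moderators should be warned after this message."""
        self.seen += 1
        if self.seen % self.sample_every != 0:
            return False  # message is not sampled, so the model is never called
        self.recent.append(self.is_negative(text))
        window_full = len(self.recent) == self.recent.maxlen
        # Warn only once the window is full and a strict majority is negative.
        return window_full and sum(self.recent) > len(self.recent) / 2
```

As written, `observe` keeps returning True for as long as the majority stays negative, so a real bot would presumably want some cooldown before pinging moderators again.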

GeckoEidechse commented 1 month ago

From personal observation, most messages tend to be short, as people on Discord tend to split up longer messages into shorter individual ones, so it's most likely not gonna be an issue.

Regarding how many messages to process: message count per minute tends to fluctuate with activity. Optimally, we would track CPU usage or something and then skip messages if utilisation goes so high that we lag behind. Though that approach might be a bit too complex for a first implementation ^^"
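One simpler way to get the "skip messages when we lag behind" behaviour might be to measure the lag directly instead of CPU usage. The sketch below swaps in that technique: each message is timestamped on arrival, and anything that has waited too long by the time the worker reaches it is skipped. `LagAwareQueue`, `pop_fresh`, and the 5 second threshold are all hypothetical, chosen only for illustration.

```python
import time
from collections import deque
from typing import Callable, Deque, Optional, Tuple


class LagAwareQueue:
    """Hypothetical queue that drops messages which waited longer than max_lag."""

    def __init__(
        self,
        max_lag_seconds: float = 5.0,                  # assumed threshold
        clock: Callable[[], float] = time.monotonic,   # injectable for testing
    ) -> None:
        self.max_lag = max_lag_seconds
        self._clock = clock
        self._pending: Deque[Tuple[float, str]] = deque()
        self.skipped = 0

    def push(self, text: str) -> None:
        """Record a message together with its arrival time."""
        self._pending.append((self._clock(), text))

    def pop_fresh(self) -> Optional[str]:
        """Return the oldest message that is still within max_lag, or None."""
        while self._pending:
            arrived, text = self._pending.popleft()
            if self._clock() - arrived <= self.max_lag:
                return text
            self.skipped += 1  # too stale: we are lagging, so skip it
        return None
```

The message handler would call `push`, and whatever runs the model would call `pop_fresh` in a loop. The `skipped` counter gives a rough signal of how often the bot falls behind.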

GeckoEidechse commented 1 month ago

Closed by #18