CactusDev / CactusBot

An open source, community-written service-agnostic chat bot
MIT License

Spam handler #47

Closed. 2Cubed closed this issue 8 years ago.

2Cubed commented 8 years ago

Rewrite of v0.3's spam handling, put into a Handler.

We can use Beam's url type. Beam generates it with an advanced NodeJS library whose sole purpose is recognizing links, so it should catch essentially every link.

It might also be useful to allow toggling specific domains; for example, allowing all youtube.com links. Python's urllib parsers should make this fairly easy.
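A minimal sketch of both ideas together. It assumes each chat packet is a list of chunk dicts shaped like the ones later in this thread (with "type" and "text" keys), that link chunks carry "type": "url", and that the link itself is in "text". ALLOWED_DOMAINS, is_allowed, and find_disallowed_links are hypothetical names.

```python
from urllib.parse import urlparse

# Hypothetical whitelist: links to these domains (and their subdomains) are allowed.
ALLOWED_DOMAINS = {"youtube.com"}


def is_allowed(url, allowed=ALLOWED_DOMAINS):
    """Return True if the URL's host is a whitelisted domain or a subdomain of one."""
    # urlparse only fills in the host when a scheme is present, so add one if missing.
    if "://" not in url:
        url = "http://" + url
    host = (urlparse(url).hostname or "").lower()
    return any(host == domain or host.endswith("." + domain) for domain in allowed)


def find_disallowed_links(packet, allowed=ALLOWED_DOMAINS):
    """Return the text of every url chunk whose domain is not whitelisted."""
    # Assumption: link chunks look like {"type": "url", "text": "<the link>"}.
    return [chunk["text"] for chunk in packet
            if chunk["type"] == "url" and not is_allowed(chunk["text"], allowed)]
```

Matching on the parsed hostname rather than doing a substring check means a domain like notyoutube.com is not whitelisted just because it contains "youtube.com".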

Capitalization Detection

We should use a better formula than "total caps greater than x" this time. Maybe check whether the ratio of capitals to total characters (or, better yet, capitals to lowercase?) is above a specific threshold? We should still keep a minimum total length, though, so HI! isn't considered spam despite being "100% caps".
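A minimal sketch of a ratio-based check with a minimum length. The function name and both thresholds are hypothetical placeholders that would need tuning; it measures capitals against cased letters only, so spaces, digits, and punctuation don't dilute the ratio.

```python
# Hypothetical defaults; the real thresholds would need tuning.
MIN_LETTERS = 10
MAX_CAPS_RATIO = 0.7


def exceeds_caps(text, min_letters=MIN_LETTERS, max_ratio=MAX_CAPS_RATIO):
    """Return True if too large a share of the message's cased letters are capitals."""
    # Only cased letters count, so spaces, digits and punctuation are ignored.
    letters = [c for c in text if c.isupper() or c.islower()]
    # Short messages such as "HI!" are never flagged, even at 100% caps.
    if len(letters) < min_letters:
        return False
    capitals = sum(1 for c in letters if c.isupper())
    return capitals / len(letters) > max_ratio
```

The capitals-to-lowercase variant is the same test with a different threshold: with c capitals and l lowercase letters, c/(c+l) rises monotonically with c/l, so either ratio can be used.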

Emote Detection

We should do something similar to the method described in Capitalization Detection, comparing the share of emotes in a message against a threshold ratio. We should also consider making it more lenient with sub emotes, so that a raid which uses a bunch of the raider's emotes doesn't get removed. (It's happened.)
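A sketch of the emote ratio, under several assumptions: emotes arrive as chunks with "type": "emoticon" and the emote name in "text", sub emotes are recognised by name from a set the caller supplies, and a sub emote counts as half an ordinary emote. All names, weights, and thresholds here are hypothetical.

```python
# Hypothetical defaults; real values would need tuning against actual chat.
MIN_ITEMS = 4
MAX_EMOTE_RATIO = 0.6
SUB_EMOTE_WEIGHT = 0.5  # a sub emote counts as half an ordinary emote


def exceeds_emotes(packet, sub_emotes=frozenset(), min_items=MIN_ITEMS,
                   max_ratio=MAX_EMOTE_RATIO, sub_weight=SUB_EMOTE_WEIGHT):
    """Return True if emotes make up too large a share of the message.

    The message is measured in words of text plus emotes; other chunk
    types (such as links) are ignored.
    """
    words = 0
    emotes = 0
    emote_score = 0.0
    for chunk in packet:
        # Assumption: emotes arrive as {"type": "emoticon", "text": "<name>"}.
        if chunk["type"] == "emoticon":
            emotes += 1
            emote_score += sub_weight if chunk["text"] in sub_emotes else 1.0
        elif chunk["type"] == "text":
            words += len(chunk["text"].split())
    total = words + emotes
    # Like "HI!" for caps, a couple of emotes on their own is not spam.
    if total < min_items:
        return False
    return emote_score / total > max_ratio
```

Down-weighting sub emotes (rather than ignoring them) keeps a raid's sub-emote wall below the threshold while still catching a message made of ordinary emotes.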

Length Detection

We should remove this. Enforcing a length limit that differs from Beam's own 360-character limit is confusing.

Innectic commented 8 years ago

(Except I'll make it not crap this time)

2Cubed commented 8 years ago

@Innectic Sounds great! Just edited the original message, including much more detail. There's a checklist at the top - check things off as they're finished? :smile:

Innectic commented 8 years ago

This was finished on the feature/handler branch.

2Cubed commented 8 years ago

Reopening until everything is merged. (We should open a pull request to merge the change into develop, then reference that pull request in the closing message here.)

2Cubed commented 8 years ago

Current state of packet parsing:

    import json


    def on_message(self, packet):
        """Handle message events; return True if the message should be treated as spam."""
        packet = json.loads(packet)
        # Caps detection must join the text of the "text" chunks, not the chunk dicts:
        # exceeds_caps = self.check_caps(''.join(
        #     chunk["text"] for chunk in packet if chunk["type"] == "text"))
        contains_emotes = self.check_emotes(packet)
        has_links = self.check_links(packet)

        return contains_emotes or has_links

Still need to:

Innectic commented 8 years ago

@2Cubed Maybe we could return an object with the action attribute and check what it is?

For example:

    packet = {
        "action": "timeout"
    }

The action could also be ban, purge, or something similar.
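On the bot side, checking the action could be a lookup in a table of callables. A minimal sketch, assuming the handler returns a dict like the one above; dispatch, the handlers mapping, and the placeholder lambdas are hypothetical stand-ins for real moderation calls.

```python
def dispatch(result, handlers):
    """Run the callable registered for result["action"] and return its result.

    handlers maps action names such as "timeout", "ban" or "purge" to
    callables that receive the whole result dict, so each action can read
    any extra keys it needs.
    """
    action = result.get("action")
    if action not in handlers:
        raise ValueError("unknown or missing action: {!r}".format(action))
    return handlers[action](result)


# Hypothetical stand-ins for real moderation calls.
handlers = {
    "timeout": lambda result: "timing out",
    "ban": lambda result: "banning",
    "purge": lambda result: "purging",
}

print(dispatch({"action": "timeout"}, handlers))  # prints "timing out"
```

A table keeps the set of supported actions in one place; adding a new action means registering one more callable instead of growing an if/elif chain.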

RPiAwesomeness commented 8 years ago

Maybe make it return a function or coroutine? My other thought: maybe the spam system should check whether messages contain actual words rather than random letters.

"ASDADASFDSLhjldfsLKSDOFIJ!O!!": obviously spam, no actual words.
"WOW! SUB HYPE! Super excited for the rest of this STREEEEEAAAAMM!": less likely to be spam.

Innectic commented 8 years ago

@RPiAwesomeness Hmm, I'm not sure we can reliably tell those two apart.

Innectic commented 8 years ago

Fixed the json thing

Innectic commented 8 years ago

Fixed the capital checker

2Cubed commented 8 years ago

@Innectic Hmm, not a bad idea! Maybe something like this?

    [
      {
        "action": "message",
        "data": [{
          "type": "text",
          "text": "Please do not spam emotes.",
          "data": "Please do not spam emotes."
        }]
      },
      {
        "action": "timeout",
        "data": {
          "user": "Potato",
          "time": 60
        }
      }
    ]

It might get a bit bulky at times, but it could be exactly what we need. :smile:
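To keep the bulk manageable, each check could build this structure through one small helper instead of writing the nested dicts by hand. A minimal sketch; build_actions and its 60-second default (taken from the example above) are hypothetical.

```python
def build_actions(user, reason, timeout_seconds=60):
    """Build the list of actions for a message flagged as spam.

    Mirrors the proposed structure: a chat message explaining why,
    followed by a timeout for the offending user.
    """
    return [
        {
            "action": "message",
            "data": [{"type": "text", "text": reason, "data": reason}],
        },
        {
            "action": "timeout",
            "data": {"user": user, "time": timeout_seconds},
        },
    ]
```

Each check then only needs to supply the user and a reason, for example build_actions("Potato", "Please do not spam emotes.").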

2Cubed commented 8 years ago

@RPiAwesomeness Interesting idea. However, I don't think it's really feasible, especially considering the amount of work required just to get it to understand basic concepts.

Innectic commented 8 years ago

@2Cubed I like that. :+1:

kz5ee commented 8 years ago

I would suggest using spaces to check for "words". You can't realistically catch everything; however, the presence of spaces in a string makes it less likely to be SPAM.
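The spaces heuristic can be sketched as a check for long runs of characters with no spaces. It is meant to run on the joined text of "text" chunks only, since links arrive as separate chunks and would otherwise trip it. The 20-character cutoff and the looks_like_mashing name are assumptions.

```python
# Hypothetical threshold: real words, even stretched ones, rarely run this long.
MAX_TOKEN_LENGTH = 20


def looks_like_mashing(text, max_token_length=MAX_TOKEN_LENGTH):
    """Return True if the message contains a suspiciously long run with no spaces."""
    return any(len(token) > max_token_length for token in text.split())
```

As noted, this can't catch everything: short mashed strings pass, and it says nothing about whether the space-separated pieces are real words.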

I would say that whether a phrase such as WOW YOU ARE SUCH A POTATO is SPAM depends heavily on context, so on its own it is not very likely to be SPAM. That said, if someone posts it n+1 times, it is highly likely to be spam: the repeated posts should be removed from chat and the poster timed out or banned.

2Cubed commented 8 years ago

@kz5ee Interesting idea. This would be really hard to implement with our current system, though - we'd have to account for previously-sent messages, and our handler system only (easily) handles one at a time. Definitely something to consider for the future, but currently, we have to focus on getting v0.4 out. :smile:

kz5ee commented 8 years ago

Good thing we have chat moderators, eh? It could be as simple as keeping the last message each user posted; once t seconds have passed, it is no longer considered SPAM.

2Cubed commented 8 years ago

@kz5ee Hehe, yeah. Unfortunately, it wouldn't be as simple as it seems - "letting go" of messages after t seconds would actually be pretty complex.

kz5ee commented 8 years ago

Simple as in mechanics, not necessarily in implementation.

2Cubed commented 8 years ago

@kz5ee Precisely.
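For the mechanics kz5ee describes, independent of how it would fit the one-message-at-a-time handler system, here is a sketch: remember each user's last message, count consecutive repeats, and flag the message once it has been posted more than n times, resetting when a repeat arrives more than t seconds after the previous copy. RepeatTracker, its defaults, and the lowercase/strip normalization are all assumptions; the injectable clock is only there to make it testable.

```python
import time


class RepeatTracker:
    """Track each user's last message and how many times it has been repeated."""

    def __init__(self, max_repeats=3, window=30.0, clock=time.monotonic):
        self.max_repeats = max_repeats  # "n" in the discussion above
        self.window = window            # "t" in the discussion above, in seconds
        self.clock = clock
        self.last = {}  # user -> (normalized message, repeat count, time of last copy)

    def is_spam(self, user, message):
        """Record a message; return True once it has been posted more than n times."""
        now = self.clock()
        text = message.strip().lower()
        previous = self.last.get(user)
        # A repeat only counts if it arrives within `window` seconds of the last copy.
        if previous and previous[0] == text and now - previous[2] <= self.window:
            count = previous[1] + 1
        else:
            count = 1
        self.last[user] = (text, count, now)
        return count > self.max_repeats
```

Note that self.last keeps one entry per user who has chatted; a long-running bot would also need to prune stale entries, which is part of the implementation complexity mentioned above.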

2Cubed commented 8 years ago

Mostly complete, as of a2a3e6f5b43d018b3fac1c93efb681c291aa40b8.

Still need to

2Cubed commented 8 years ago

Almost done!

Need to:

2Cubed commented 8 years ago

Closing this. We have #59 and #54 as separate issues. For all intents and purposes, the logic of the handler is done.