Spam handler - Githubissues

2Cubed commented 8 years ago

Rewrite of v0.3's spam handling, put into a Handler.

[x] URL Detection
[x] Capitalization Detection
[x] Emote Detection
[x] Length Detection

Features:

URL Detection

We can use Beam's url type. It's generated using an advanced NodeJS library, with the sole purpose of recognizing links. Should catch everything.

It might also be cool to add toggling of specific domains - for example, allow all youtube.com links. We could do this fairly easily using Python's urllib parsers.
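A minimal sketch of the domain toggle, assuming each detected link is available as a plain string. `ALLOWED_DOMAINS` and `is_allowed_link` are hypothetical names for illustration, not existing CactusBot code:

```python
from urllib.parse import urlparse

# Hypothetical allowlist; in practice this would come from configuration.
ALLOWED_DOMAINS = {"youtube.com"}


def is_allowed_link(url, allowed=ALLOWED_DOMAINS):
    """Return True if the URL's host is an allowed domain or a subdomain of one."""
    # urlparse only fills in the host when a scheme (or leading //) is present,
    # so prepend // for bare links such as "youtube.com/watch".
    if "://" not in url:
        url = "//" + url
    host = (urlparse(url).hostname or "").lower()
    return any(host == domain or host.endswith("." + domain) for domain in allowed)
```

Matching on the parsed host, rather than searching the whole string, keeps a link like `http://evil.example/youtube.com` from slipping through.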

Capitalization Detection

We should use a better formula than "total caps greater than x" this time. Maybe check if the ratio of capitals to total (or better yet, capitals to lowercase?) is above a specific threshold? We should keep some total length threshold, though, so HI! isn't considered spam ("100% caps").
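A rough sketch of the ratio idea, comparing capitals to all letters. The function name and both default thresholds (`max_ratio`, `min_letters`) are placeholder assumptions, not agreed values:

```python
def exceeds_caps(text, max_ratio=0.7, min_letters=8):
    """Return True if the share of uppercase letters is above max_ratio."""
    letters = [char for char in text if char.isalpha()]
    # Short messages like "HI!" are never flagged, whatever their ratio.
    if len(letters) < min_letters:
        return False
    capitals = sum(1 for char in letters if char.isupper())
    return capitals / len(letters) > max_ratio
```

Counting only letters means punctuation and digits don't dilute the ratio, and the minimum letter count covers the `HI!` case.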

Emote Detection

We should do something similar to the method described in Capitalization Detection, where it compares the emotes to ratios, as well. We should also consider forcing it to be more lenient with sub emotes, so that a raid which uses a bunch of the raider's emotes doesn't get removed. (It's happened.)
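One way this could look, as a sketch only. It assumes a message is a list of chunk dicts, that emote chunks have type `"emoticon"`, and that sub emotes are marked with `"source": "external"`; those field values, the function name, and all thresholds are assumptions to check against Beam's actual packets:

```python
def exceeds_emotes(chunks, max_ratio=0.5, min_emotes=4, sub_weight=0.5):
    """Return True if weighted emotes make up too large a share of the message."""
    emote_score = 0.0
    emote_count = 0
    word_count = 0
    for chunk in chunks:
        if chunk["type"] == "emoticon":
            emote_count += 1
            # Assumption: subscriber emotes are marked with source "external".
            # They count for less, so raid emotes are treated more leniently.
            emote_score += sub_weight if chunk.get("source") == "external" else 1.0
        elif chunk["type"] == "text":
            word_count += len(chunk["text"].split())
    # A few emotes are always fine, whatever the ratio.
    if emote_count < min_emotes:
        return False
    total = emote_score + word_count
    if total == 0:
        return False
    return emote_score / total > max_ratio
```

Weighting sub emotes below 1.0 is just one option for the leniency; a separate, higher threshold for them would work too.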

Length Detection

We should remove this. It's confusing to have a different length limit than Beam's 360 character one.

Innectic commented 8 years ago

(Except I'll make it not crap this time)

2Cubed commented 8 years ago

@Innectic Sounds great! Just edited the original message, including much more detail. There's a checklist at the top - check things off as they're finished? :smile:

Innectic commented 8 years ago

This was finished on the feature/handler branch.

2Cubed commented 8 years ago

Reopening, until everything is merged. (We should pull request the change into develop and merge, referencing the pullreq in the close message here afterwards.)

2Cubed commented 8 years ago

Current state of packet parsing:

    def on_message(self, packet):
        """Handle message events."""
        packet = json.loads(packet)
        # exceeds_caps = self.check_caps(''.join(chunk for chunk in packet if chunk["type"] == "text"))
        contains_emotes = self.check_emotes(packet)
        has_links = self.check_links(packet)

        if contains_emotes or has_links:
            return True
        else:
            return False

Still need to:

[x] Remove json.loads (the packet isn't a JSON string)
[x] Fix exceeds_caps
[x] Figure out return values (True and False aren't descriptive of what needs to happen)
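On the `exceeds_caps` item: if the packet is a list of chunk dicts, the commented-out line would fail, because `''.join` is handed the chunks themselves rather than their text. A small sketch of the extraction, assuming each text chunk carries a `"text"` key (`message_text` is a hypothetical helper name):

```python
def message_text(packet):
    """Join the text of every "text" chunk in a parsed message packet."""
    # Join chunk["text"], not the chunk dicts; str.join raises TypeError on dicts.
    return ''.join(chunk["text"] for chunk in packet if chunk["type"] == "text")
```

The result can then be passed to whatever caps check we settle on.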

Innectic commented 8 years ago

@2Cubed Maybe we could return an object with the action attribute and check what it is?

IE:

packet = {
    "action": "timeout"
}

action could also be ban, purge, or anything like that

RPiAwesomeness commented 8 years ago

Maybe make it return a function or coroutine? My other thought was maybe we should have the spam system check if the words are actual words vs random letters.

"ASDADASFDSLhjldfsLKSDOFIJ!O!!" - obviously spam, no actual words

"WOW! SUB HYPE! Super excited for the rest of this STREEEEEAAAAMM!" - less likely to be spam

Innectic commented 8 years ago

@RPiAwesomeness Hmm, not sure if we can really check between those two

Innectic commented 8 years ago

Fixed the json thing

Innectic commented 8 years ago

Fixed the capital checker

2Cubed commented 8 years ago

@Innectic Hmm, not a bad idea! Maybe something like this?

[
  {
    "action": "message",
    "data": [{
      "type": "text",
      "text": "Please do not spam emotes.",
      "data": "Please do not spam emotes."
    }]
  },
  {
    "action": "timeout",
    "data": {
      "user": "Potato",
      "time": 60
    }
  }
]

It might get a bit bulky at times, but it could be perfect for what we need. :smile:
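A sketch of how a handler might build that list, and how the caller might act on it. `spam_response`, `run_actions`, and the `handlers` mapping are hypothetical names, not existing code:

```python
def spam_response(user, reason, timeout_seconds=60):
    """Build a warning message action followed by a timeout action."""
    warning = "Please do not spam {}.".format(reason)
    return [
        {"action": "message",
         "data": [{"type": "text", "text": warning, "data": warning}]},
        {"action": "timeout",
         "data": {"user": user, "time": timeout_seconds}},
    ]


def run_actions(actions, handlers):
    """Call the handler registered for each action, in order."""
    results = []
    for item in actions:
        handler = handlers.get(item["action"])
        if handler is None:
            raise ValueError("unknown action: {}".format(item["action"]))
        results.append(handler(item["data"]))
    return results
```

Looking handlers up by the `action` string means adding `ban` or `purge` later is one more entry in the mapping.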

2Cubed commented 8 years ago

@RPiAwesomeness Interesting idea. However, I don't think it's really feasible, especially considering the amount of work required just to get it to understand basic concepts.

SUB HYPE! WOOT! contains no "real words" (except for SUB, arguably). It would be really hard to catch all of these types of phrases/expressions.
We have to consider things like WOW YOU ARE SUCH A POTATO, too. While all of those are real words, they're most likely spam.

Innectic commented 8 years ago

@2Cubed I like that. :+1:

kz5ee commented 8 years ago

I would suggest using spaces to check for "words". You can't realistically catch everything; however, finding spaces in a string reduces the likelihood of it being SPAM.

I would say that whether a phrase such as WOW YOU ARE SUCH A POTATO is SPAM is highly dependent on the context, and on its own it is not that likely to be SPAM. That said, if someone posts it 'n+1' times, then it is highly likely to be spam, and the posts up to 'n' should be removed from chat and the poster timed out or banned.

2Cubed commented 8 years ago

@kz5ee Interesting idea. This would be really hard to implement with our current system, though - we'd have to account for previously-sent messages, and our handler system only (easily) handles one at a time. Definitely something to consider for the future, but currently, we have to focus on getting v0.4 out. :smile:

kz5ee commented 8 years ago

Good thing we have chat moderators, eh? It could be as simple as keeping the last message a user posted and if t seconds have passed it isn't considered SPAM anymore.
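To illustrate the mechanics only: a sketch that keeps each user's last message and checks the age at comparison time, so nothing has to be actively expired. `RepeatTracker`, the `window` default, and the injected `clock` are all assumptions for illustration, and this says nothing about how it would fit the handler system:

```python
import time


class RepeatTracker:
    """Remember each user's last message and flag quick repeats."""

    def __init__(self, window=30.0, clock=time.monotonic):
        self.window = window  # seconds during which a repeat counts as spam
        self.clock = clock    # injectable so the timing can be tested
        self.last = {}        # user -> (message, timestamp)

    def is_repeat(self, user, message):
        """Return True if the user sent the same message within the window."""
        now = self.clock()
        previous = self.last.get(user)
        self.last[user] = (message, now)
        if previous is None:
            return False
        previous_message, sent_at = previous
        return previous_message == message and now - sent_at <= self.window
```

Note that the dict grows with every user seen, so a long-running bot would still need some pruning.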

2Cubed commented 8 years ago

@kz5ee Hehe, yeah. Unfortunately, it wouldn't be as simple as it seems - "letting go" of messages after t seconds would actually be pretty complex.

kz5ee commented 8 years ago

Simple as in mechanics, not necessarily in implementation.

2Cubed commented 8 years ago

@kz5ee Precisely.

2Cubed commented 8 years ago

Mostly complete, as of a2a3e6f5b43d018b3fac1c93efb681c291aa40b8.

Still need to:

decide on a standard role level map
implement TimeoutPacket
add whisper/DM attribute (in **meta?) to MessagePacket
add support for multiple returned packets (#43)

2Cubed commented 8 years ago

Almost done!

BanPacket implemented in a28ab0e43b2597a0ce3920d46d5331d6b8c03cec
Targeted MessagePackets ("whispers") implemented in 8a0a25372ab465b5f026b0b34d933d4d9fe1f20b
Multiple return Packet support implemented in 8ef5b15c62960fb9abc43c9b30550faa0c0d7227

Need to:

[ ] Make maximums configurable, using the API and Sepal (cc: @Innectic, @RPiAwesomeness)
[ ] Decide on a standard role level map ( #54 )

2Cubed commented 8 years ago

Closing this. We have #59 and #54 as separate issues. For all intents and purposes, the logic of the handler is done.

CactusDev / CactusBot

Spam handler #47
