georgevanburgh / twitch-chat-scraper

A Twitch chat scraper, written in GoLang.
MIT License

Channel blacklist #2

Open georgevanburgh opened 8 years ago

georgevanburgh commented 8 years ago

Twitch streamers should be able to dismiss the scraper from their channels.

BillieJackFu commented 8 years ago

I understand programming; shouldn't you be able to add an if/then so that, on a ban, the scraper parts the channel?

georgevanburgh commented 8 years ago

In theory, yes - the scraper can listen for a specific command and add the channel to a blacklist (probably a file so it persists between restarts). Ideally, though, I only want channel mods/owners to be able to dismiss the scraper, so might be a little more involved. I'll look into it as soon as I get a chance :)

BillieJackFu commented 8 years ago

Only broadcasters and mods can ban someone. However, you won't know you're banned unless you chat. So maybe have your scraper send "Hello"; then you will receive a message "You are permanently banned from talking in ".

georgevanburgh commented 8 years ago

@BillieJackFu Good point about only mods/owners having permission to ban. I believe Twitch IRC sends a NOTICE message when you're banned from a channel - I should be able to simply intercept that.

BillieJackFu commented 8 years ago

I'm a graphic designer, but my grad project for IT was programming; I understand it, but haven't touched any code since 2007. I leave that to the true geeks.

georgevanburgh commented 8 years ago

@BillieJackFu Well, I can't draw to save my life - so it goes both ways! Appreciate the contribution regardless 😃

theinfinitelurker commented 8 years ago

Probably not the appropriate place to ask this, but, what do you run something like this on? How many channels does it connect to at once?

georgevanburgh commented 8 years ago

@theinfinitelurker I'm guessing you mean with regards to hardware? I'll write up a post on this at some point, but at the moment (anecdotally) I'm running an instance of the scraper connected to some 20,000 Twitch channels, dumping chat messages into a single ElasticSearch instance on the same machine (an HP Microserver G7), with the following specs:

This box processes around 15 million chat messages/day (which works out to ~2.5GB on disk) - and the scraper consumes between 5% and 10% of available CPU resources. The box is a little underpowered (especially in the RAM department) for some of the larger ElasticSearch queries - but seems to handle the ingesting of data without too much difficulty.

Hopefully that helps. If you have any further questions, feel free to reach out to me on Twitter 😄

georgevanburgh commented 8 years ago

As per the Twitch IRC documentation, we only receive NOTICE messages if we request the 'commands' capability, as follows:

< CAP REQ :twitch.tv/commands
> :tmi.twitch.tv CAP * ACK :twitch.tv/commands

We then receive notification of a ban through the 'CLEARCHAT' command:

05-01-2016 23:15:07.764+0000 [DEBUG] Received: :tmi.twitch.tv CLEARCHAT #fire_eater64 :throwaway4132

In theory, we can check the target of the CLEARCHAT command, and remove ourselves from the chat if we've been banned. We'll never know if we're unbanned, however.
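A rough sketch of that check, assuming the raw line has exactly the shape shown in the log above (prefix, `CLEARCHAT`, channel, then `:user`). The function names and the nick are illustrative only. As far as I know, CLEARCHAT is also sent for timeouts, so this sketch treats a timeout the same as a ban:

```go
package main

import (
	"fmt"
	"strings"
)

// parseClearChat extracts the channel and target user from a raw line such as
// ":tmi.twitch.tv CLEARCHAT #channel :user". ok is false for any other line,
// including a CLEARCHAT that carries no target user.
func parseClearChat(line string) (channel, target string, ok bool) {
	fields := strings.Fields(line)
	if len(fields) != 4 || fields[1] != "CLEARCHAT" {
		return "", "", false
	}
	if !strings.HasPrefix(fields[2], "#") || !strings.HasPrefix(fields[3], ":") {
		return "", "", false
	}
	return fields[2], strings.TrimPrefix(fields[3], ":"), true
}

// bannedFrom returns the channel to leave when the line is a CLEARCHAT aimed
// at our own nick. Nicks are compared case-insensitively.
func bannedFrom(line, ownNick string) (string, bool) {
	channel, target, ok := parseClearChat(line)
	if !ok || !strings.EqualFold(target, ownNick) {
		return "", false
	}
	return channel, true
}

func main() {
	line := ":tmi.twitch.tv CLEARCHAT #fire_eater64 :throwaway4132"
	if channel, banned := bannedFrom(line, "throwaway4132"); banned {
		// A real client would write this to the IRC connection and also
		// record the channel so that it is never rejoined.
		fmt.Printf("PART %s\r\n", channel)
	}
}
```

Comparing against our own nick matters because CLEARCHAT is delivered for every user removed from the channel, not only for the scraper.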

jaybyrrd commented 8 years ago

FireEater64, it is easy to verify who is a mod or not. Simply hit the endpoint: http://tmi.twitch.tv/group/user/CHANNELNAME/chatters

And listen for a specific command. It is NOT HARD TO DO and should take all of 5-10 minutes to figure out. I think it is a safe bet that if we send a command to blacklist our own channel that we don't want your bot back.

georgevanburgh commented 8 years ago

@jaybyrrd Thanks for the suggestions.

> Simply hit the endpoint: http://tmi.twitch.tv/group/user/CHANNELNAME/chatters

This is certainly possible; however, the Twitch API only returns a list of currently online moderators, which prevents us from caching them (otherwise a moderator who has just logged in would not be able to request removal successfully). If we hit the Twitch API every time we receive the '!removechatscraper' command, then we open ourselves up to abuse (chat spamming '!removechatscraper'). I'm not trying to suggest that this isn't possible, simply that it's more complicated than you make it out to be.

> And listen for a specific command.

Again, entirely possible (with the caveat around identifying moderators given above). However, my problem with this (which is similar to the solution UniqBot uses) is that it requires channel operators to know a specific command (something like '!removechatscraper'). The nice part about listening for 'BAN' messages - is that everyone already knows how to ban, and with a few changes the scraper could be made to behave as expected when banned (leave the channel, and never come back). In my mind, introducing a custom keyword has no advantages over the current system (channel owners contacting me manually) - channel owners would still have to find my Twitter/Blog. I'm more than happy to consider pull requests that add that functionality - I just don't believe it's the right solution.