Need a way for SD to increase the rate it reports posts in chat (e.g. multiple posts in one message?)

makyen commented 5 years ago

Earlier today, there was a spam wave on Ask Different that produced a large number of reports (266 spam posts). By the time it was over, the reports being posted by SD into Charcoal HQ were more than 160 reports behind and the reports being posted were more than an hour behind real-time, due to chat rate-limiting.

After it was over, SD rebooted and never reported into CHQ the 160 posts for which there are records in MS.

For situations where the rate of spam posts significantly exceeds the normal chat rate limit, we need some way for SD to be able to catch chat up to real-time, without just dropping a large number of reports. One possible way this might be done is to have SD report more than one post per message. This could be either a couple of very abbreviated reports in one message, or a message containing multiple complete reports that's sent as a large chat message without Markdown formatting.

While multiple reports in a single message would be less convenient to deal with in chat, it would at least get the data there. If there are issues with the final formatting, we could use AIM, FIRE, or another userscript to better format the report once it's been posted (e.g. convert the bare Markdown in a long post to HTML).

makyen commented 5 years ago

Some data (from a conversation starting here):

On chat.SE, SD shows between 2.1k and 2.9k messages/day. On chat.MSE, SD shows 212 messages/day. On chat.SO, SD shows 79 messages/day.

The rate limiting guide indicates that a 20s cool-down is the maximum required in a worst case situation. That means that a max of 4,320 messages can be sent per day. That implies that SD spends a large portion of its time on chat.SE rate-limited, at least a significant portion of the way up the rate-limit curve.
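The ceiling quoted above follows directly from the 20s worst-case cool-down. A quick check of that arithmetic, using only the figures from this comment (the function name is purely illustrative):

```python
SECONDS_PER_DAY = 24 * 60 * 60


def max_messages_per_day(cooldown_seconds: float) -> int:
    """Upper bound on messages/day if every message needs the full cool-down."""
    return int(SECONDS_PER_DAY // cooldown_seconds)


ceiling = max_messages_per_day(20)
print(ceiling)  # 4320

# Share of the worst-case ceiling used by the observed chat.SE volume.
for observed in (2100, 2900):
    print(f"{observed} msgs/day = {observed / ceiling:.0%} of the ceiling")
```

At 2.1k to 2.9k messages/day, SD would be using roughly half to two thirds of the worst-case ceiling, which is consistent with it spending much of its time rate-limited.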

If we can move a good portion of the traffic from chat.SE to chat.SO or chat.MSE, that will be beneficial. However, it won't solve the problem. In that spam wave, the spam was coming in at a rate (6 or 7/minute) that was at least 2x the rate at which SD can sustain posting messages (3/minute), even if it was only posting 1 message per report. Given that SD posts at least 2 messages per report (the report and the first feedback), even if we cut SD down to 1 room, that would not be sufficient.

makyen commented 5 years ago

For the messages which we have SD just relay into chat from MS, we could set up an RSS feed from MS and use one or more Feeds accounts which post into appropriate rooms. This would reduce the number of messages posted by SD by at least one message per report (i.e. it would be 1 report from SD and 1 message about having received feedback from a Feeds account, instead of 2 from SD). Doing this will make feedback a bit more visible (i.e. it will be in a separate monologue from SD's reports), which may be beneficial or detrimental.

ArtOfCode- commented 5 years ago

Fixing socketscience is probably a decent way to knock out a bunch of messages

j-f1 commented 5 years ago

Moving socketscience to Stack Overflow would remove that as an issue.

How about creating “SmokeDetector 2” and “SmokeDetector 3” accounts? This would allow for the running instance to automatically switch accounts when it gets close to being rate limited.
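A minimal sketch of what switching between accounts could look like, assuming each account tracks its own recent sends. `ChatAccount`, `AccountRotator`, and the 60s window are hypothetical names and values for illustration, not anything that exists in SmokeDetector or ChatExchange:

```python
import time
from dataclasses import dataclass, field


@dataclass
class ChatAccount:
    """Hypothetical stand-in for one logged-in chat account."""
    name: str
    sent_at: list = field(default_factory=list)  # timestamps of recent sends


class AccountRotator:
    """Pick the account with the fewest sends in the recent window."""

    def __init__(self, accounts, window_seconds=60.0):
        self.accounts = list(accounts)
        self.window_seconds = window_seconds

    def pick(self, now=None):
        now = time.monotonic() if now is None else now
        cutoff = now - self.window_seconds
        for account in self.accounts:
            # Forget sends that are older than the window.
            account.sent_at = [t for t in account.sent_at if t > cutoff]
        # min() returns the first minimum, so ties go to the earliest account.
        chosen = min(self.accounts, key=lambda a: len(a.sent_at))
        chosen.sent_at.append(now)
        return chosen


rotator = AccountRotator([
    ChatAccount("SmokeDetector"),
    ChatAccount("SmokeDetector 2"),
    ChatAccount("SmokeDetector 3"),
])
names = [rotator.pick(now=float(i)).name for i in range(4)]
print(names)
```

Picking the least-used account spreads load evenly rather than waiting until one account is nearly rate-limited; either policy would fit behind the same `pick()` call.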

tripleee commented 5 years ago

A microarchitecture with multiple chat instances ("Smoke Detectors", the chatbot part) and perhaps multiple detection instances ("Smoke Detectors", the spamhandling.py part) would probably offer some other benefits too. I have been thinking about how to decouple SmokeDetector from chat so that I could run an instance and just ask it over a websocket "hey, is this spam?" and get back a verdict for Halflife.

Another option would be to pack multiple reports into a single chat message. You'd have to shorten the message significantly (drop all the reasons and probably the post title) which obviously has usability drawbacks ... but not getting spam reported in a timely fashion is arguably also a significant usability problem even if we manage to eventually push through all pending messages. Since most of our regulars use FIRE and AIM anyway, perhaps we could rely on those for the presentation.

tripleee commented 5 years ago

Just to spell this out, you are talking about moving the Socket Science chat room to a different chat server (chat.stackoverflow.com) so as to reduce the load that we put on chat.stackexchange.com and free up bandwidth and quota for the main use case, the actual spam reports?

ArtOfCode- commented 5 years ago

> moving the Socket Science chat room to a different chat server

that exactly

ArtOfCode- commented 5 years ago

and done with ae4454f90df4ad493b612bad47601a04d51f9b51

makyen commented 5 years ago

I think we should approach this from multiple directions. In broad strokes, there's at least one full solution, there are some possible workarounds, and there are steps we can take to mitigate the issue.

The following are just what I've seen mentioned here or in chat, or what I've come up with.

Full solution

Approach SE and ask for a much higher rate-limit, or no rate-limit, for messages posted by SD (i.e. get SE to greatly reduce the throttling applied to SD posting messages in chat). There's some precedent for SE permitting SD special access, in that they grant SD a quota of 20k requests/day on the SE API instead of the standard 10k/day.

Workarounds

Have multiple user accounts for SD which are used in a round-robin manner to avoid the rate limit.
I'd be much more comfortable with this if we have approval from SE to do it. Without that approval, it feels like sock-puppeting, in a bad way (i.e. specifically using multiple accounts to get around limits SE has placed on single accounts). But, it is a viable solution, if SE is OK with it (e.g. it might be easier for them to allow this than having a different chat rate-limit for SD).
Have SD put multiple reports in a single message. Use FIRE, AIM, or a new userscript to format these messages such that they are usable by people (preferably working with the current tools). Because there will only be one actual message, this will make it difficult to perform all actions which rely on there being separate messages for each report (e.g. feedback/commands as a reply, adding MS comments using replies, etc.).
Multiple posts in one message can be implemented in two different ways:
- Method A: Reduce the information content of each report so more than one fits in a single standard chat message. It's unclear how much the report data can be reduced and still be useful. How much each report can be reduced will determine how many reports will be able to fit in a single message. A drawback for this is that it requires the userscript to obtain the missing data in some other manner and that data may not be available from either MS or SE.
- Method B: Send multiple full reports, as they currently exist, in a long chat message. This would permit an unlimited number of reports to be sent in a single message. The main disadvantage for this method is that such messages cannot use Markdown formatting. However, if a userscript is processing them, then the userscript can convert the Markdown to HTML and provide a chat page displayed very close to what it would look like with one report per message. A minimal Markdown➞HTML converter already exists in the Chat page, so it's not much work for a userscript to do that conversion.
  Using this method, it would be desirable for the messages to not use protocol-relative URLs, as SE Chat doesn't linkify protocol-relative URLs. If fully qualified URLs are used in the longer message, then it will be possible for users to click on the links in the messages without the need for a userscript to do a Markdown➞HTML conversion. A userscript to do that conversion, and other formatting, would still be something we'd want to implement as a usability benefit, but it wouldn't be necessary for users to be able to use the reports in a manner similar to how the raw SD reports are currently used.
  This method could easily be extended to all messages that SD posts, or at least all SD reports and feedback notifications from MS. Consolidating all messages might get complex, as some are replies to commands, etc. Thus, it may be better to just consolidate reports and perhaps feedback reports from MS. Doing so would look like this.
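As a rough illustration of Method B, the sketch below joins full reports into multi-line messages and rewrites protocol-relative link targets to `https:`. Both function names and the `max_chars` cap are assumptions made for the example; the thread does not state a length limit for long chat messages, so the cap is only there to keep individual messages manageable.

```python
def absolutize(report: str) -> str:
    """Turn protocol-relative Markdown link targets into https URLs.

    Only the "](//" pattern of a Markdown link target is rewritten.
    """
    return report.replace("](//", "](https://")


def pack_reports(reports, max_chars):
    """Greedily join reports with newlines, keeping each message <= max_chars.

    A report longer than max_chars is still sent, alone in its own message.
    """
    messages, current = [], ""
    for report in map(absolutize, reports):
        candidate = report if not current else current + "\n" + report
        if current and len(candidate) > max_chars:
            messages.append(current)
            current = report
        else:
            current = candidate
    if current:
        messages.append(current)
    return messages


sample = ["[a](//x.com/1)", "[b](//x.com/2)", "[c](//x.com/3)"]
for message in pack_reports(sample, max_chars=45):
    print(repr(message))
```

Because each report keeps its own line, a userscript could split the message on newlines and render each line as if it had arrived separately.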

Mitigation

Mitigation ideas have taken three major forms:

Load-balance SD messages between the three chat servers, as the rate-limit is per-server. This has considerable potential benefit, as the numbers indicate there were more than an order of magnitude more SD messages on chat.SE than on either of the other two servers.
Reduce the number of messages posted by SD by off-loading some messages to other accounts, where those accounts indicate the actual source of the messages (i.e. messages which are currently sent by MS for SD to post in chat).
Reduce the number of SD messages by eliminating some current activities.

Specific mitigation suggestions have been:

:heavy_check_mark: Move Socket Science to chat.SO or chat.MSE. (done: see Socket Science on chat.SO)
:heavy_check_mark: Move Charcoal Test to chat.SO or chat.MSE. Charcoal Test is one of three rooms with experimental set, thus getting all reports. For this room, chat.SO is probably preferred, as chat.MSE already has The Fire Department, which is the other room with experimental set.
:heavy_check_mark: Not have SD post feedback received messages forwarded from MS (and/or other messages forwarded from MS)
- :heavy_check_mark: Use a different chat account for messages forwarded from MS (e.g. an MS account, rather than an SD account). This may require a separate bot. [This has been implemented in a "hack" form. It needs to be fully integrated.]
- Use an RSS feed(s) from MS and one or more "Feeds" accounts in appropriate chat rooms to handle messages that are currently relayed from MS through SD into chat (e.g. "tpu- feedback received on [] [MS]", etc.).
- If we don't use an RSS feed as a replacement for the MS➞SD➞Chat path, then during high use times we could consolidate those messages into a single message, either with or without reports, in a similar manner as is suggested in the workarounds for reports.
:heavy_check_mark: Improve ChatExchange: remove unnecessary additional delay between messages. Specifically, ChatExchange was adding a 5 second delay after any successful message and increasing by one second the required delay indicated by SE Chat. This should be an improvement of more than 20% (e.g. with no prior messages, 3 messages will take 2 seconds instead of 11 seconds).

I haven't included any "eliminating some current activities" options in the above list, as I'd prefer not to have to reduce functionality that we actually use.

The workarounds and mitigation steps can degrade the user experience to varying degrees, but in contrast with not being able to keep up with the number of reports, enacting them is usually the better option.

We need either the full solution or one of the workarounds

We need to implement the full solution or at least one of the workarounds, because the mitigation options just aren't sufficient to match the report rate we saw in the most recent spam-wave.

The rate of reports we saw with the most recent spam wave was 287 reports in 1:13:21, which works out to 235 reports/hour (i.e. 287 reports in 1.2225 hours). SD + MS currently require a minimum of 2 chat messages per report (the report and first feedback). Assuming SD was posting reports into only a single room per chat server, that's 470 messages from SD in an hour. Chat rate-limiting makes it possible to post at a maximum sustained rate of 180 messages per hour (1 every 20s).¹ So, by the end of an hour at the above report rate, SD would be 290 messages, or about 1.6 hours behind (assuming someone provided feedback on every post).

If we do offload the MS➞SD➞Chat messages (and again, assuming we cut back to only 1 room per chat server, which isn't a good assumption), then SD would only have needed to post 235 reports/hour in the spam-wave we've seen, but that's higher than the maximum sustained message rate of 180 messages/hour.

So, even if we implement every type of mitigation and reduce functionality to get to the absolute minimum, we won't be able to maintain real-time reporting when we see a similar spam-wave. Thus, we need either the full solution or one of the workarounds.

¹ Starting from having posted no messages, it's a bit better than 180 messages in the first hour, because the required delay between messages increases with the number of messages posted to a maximum of 20s between messages. 180 messages/hour is the sustained rate.
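The backlog estimate in this comment can be reproduced with a few lines of arithmetic. Everything below uses the figures stated in this comment; the variable names are only for illustration:

```python
from datetime import timedelta

reports = 287
wave = timedelta(hours=1, minutes=13, seconds=21)
wave_hours = wave.total_seconds() / 3600          # about 1.2225 hours
reports_per_hour = round(reports / wave_hours)    # 235

messages_per_report = 2                           # report + first feedback
sustained_limit = 3600 // 20                      # 180 messages/hour

demand = reports_per_hour * messages_per_report   # 470 messages/hour
backlog = demand - sustained_limit                # 290 messages after 1 hour
hours_to_clear = backlog / sustained_limit        # about 1.6 hours

print(reports_per_hour, demand, backlog, round(hours_to_clear, 1))
# 235 470 290 1.6
```

Setting `messages_per_report = 1` models offloading the feedback messages: demand drops to 235 messages/hour, which is still above the 180 messages/hour sustained rate.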

tripleee commented 5 years ago

@Makyen Thanks for the thorough analysis (as always!). One thing I didn't see mentioned is that if we combine spam reports, replying to individual reports in chat would have to use a different and probably more cumbersome mechanism (like maybe @sd 123456 tpu- where 123456 is the Metasmoke id of the report).

Sent with GitHawk

makyen commented 5 years ago

@tripleee Yes, if we combine messages there will be difficulties wrt. using actual replies for supplying feedback and adding comments to MS. It would be possible to have the userscript adjust what is sent, in addition to just the reply, to provide some additional semantics that SD could decode to get to the correct report within the chat message being replied to. However, this would need to have additional support written both in the userscript and in SD. I haven't looked at what SD does currently, so I'm not sure how difficult this would be in SD.

Using the sd tp- shortcut commands could function, from the user's perspective, similarly to how they do now, but SD would need to track the separate reports within combined messages and anything else we're combining, rather than just complete chat messages.
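If feedback had to name its target report explicitly, as in the `@sd 123456 tpu-` idea floated earlier in this thread, the parsing side could be fairly small. This is only a sketch of that hypothetical syntax; the regex and function name are assumptions and not SD's actual command handling:

```python
import re

# Hypothetical syntax: "@sd <metasmoke id> <feedback>" or "sd <id> <feedback>".
FEEDBACK_RE = re.compile(
    r"^@?sd\s+(?P<ms_id>\d+)\s+(?P<feedback>\S+)\s*$",
    re.IGNORECASE,
)


def parse_targeted_feedback(text):
    """Return (metasmoke_id, feedback) or None if the text does not match."""
    match = FEEDBACK_RE.match(text.strip())
    if match is None:
        return None
    return int(match.group("ms_id")), match.group("feedback").lower()


print(parse_targeted_feedback("@sd 123456 tpu-"))
print(parse_targeted_feedback("sd tp-"))
```

The second call returns `None` because no report id is given, so a plain `sd tp-` shortcut would still need the per-report tracking described above.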

tripleee commented 5 years ago

A couple of additional ideas ...

One way to split SmokeDetector would be to have two different roles -- say SmokeAlarm and SmokeChat. The Alarm role would use all of its quota for just the reports, and the other communications (replies to bot commands, feedback reports, etc) would be handled by SmokeChat.

(I don't particularly like this idea, but it could easily halve the chat usage of SmokeAlarm compared to the current SmokeDetector in terms of messages transmitted.)

Another mitigation might be an emergency account -- something listening to the RSS feed (or whatever we end up with) could process near-real-time reports in addition to the regular SmokeDetector reports, in which case users with the required tools could continue to work in real time but the old code would still be there as a slow but safe fallback. I guess the RSS reports would be useful in their own right regardless, and would not actually have to end up in chat at all -- a userscript could listen to them directly (though then maybe a websocket would be better than RSS).

angussidney commented 5 years ago

As a side note, it'd be interesting to benchmark how many messages we send to each chat server, and which rooms they're going to. That might make it easier to make decisions further down the line.

If we do consider going down the two-accounts path, I'd like to propose an alternate way that we could split up the messages. Currently, I could envisage us splitting all of our messages into three 'levels' of importance:

Highest priority: stuff that should be posted with no delay. E.g. CHQ/Fire Department reports, messages related to user interaction (e.g. command replies, manual reports etc - all chatrooms), error messages, some messages relayed from MS (incorrect autoflags)
Relatively important: should still be posted where possible (to keep our time-to-deletion down), but should be delayed in favour of new reports to high-priority rooms. E.g. reports to the Tavern, SOCVR, site-specific chatrooms, socket science messages (?)
Not particularly important: stuff that could wait for hours, or never actually be posted, if everything is too busy. E.g. some debug messages (restart messages, API rollover etc), some messages relayed from MS (e.g. 'tpu- feedback received').

Level 1 messages could be submitted through one 'priority' bot, whilst all other messages can be submitted through a second bot. That way, in a worst-case scenario (like we had the other day), the 'priority' bot would have to post reports to CHQ only - reducing message load by a significant amount, eliminating feedback messages (almost 1 for every report), reports to other rooms, debug messages etc. Using the stats that Makyen posted, it's still not going to be quite enough (235 messages posted vs 180/hour limit) - but that will significantly reduce lag in receiving the reports.

If the Level 2 bot gets ratelimited, then that's OK - because priority reports are still being delivered to CHQ, where the people are best equipped to handle them.

By the looks of it, most of the sorting of messages has already been done through our message_types system that we use to determine which reports go to which rooms. We may be able to utilise this existing functionality if we decide to go down this route.
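The three levels map naturally onto a priority queue. The sketch below is one possible way to express them, assuming a single outgoing queue; `Priority`, `OutgoingQueue`, and the room labels are hypothetical and not part of the existing message_types system. Which account actually sends each message is left out of the sketch.

```python
import heapq
import itertools
from enum import IntEnum


class Priority(IntEnum):
    HIGHEST = 1      # CHQ reports, command replies, errors
    IMPORTANT = 2    # reports to other rooms
    LOW = 3          # debug messages, feedback notices


class OutgoingQueue:
    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps FIFO order per level

    def put(self, priority, room, text):
        heapq.heappush(self._heap, (priority, next(self._counter), room, text))

    def get(self):
        """Pop the most urgent message, oldest first within a level."""
        _, _, room, text = heapq.heappop(self._heap)
        return room, text

    def drop_low_priority(self):
        """Discard level-3 messages, e.g. when the backlog is too deep."""
        self._heap = [item for item in self._heap if item[0] != Priority.LOW]
        heapq.heapify(self._heap)

    def __len__(self):
        return len(self._heap)


queue = OutgoingQueue()
queue.put(Priority.LOW, "Charcoal Test", "restart message")
queue.put(Priority.IMPORTANT, "Tavern", "report 1")
queue.put(Priority.HIGHEST, "CHQ", "report 1")
print(queue.get())  # ('CHQ', 'report 1')
```

With two bots, the same ordering could be kept by giving the priority bot only the `HIGHEST` items and routing the rest to the second account.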

ArtOfCode- commented 5 years ago

Assorted thoughts incoming.

Chief among them is this: the AD spam wave we saw a few days ago was highly anomalous. It's the only event of that magnitude we've ever seen. While we do get spam waves relatively frequently, they're almost always much less voluminous than the recent event. Basing numbers on this event is... perhaps useful to get lofty goals for optimization, but making them a target is going to make our lives harder.

On that note... having two Smokey accounts is one of those things that's technically possible, and I don't doubt SE would let us do it, but it's technically tricky to implement - all of Smokey's chat code is built around the assumption that we have one, and throwing another into the mix means a pretty significant refactor. One of those things that could be done if we have someone around with sufficient time and ability to do it, but if not... that's a bit of a dead end.

tripleee commented 5 years ago

I guess the ActionCable websocket where Metasmoke publishes detected spam is already good enough, and if we want an RSS feed of that, it could be easily provided by a separate tool, if we don't want it in Metasmoke itself?

My impression is that a websocket is actually more useful to presumptive userscripts than RSS anyway, is that incorrect?

Sent with GitHawk

quartata commented 5 years ago

I actually don't think the trouble with multiple accounts lies in chatcommunicate -- the work there would not be too tricky. The main problem I see is that it involves managing multiple sets of account credentials, and I don't think environmental variables are a good way to do that. We'd need to bring in something like NG's credential store.

ArtOfCode- commented 5 years ago

@quartata chat creds are in config now

@tripleee Yep, WS is more useful in general to userscripts than RSS. Potentially also more useful to chat, if we want realtime-ish.

quartata commented 5 years ago

@ArtOfCode- Point still stands then, since the INI doesn't really have lists. You could have separate sections for each account I suppose.
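For reference, the "separate sections for each account" approach could be handled with the standard library's `configparser` roughly as follows. The section prefix, key names, and credentials are made up for the example and do not reflect SD's actual config layout:

```python
import configparser

# Hypothetical config layout: one section per chat account.
SAMPLE = """
[chat_account_1]
email = smokey1@example.com
password = hunter2

[chat_account_2]
email = smokey2@example.com
password = hunter3
"""


def load_chat_accounts(ini_text, prefix="chat_account_"):
    """Collect every section whose name starts with prefix, in file order."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return [
        {"email": parser[name]["email"], "password": parser[name]["password"]}
        for name in parser.sections()
        if name.startswith(prefix)
    ]


accounts = load_chat_accounts(SAMPLE)
print([account["email"] for account in accounts])
```

This works without a list type in the INI, at the cost of a naming convention for the sections.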

quartata commented 5 years ago

(Sorry, misclick... doesn't ask for confirmation apparently)

ArtOfCode- commented 5 years ago

@quartata or you just convert config to yaml, which would be far easier and not require us to port in a whole new system

quartata commented 5 years ago

That's pretty much all I was getting at (the encryption part of NG isn't necessary if we don't want it), just that that's more effort than the actual chat code part

honnza commented 5 years ago

One easy change would be to drop report confirmations entirely if Smokey's backlogged in chat. And if folks really want those, then push them after any pending actual reports. For what it's worth I wouldn't mind dropping "[feedback] received on [message]" entirely.

makyen commented 5 years ago

If rate limiting gets bad, one thing that's currently available to do which will help significantly is to stop reporting into Charcoal Test, which gets a copy of every report and is on the chat.SE server. That can be done with a command like !!/block 3600 65945, which will block SD from reporting in that room for an hour. It can be reversed with a !!/unblock 65945 command. Doing this will remove all functionality which we use Charcoal Test for, but will cut the number of messages that SD reports to this server by about 1/3, or so. Long term, we could change to using a Charcoal Test room on either of the other chat servers (IIRC, this is mentioned above).

makyen commented 5 years ago

The reports going to Charcoal Test were removed from rooms.yml. Thus, using !!/block nnnn 65945 is no longer as beneficial, as that room is only getting the debug messages and responses to commands made in the Charcoal Test.

However, the debug messages are actually still substantial, representing about 19% of the messages posted to Charcoal Test in the first 18 hours of 2019-10-03, the day of the most recent spam wave (see next comment for numbers). OTOH, it appeared the number of debug messages wasn't that high during the actual spam wave. There were 15 debug messages during the spam wave. The debug messages are things like SD rebooting, or reloading the blacklists in response to a !!/watch or !!/blacklist-*.

makyen commented 5 years ago

Some information from the most recent spam wave, which was on 2019-10-03:

In about the first 15 hours of that day SD had posted 963 messages on chat.SE, 298 messages on chat.MSE, but only 62 messages on chat.SO.¹
In the first approximately 18 hours of that day, SD had posted 1.2k messages on chat.SE.¹ There were ~705 messages from SD in CHQ and 380 messages from SD in Charcoal Test. That leaves ~120, or so, that were posted to other rooms on chat.SE.
Of the 380 messages in Charcoal Test, about 71 were "debug" messages. However, only 15 of those debug messages were during the actual spam wave.

¹ The numbers given above for how many messages were posted by SD on the chat servers are what chat reported in the profile page for SD on each server at that time. However, these numbers are, at a minimum, cached by the server and are demonstrably imprecise (i.e. they didn't change over a short time based on SD posting more messages). The numbers for each room are based on counting the messages in their transcripts.

makyen commented 5 years ago

With SD reports disabled for Charcoal Test, that should reduce the message volume on chat.SE by about 26%.

Not having SD post the feedback messages from MS should reduce SD's chat message volume on chat.SE by another 26% (closer to 45% of the now current volume). We could do that in any of these ways:

Have SD stop sending feedback messages during high message count times.
- Could have a chat command to disable sending these messages
- Could have SD do this automatically
  For instance, stop sending them when the chat message queue is deeper than Q messages.
Have SD put the feedback messages in a lower priority queue. The lower priority queue then gets sent when no other messages are pending.
When there are >= X feedback messages pending, send them all as a single long message.
If X is 10, it would reduce the volume of these messages by an order of magnitude, which should get about a 30% to 40% drop in message volume during peak times (drop is from the now current volume; i.e. without Charcoal Test).
Use a Feeds account to send these messages
This would require that the messages are sent from MS as an RSS or ATOM feed, which we then have a chat room feed deliver into the room.
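The "send them all as a single long message once X are pending" option could be sketched as below. `FeedbackBatcher` and its methods are hypothetical names for illustration; the default of 10 matches the "If X is 10" example above.

```python
class FeedbackBatcher:
    """Hold feedback notices and release them as one message once X are pending."""

    def __init__(self, batch_size=10):
        self.batch_size = batch_size
        self.pending = []

    def add(self, notice):
        """Queue a notice; return a combined message when the batch is full, else None."""
        self.pending.append(notice)
        if len(self.pending) >= self.batch_size:
            return self.flush()
        return None

    def flush(self):
        """Join everything pending into one multi-line message (None if empty)."""
        if not self.pending:
            return None
        message = "\n".join(self.pending)
        self.pending = []
        return message


batcher = FeedbackBatcher(batch_size=3)
for notice in ("tpu- on report A", "fp- on report B", "tpu- on report C"):
    combined = batcher.add(notice)
print(combined)
```

Calling `flush()` whenever the main report queue is empty would combine this with the lower-priority-queue option, so that notices are not held back during quiet periods.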

stale[bot] commented 4 years ago

This issue has been closed because it has had no recent activity. If this is still important, please add another comment and find someone with write permissions to reopen the issue. Thank you for your contributions.

makyen commented 1 month ago

The rate limiting for chat changed relatively recently, dramatically increasing the rate at which messages can be posted. Assuming that the change is permanent, I'm closing this issue. If the rate limit is returned to what it used to be, then this can be reopened.

Charcoal-SE / SmokeDetector