Closed makyen closed 1 month ago
Some data (from a conversation starting here):
On chat.SE, SD shows between 2.1k and 2.9k messages/day. On chat.MSE, SD shows 212 messages/day. On chat.SO, SD shows 79 messages/day.
The rate limiting guide indicates that a 20s cool-down is the maximum required in a worst case situation. That means that a max of 4,320 messages can be sent per day. That implies that SD spends a large portion of its time on chat.SE rate-limited, at least a significant portion of the way up the rate-limit curve.
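The ceiling above is simple arithmetic. A minimal sketch, assuming the worst-case 20s cool-down applies to every message (the function name is only illustrative and is not part of SD):

```python
SECONDS_PER_DAY = 24 * 60 * 60


def max_messages_per_day(cooldown_seconds: float) -> int:
    """Upper bound on messages/day if every message incurs the given cool-down."""
    return int(SECONDS_PER_DAY // cooldown_seconds)


ceiling = max_messages_per_day(20)

# Daily chat.SE message counts quoted above (2.1k to 2.9k).
for observed in (2100, 2900):
    print(f"{observed} msgs/day is {observed / ceiling:.0%} of the worst-case ceiling")
```

Since the real cool-down grows with posting rate rather than being a flat 20s, this is only a lower bound on what can be sent, not a model of the actual curve.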
If we can move a good portion of the traffic from chat.SE to chat.SO or chat.MSE, that will be beneficial. However, it won't solve the problem. In that spam wave, the spam was coming in at a rate (6 or 7/minute) that was up to more than 2x the rate that SD can sustain posting messages (3/minute), even if it was only posting 1 message per report. Given that SD posts at least 2 messages per report (the report and the first feedback), even if we cut SD down to 1 room, that would not be sufficient.
For the messages which we have SD just relay into chat from MS, we could set up an RSS feed from MS and use one or more Feeds accounts which post into appropriate rooms. This would reduce the number of messages posted by at least one SD message per report (i.e. it would be 1 report from SD and 1 message about having received feedback from a Feeds account, instead of 2 from SD). Doing this will make feedback a bit more visible (i.e. it will be in a separate monologue from SD's reports), which may be beneficial or detrimental.
Fixing socketscience is probably a decent way to knock out a bunch of messages
Moving socketscience to Stack Overflow would remove that as an issue.
How about creating “SmokeDetector 2” and “SmokeDetector 3” accounts? This would allow for the running instance to automatically switch accounts when it gets close to being rate limited.
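A rough sketch of what such switching could look like, assuming a fixed per-account cool-down (the 20s worst case mentioned earlier). `AccountRotator` and its parameters are hypothetical and do not exist in SD; real chat rate limiting is dynamic, so a real implementation would need to react to the server's responses rather than a fixed timer.

```python
import time


class AccountRotator:
    """Hand out whichever account is not currently cooling down (hypothetical helper)."""

    def __init__(self, accounts, min_interval=20.0, clock=time.monotonic):
        self._clock = clock
        self._min_interval = min_interval
        # Time of the last post per account; None means it has not posted yet.
        self._last_post = {name: None for name in accounts}

    def next_account(self):
        """Return an account that may post now, or None if all are cooling down."""
        now = self._clock()
        for name, last in self._last_post.items():
            if last is None or now - last >= self._min_interval:
                self._last_post[name] = now
                return name
        return None


rotator = AccountRotator(["SmokeDetector", "SmokeDetector 2", "SmokeDetector 3"])
print(rotator.next_account())
```

With three accounts and a 20s interval, this would allow roughly one message every 7s in the worst case, at the cost of managing three sets of credentials.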
A microarchitecture with multiple chat instances ("Smoke Detectors", the chatbot part) and perhaps multiple detection instances ("Smoke Detectors", the `spamhandling.py` part) would probably offer some other benefits too. I have been thinking about how to decouple SmokeDetector from chat so that I could run an instance and just ask it over a websocket "hey, is this spam?" and get back a verdict for Halflife.
Another option would be to pack multiple reports into a single chat message. You'd have to shorten the message significantly (drop all the reasons and probably the post title) which obviously has usability drawbacks ... but not getting spam reported in a timely fashion is arguably also a significant usability problem even if we manage to eventually push through all pending messages. Since most of our regulars use FIRE and AIM anyway, perhaps we could rely on those for the presentation.
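To make the packing idea concrete, here is a minimal sketch that greedily combines already-shortened report lines into as few messages as possible. The `pack_reports` name and the 500-character limit are assumptions for illustration; the actual limit for multi-line chat messages would need to be checked.

```python
def pack_reports(reports, max_length=500):
    """Greedily pack short report lines into as few chat messages as possible."""
    messages, current = [], ""
    for report in reports:
        line = report[:max_length]  # a single over-long report is truncated
        candidate = line if not current else current + "\n" + line
        if len(candidate) <= max_length:
            current = candidate
        else:
            messages.append(current)
            current = line
    if current:
        messages.append(current)
    return messages


pending = [f"report {n}: https://example.com/q/{n}" for n in range(1, 6)]
for message in pack_reports(pending, max_length=80):
    print(message, end="\n---\n")
```

Each combined message costs one slot against the rate limit regardless of how many reports it carries, which is the whole benefit; the cost is that a chat reply no longer identifies a single report.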
Just to spell this out, you are talking about moving the Socket Science chat room to a different chat server chat.stackoverflow.com so as to reduce the load that we put on chat.stackexchange.com and free up bandwidth and quota for the main use case, the actual spam reports?
moving the Socket Science chat room to a different chat server
that exactly
and done with ae4454f90df4ad493b612bad47601a04d51f9b51
I think we should approach this from multiple directions. In broad strokes, there's at least one full solution, there are some possible workarounds, and there are steps we can take to mitigate the issue.
The following are just what I've seen mentioned here or in chat, or what I've come up with.
… `experimental` set, thus getting all reports. For this room, chat.SO is probably preferred, as chat.MSE already has The Fire Department, which is the other room with `experimental` set.

I haven't included any "eliminating some current activities" options in the above list, as I'd prefer not to have to reduce functionality that we actually use.
The workarounds and mitigation steps can degrade the user experience to varying degrees, but in contrast with not being able to keep up with the number of reports, enacting them is usually the better option.
We need to implement the full solution or at least one of the workarounds, because the mitigation options just aren't sufficient to match the report rate we saw in the most recent spam-wave.
The rate of reports we saw with the most recent spam wave was 287 reports in 1:13:21, which works out to 235 reports/hour (i.e. 287 reports in 1.2225 hours). SD + MS currently require a minimum of 2 chat messages per report (the report and first feedback). Assuming SD was posting reports into only a single room per chat server, that's 470 messages from SD in an hour. Chat rate-limiting makes it possible to post at a maximum sustained rate of 180 messages per hour (1 every 20s). So, by the end of an hour at the above report rate, SD would be 290 messages, or about 1.6 hours behind (assuming someone provided feedback on every post).
If we do offload the MS➞SD➞Chat messages (and again, assuming we cut back to only 1 room per chat server, which isn't a good assumption), then SD would only have needed to post 235 reports/hour in the spam-wave we've seen, but that's higher than the maximum sustained message rate of 180 messages/hour.
So, even if we implement every type of mitigation and reduce functionality to get to the absolute minimum, we won't be able to maintain real-time reporting when we see a similar spam-wave. Thus, we need either the full solution or one of the workarounds.
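The numbers above can be reproduced with a short calculation. This is only a sketch of the arithmetic: `backlog_after` is an illustrative name, the 180 messages/hour cap is the worst-case sustained rate quoted above, and the drain time assumes no new messages arrive while catching up.

```python
def backlog_after(hours, reports_per_hour, messages_per_report, max_messages_per_hour=180):
    """Return (messages behind, hours needed to drain) after `hours` of sustained load."""
    demand = reports_per_hour * messages_per_report * hours
    capacity = max_messages_per_hour * hours
    behind = max(0, demand - capacity)
    return behind, behind / max_messages_per_hour


# 287 reports in 1:13:21
reports_per_hour = round(287 / (1 + 13 / 60 + 21 / 3600))

print(backlog_after(1, reports_per_hour, 2))  # report + first feedback
print(backlog_after(1, reports_per_hour, 1))  # feedback messages offloaded
```

Even in the second case the backlog stays positive, which is the point made above: mitigation alone does not keep reporting real-time at that report rate.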
@Makyen Thanks for the thorough analysis (as always!). One thing I didn't see mentioned is that if we combine spam reports, replying to individual reports in chat would have to use a different and probably more cumbersome mechanism (like maybe `@sd 123456 tpu-` where 123456 is the Metasmoke id of the report).
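For illustration, parsing a command of that shape could look like the sketch below. The syntax is only the one floated in this comment, not something SD currently accepts, and `parse_feedback` is a hypothetical helper.

```python
import re

# Hypothetical syntax from the comment above: "@sd <metasmoke id> <feedback>"
FEEDBACK_RE = re.compile(r"^@sd\s+(?P<ms_id>\d+)\s+(?P<feedback>\S+)\s*$", re.IGNORECASE)


def parse_feedback(message):
    """Return (metasmoke_id, feedback) or None if the message does not match."""
    match = FEEDBACK_RE.match(message.strip())
    if match is None:
        return None
    return int(match.group("ms_id")), match.group("feedback").lower()


print(parse_feedback("@sd 123456 tpu-"))
```

Validating that the feedback token is one SD actually understands is deliberately left out here.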
Sent with GitHawk
@tripleee Yes, if we combine messages there will be difficulties wrt. using actual replies for supplying feedback and adding comments to MS. It would be possible to have the userscript adjust what is sent, in addition to just the reply, to provide some additional semantics that SD could decode to get to the correct report within the chat message being replied to. However, this would need to have additional support written both in the userscript and in SD. I haven't looked at what SD does currently, so I'm not sure how difficult this would be in SD.
Using the `sd tp-` shortcut commands could function, from the user's perspective, similarly to how they do now, but SD would need to track the separate reports within combined messages and anything else we're combining, rather than just complete chat messages.
A couple of additional ideas ...
One way to split SmokeDetector would be to have two different roles -- say SmokeAlarm and SmokeChat. The Alarm role would use all of its quota for just the reports, and the other communications (replies to bot commands, feedback reports, etc) would be handled by SmokeChat.
(I don't particularly like this idea, but it could easily halve the chat usage of SmokeAlarm compared to the current SmokeDetector in terms of messages transmitted.)
Another mitigation might be an emergency account -- something listening to the RSS feed (or whatever we end up with) could process near-real-time reports in addition to the regular SmokeDetector reports, in which case users with the required tools could continue to work in real time but the old code would still be there as a slow but safe fallback. I guess the RSS reports would be useful in their own right regardless, and would not actually have to end up in chat at all -- a userscript could listen to them directly (though then maybe a websocket would be better than RSS).
As a side note, it'd be interesting to benchmark how many messages we send to each chat server, and which rooms they're going to. That might make it easier to make decisions further down the line.
If we do consider going down the two-accounts path, I'd like to propose an alternate way that we could split up the messages. Currently, I could envisage us splitting all of our messages into three 'levels' of importance:
Level 1 messages could be submitted through one 'priority' bot, whilst all other messages can be submitted through a second bot. That way, in a worst-case scenario (like we had the other day), the 'priority' bot would have to post reports to CHQ only - reducing message load by a significant amount, eliminating feedback messages (almost 1 for every report), reports to other rooms, debug messages etc. Using the stats that Makyen posted, it's still not going to be quite enough (235 messages posted vs 180/hour limit) - but that will significantly reduce lag in receiving the reports.
If the Level 2 bot gets ratelimited, then that's OK - because priority reports are still being delivered to CHQ, where the people are best equipped to handle them.
By the looks of it, most of the sorting of messages has already been done through our `message_types` system that we use to determine which reports go to which rooms. We may be able to utilise this existing functionality if we decide to go down this route.
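As a sketch of the routing decision only: the snippet below sends reports destined for CHQ through the 'priority' bot and everything else through the second bot. The type labels, the room name constant, and `choose_bot` are placeholders; the real `message_types` values and room identifiers may differ.

```python
# Hypothetical labels; the real message_types names may differ.
PRIORITY_TYPES = {"report"}
PRIORITY_ROOM = "CHQ"


def choose_bot(message_type, room):
    """Route CHQ reports to the 'priority' bot and everything else to the second bot."""
    if message_type in PRIORITY_TYPES and room == PRIORITY_ROOM:
        return "priority"
    return "secondary"


for message_type, room in [("report", "CHQ"), ("report", "Charcoal Test"), ("feedback", "CHQ")]:
    print(message_type, room, "->", choose_bot(message_type, room))
```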
Assorted thoughts incoming.
Chief among them is this: the AD spam wave we saw a few days ago was highly anomalous. It's the only event of that magnitude we've ever seen. While we do get spam waves relatively frequently, they're almost always much less voluminous than the recent event. Basing numbers on this event is... perhaps useful to get lofty goals for optimization, but making them a target is going to make our lives harder.
On that note... having two Smokey accounts is one of those things that's technically possible, and I don't doubt SE would let us do it, but it's technically tricky to implement - all of Smokey's chat code is built around the assumption that we have one, and throwing another into the mix means a pretty significant refactor. One of those things that could be done if we have someone around with sufficient time and ability to do it, but if not... that's a bit of a dead end.
I guess the ActionCable websocket where Metasmoke publishes detected spam is already good enough, and if we want an RSS feed of that, it could be easily provided by a separate tool, if we don't want it in Metasmoke itself?
My impression is that a websocket is actually more useful to presumptive userscripts than RSS anyway, is that incorrect?
I actually don't think the trouble with multiple accounts lies in `chatcommunicate` -- the work there would not be too tricky. The main problem I see is that it involves managing multiple sets of account credentials, and I don't think environment variables are a good way to do that. We'd need to bring in something like NG's credential store.
@quartata chat creds are in config now
@tripleee Yep, WS is more useful in general to userscripts than RSS. Potentially also more useful to chat, if we want realtime-ish.
@ArtOfCode- Point still stands then, since the INI doesn't really have lists. You could have separate sections for each account I suppose.
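A minimal sketch of the "separate sections per account" idea using the standard-library `configparser`. The `ChatAccount:` section prefix, the key names, and the sample values are invented for illustration and are not SD's actual config layout.

```python
import configparser

# Invented sample; SD's real config file uses its own section and key names.
SAMPLE = """
[Config]
location = example

[ChatAccount:primary]
email = smokey@example.com
password = hunter2

[ChatAccount:secondary]
email = smokey2@example.com
password = hunter3
"""


def load_chat_accounts(text, prefix="ChatAccount:"):
    """Collect every section whose name starts with `prefix` into a dict."""
    # interpolation=None so that '%' characters in passwords are taken literally
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {
        section[len(prefix):]: dict(parser[section])
        for section in parser.sections()
        if section.startswith(prefix)
    }


accounts = load_chat_accounts(SAMPLE)
print(sorted(accounts))
```

This works around the lack of lists in INI, though it does not address whether credentials should be stored in plain text at all.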
(Sorry, misclick... doesn't ask for confirmation apparently)
@quartata or you just convert config to yaml, which would be far easier and not require us to port in a whole new system
That's pretty much all I was getting at (the encryption part of NG isn't necessary if we don't want it), just that that's more effort than the actual chat code part
One easy change would be to drop report confirmations entirely if Smokey's backlogged in chat. And if folks really want those, then push them after any pending actual reports. For what it's worth I wouldn't mind dropping "[feedback] received on [message]" entirely.
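Both variants (drop confirmations when backlogged, or push them after pending reports) can be sketched with a small priority queue. `OutgoingQueue` and its threshold are hypothetical and not how SD's chat code is structured today.

```python
import heapq
import itertools

REPORT, CONFIRMATION = 0, 1  # lower value is sent first


class OutgoingQueue:
    """Send reports before confirmations; optionally drop confirmations when backlogged."""

    def __init__(self, drop_confirmations_above=None):
        self._heap = []
        self._counter = itertools.count()  # keeps FIFO order within a priority
        self._drop_above = drop_confirmations_above

    def push(self, kind, text):
        """Queue a message; return False if it was dropped because of the backlog."""
        if (kind == CONFIRMATION and self._drop_above is not None
                and len(self._heap) >= self._drop_above):
            return False
        heapq.heappush(self._heap, (kind, next(self._counter), text))
        return True

    def pop(self):
        """Return the next message to send, or None if the queue is empty."""
        return heapq.heappop(self._heap)[2] if self._heap else None


queue = OutgoingQueue(drop_confirmations_above=3)
queue.push(CONFIRMATION, "feedback received on report 1")
queue.push(REPORT, "report 2")
print(queue.pop())
```

Leaving `drop_confirmations_above` as `None` gives the "push them after any pending actual reports" behaviour without ever dropping anything.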
If rate limiting gets bad, one thing that's currently available to do which will help significantly is to stop reporting into Charcoal Test, which gets a copy of every report and is on the chat.SE server. That can be done with a command like `!!/block 3600 65945`, which will block SD from reporting in that room for an hour. It can be reversed with a `!!/unblock 65945` command.
Doing this will remove all functionality which we use Charcoal Test for, but will cut the number of messages that SD reports to this server by about 1/3, or so.
Long term, we could change to using a Charcoal Test room on either of the other chat servers (IIRC, this is mentioned above).
The reports going to Charcoal Test were removed from rooms.yml. Thus, using `!!/block nnnn 65945` is no longer as beneficial, as that room is only getting the debug messages and responses to commands made in the Charcoal Test.
However, the debug messages are actually still substantial, representing about 19% of the messages posted to Charcoal Test in the first 18 hours of 2019-10-03, the day of the most recent spam wave (see next comment for numbers). OTOH, it appeared the number of debug messages wasn't that high during the actual spam wave. There were 15 debug messages during the spam wave. The debug messages are things like SD rebooting, or reloading the blacklists in response to a `!!/watch` or `!!/blacklist-*`.
Some information from the most recent spam wave, which was on 2019-10-03:
With SD reports disabled for Charcoal Test, that should reduce the message volume on chat.SE by about 26%.
Not having SD post the feedback messages from MS should reduce SD's chat message volume on chat.SE by another 26% (closer to 45% of the now current volume). We could do that by:
This issue has been closed because it has had no recent activity. If this is still important, please add another comment and find someone with write permissions to reopen the issue. Thank you for your contributions.
The rate limiting for chat changed relatively recently, dramatically increasing the rate at which messages can be posted. Assuming that the change is permanent, I'm closing this issue. If the rate limit is returned to what it used to be, then this can be reopened.
Earlier today, there was a spam wave on Ask Different that produced a large number of reports (266 spam posts). By the time it was over, the reports being posted by SD into Charcoal HQ were more than 160 reports behind and the reports being posted were more than an hour behind real-time, due to chat rate-limiting.
After it was over, SD rebooted, and 160 posts for which there are records in MS were never reported into CHQ.
For situations where the rate of spam posts significantly exceeds the normal chat rate limit, we need some way for SD to be able to catch chat up to real-time, without just dropping a large number of reports. One possible way this might be done is to have SD report more than one post per message. This could be either a couple of very abbreviated reports in one message, or a message containing multiple complete reports that's sent as a large chat message without Markdown formatting.
While multiple reports in a single message would be less convenient to deal with in chat, it would at least get the data there. If there are issues with the final formatting, we could use AIM, FIRE, or another userscript to better format the report once it's been posted (e.g. convert the bare Markdown in a long post to HTML).