ctrlcctrlv / infinity

A vichan fork permitting users to create their own boards

[discussion] ideas to stop spam #150

Open bui opened 10 years ago

bui commented 10 years ago

so with some people already going all autism and writing spambots for 8chan, I suggest we use this issue to discuss possible measures to prevent it

feel free to post all your ideas, I will try to respond to each and say how effective it could be, if at all. others are free to do the same

for starters:

bui commented 10 years ago

also I believe @ctrlcctrlv mentioned custom post/thread limits. this is good, and possibly add a "spam protection" option for mods that, if enabled, uses an algorithm to lower the amount of threads allowed to be created the more often they are posted. something like that.

we can record the 'average number of threads created per day' stat and use it in said algorithm
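A minimal sketch of what such an algorithm could look like. Everything here is an assumption for illustration: the function name `allowed_threads_per_hour`, the `burst_factor` headroom multiplier, and the floor of 1 are hypothetical and not part of vichan or infinity.

```python
def allowed_threads_per_hour(avg_threads_per_day: float,
                             threads_last_hour: int,
                             burst_factor: float = 3.0,
                             floor: int = 1) -> int:
    """Return how many more threads may be created in the current hour.

    The baseline is the board's normal hourly rate (daily average / 24)
    times burst_factor, so busy periods still have headroom. Each thread
    already created in the last hour uses up part of that budget, so the
    allowance shrinks the more often threads are posted.
    """
    baseline = max(floor, round(avg_threads_per_day / 24 * burst_factor))
    return max(0, baseline - threads_last_hour)


# A board averaging 48 threads/day gets a budget of 6 threads per hour.
print(allowed_threads_per_hour(48, threads_last_hour=0))
print(allowed_threads_per_hour(48, threads_last_hour=4))
```

A very quiet board still gets at least `floor` threads per hour, so a low daily average never blocks thread creation outright.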

ctrlcctrlv commented 10 years ago

ReCAPTCHA's TTL seems to be pretty high, I think the custom CAPTCHA should have a TTL of 2 minutes max and the countdown should start (and image should display) when the user clicks the input box.

I think a custom optional CAPTCHA along with custom spam options (how many threads to allow per hour) is how we're going to tackle this.
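A rough sketch of the TTL part, assuming an in-memory store and a hypothetical `CaptchaStore` class (neither exists in the repo). The challenge is issued only when the user focuses the input box, and each token is single-use so a solved CAPTCHA cannot be cached and replayed.

```python
import secrets
import time

CAPTCHA_TTL_SECONDS = 120  # "2 minutes max", as suggested above


class CaptchaStore:
    """Hypothetical in-memory store for issued CAPTCHA challenges."""

    def __init__(self, ttl=CAPTCHA_TTL_SECONDS):
        self.ttl = ttl
        self._issued = {}  # token -> (expected answer, time issued)

    def issue(self, answer, now=None):
        """Call when the user clicks the input box; the countdown starts here."""
        now = time.time() if now is None else now
        token = secrets.token_hex(16)
        self._issued[token] = (answer, now)
        return token

    def verify(self, token, answer, now=None):
        """Check an answer. The token is discarded on every attempt."""
        now = time.time() if now is None else now
        entry = self._issued.pop(token, None)
        if entry is None:
            return False  # unknown or already used token
        expected, issued_at = entry
        if now - issued_at > self.ttl:
            return False  # expired
        return secrets.compare_digest(answer.lower().encode(),
                                      expected.lower().encode())
```

A real deployment would keep this state in the database or a cache shared between workers rather than in process memory.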

bui commented 10 years ago

from what I've tested, 8chan's reCAPTCHA TTL is 25-30 minutes. I think 4chan's is 5 minutes. still, paying for solving or having multiple workers could be used to deal with it

could you go on about how a custom CAPTCHA will help? if it's just something simple, text-based like reCAPTCHA, I don't see the difference. or are you thinking more of an interactive CAPTCHA, a la AYAH?

ctrlcctrlv commented 10 years ago

The custom CAPTCHA isn't really about helping, it's about not allowing Google to track users across pages. That's why I haven't made CAPTCHA a board option already - user privacy issues.

(Side note: I wonder how 4chan changed their recaptcha TTL, I don't see that option anywhere.)

Also, just by not using ReCAPTCHA you've already set back most flooders who are using custom scripts.

bui commented 10 years ago

ah, yeah, I understand the privacy concerns. However a custom CAPTCHA unfortunately won't stop dedicated spammers, even with a low TTL. It may have some effect but is not 100% foolproof

> (Side note: I wonder how 4chan changed their recaptcha TTL, I don't see that option anywhere.)

I'm not sure either after looking through the docs. But their TTL is definitely shorter than normal.

AmberMutt commented 10 years ago

So /furry/ has a few ideas already that seem like they could even be implemented globally. Let's see what y'all think:

(1) Have mod-approved thread creation. Obviously this would require more than one mod per board, and mods watching pretty much 24/7. Might be good for really big boards with lots of mods.

(2) "Basically auto-prune any thread with no replies when it hits page 6. Threads with replies only autoprune when they reach page 10. The result is that the spam is limited to the first 5 pages, so at most it can only wipe half the board. To get around this limit they could add replies to the spam so that it goes all the way to page 10, but that doubles the amount of work they have to do. The limit could also be increased to require more replies, so for example small threads with fewer than 5 replies get pruned instead of only ones with no replies."

"What if you made it so that it'll auto-prune threads with less than or equal to 5 replies. That way the attacker has to work 5 times as hard. Downside to this is that there might be legitimate threads getting pruned but I feel like threads below double digit replies aren't too big of a deal to get pruned."

"Ideally the number would be board-owner configurable, all the way from -1 (disabled) to 500. Anti-spam functions would only need <10, but extremely high post counts could effectively archive major threads."

I feel this is one of the best ideas I've seen so far.
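A sketch of idea (2), under some assumptions: 15 threads per page, 10 pages, and a hypothetical `prune` function over a bump-ordered list. Page numbers are computed after earlier prunes have let later threads move up, which is one possible reading of the rule. Setting `min_replies` to 0 or -1 disables early pruning.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    id: int
    replies: int


def prune(threads, min_replies=1, early_page=6, max_pages=10, per_page=15):
    """Return the threads that survive, given a list ordered newest bump first.

    A thread that would sit on early_page or later is dropped if it has
    fewer than min_replies replies. Anything past max_pages is dropped
    regardless of reply count.
    """
    survivors = []
    for thread in threads:
        page = len(survivors) // per_page + 1  # page this thread would land on
        if page > max_pages:
            break
        if page >= early_page and thread.replies < min_replies:
            continue
        survivors.append(thread)
    return survivors


# 150 reply-less spam threads on top of 100 real threads with 3 replies each:
board = [Thread(i, 0) for i in range(150)] + [Thread(1000 + i, 3) for i in range(100)]
kept = prune(board)
print(len(kept), sum(1 for t in kept if t.replies == 0))
```

In this example the spam only holds the first 5 pages; pages 6 to 10 are still filled with the threads that have replies.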

(3) "Quick solution until we can get something better is just to make captchas expire after a minute."

Might be something that could be turned on/off as necessary, or if possible something that turns itself on automatically if suddenly multiple threads are being created in a short time span. Maybe even forced captcha expiring as soon as a flood of threads are created to try and prevent other spam threads from immediately getting through a cached captcha.

(4) "Another temp solution idea: would it be possible to have a one-time passcode sorta thing per IP? Like you have to do a small series of captchas or "what is 2 plus 3" tests just once before you can post, then once you do them the IP you're using is registered as good? Not sure how good of an idea it is, but it seems like something that would tremendously slow down someone spamming from multiple IPs."

To be honest, I don't know enough about IPs to know if this is even possible, but considering the threads were all created from unique IPs, I can only imagine that it would tremendously slow down spam threads.
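For illustration, a per-IP allowlist along these lines could be as small as the sketch below. The class name `VerifiedIPs` and the default of three challenges are assumptions, and a wrong answer resetting the streak is a design choice, not something from the original suggestion.

```python
class VerifiedIPs:
    """Hypothetical registry of IPs that have passed a one-time challenge series."""

    def __init__(self, required_challenges=3):
        self.required = required_challenges
        self._streak = {}       # ip -> consecutive challenges solved so far
        self._verified = set()  # IPs registered as good

    def record_answer(self, ip, correct):
        """Record one challenge result; a wrong answer resets the streak."""
        if ip in self._verified:
            return
        if not correct:
            self._streak[ip] = 0
            return
        self._streak[ip] = self._streak.get(ip, 0) + 1
        if self._streak[ip] >= self.required:
            self._verified.add(ip)
            del self._streak[ip]

    def may_post(self, ip):
        return ip in self._verified
```

Each new IP in a multi-IP attack would have to pass the full series again before its first post.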

(5) Thread creation time limits. Basically a timer resets every time a thread is created, and no other threads can be created until the time is up.

"Nobody wants to wait 5 minutes just because some faggot beat them to the punch with a piss thread. 5 per 10 minutes will give plenty of flexibility, and will probably give mods time to prevent disaster still."

Seems like something that could be permanently enabled for the slower boards, with the ability for mods to change the numbers/times around according to board growth.
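The "5 per 10 minutes" variant can be sketched as a sliding-window limiter. The class name `ThreadRateLimiter` and its parameters are hypothetical; the timestamps would come from the board's own post records in practice.

```python
import time
from collections import deque


class ThreadRateLimiter:
    """Allow at most max_threads new threads per window_seconds on one board."""

    def __init__(self, max_threads=5, window_seconds=600):
        self.max_threads = max_threads
        self.window = window_seconds
        self._created = deque()  # creation times of recent threads, oldest first

    def try_create(self, now=None):
        """Return True and record the thread if the board is under its limit."""
        now = time.time() if now is None else now
        # Forget threads that have aged out of the window.
        while self._created and now - self._created[0] >= self.window:
            self._created.popleft()
        if len(self._created) >= self.max_threads:
            return False
        self._created.append(now)
        return True
```

Mods adjusting the limit for board growth would only need to change `max_threads` and `window_seconds`.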


I'll post more ideas if they come in, but for now I think this would be a good list of ideas to start with.

ctrlcctrlv commented 10 years ago

Some of these ideas are really clever, I especially like (2) which is something I had never even thought of before. Thanks for contributing, this is all I really wanted.

(3)...we're not sure how 4chan is doing that. We'll be making a custom CAPTCHA with a low TTL however, so that was planned.

(5) is in the works already.

(4) is already implemented on certain ranges. If you're in a range with a lot of attackers, you get shown this page: https://8chan.co/dnsbls_bypass.php (not in repo yet)

The only one I don't really like is (1), seems like it could be a way for unscrupulous volunteers to silence their users without anyone ever seeing it except the volunteers themselves.

AmberMutt commented 10 years ago

We have some new angles for idea (5) and a 6th idea that might could help with spam as well as in general.

(5) "Instead of limiting the amount of new topics that can be made in a specific period of time, we can lock new thread creation if a certain thread creation threshold speed is passed. This threshold can be adjusted based on the activity for a board. So for instance if an attacker starts creating threads at a fast pace, the board can lock new thread posting if the rate of new threads is say X new threads/min in the board. This would probably be really helpful for smaller boards since their thread creation rates are low and the threshold can be low as a result, halting the spam."

Perhaps the threshold could trigger the cooldown timer for thread creation, something the board mod would be able to override manually if it was accidentally set.

"Or, another idea going off that, would be to lock thread creation if the new threads/new posts ratio becomes high enough. So, it could look back at the previous 10 minutes of activity, and if it was 30 posts, 3 new threads that'd be fine, but 30 posts and 10 new threads could be indicative of spam, preventing thread creation."

Another thoughtful way (5) could be implemented. Perhaps this ratio could also work in favor of a specific thread being spammed, with the threshold triggering a thread specific cooldown.
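The ratio check from the quote can be sketched as follows. The 0.2 threshold and the `min_threads` guard (so a quiet board with one thread and one post is not locked) are assumptions; the quote only gives the two example data points.

```python
def should_lock_thread_creation(new_threads, new_posts,
                                max_ratio=0.2, min_threads=3):
    """Decide whether to lock thread creation based on recent activity.

    new_threads and new_posts are counts over the look-back window
    (10 minutes in the quote above).
    """
    if new_threads < min_threads:
        return False  # too little activity to judge
    if new_posts == 0:
        return True   # only threads and no replies at all
    return new_threads / new_posts > max_ratio


print(should_lock_thread_creation(3, 30))   # the "fine" case from the quote
print(should_lock_thread_creation(10, 30))  # the "indicative of spam" case
```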

(6) "What if we made it so threads bumped off the catalog weren't destroyed until 10 minutes had passed? This way, even if the entire catalog were displaced, a single post would bring them back into relevance, and spam would essentially be unable to destroy topics people were still looking at."

"What if threads bumped off the catalog have a lifespan proportional to the number of replies they have, so spam threads have a very small lifespan compared to legit threads which will have very long lifespans."

Two different ideas built off this one in regards to mod controlled vs user controlled thread revival. I'll just post the convo in the order it happened so you can read both sides of the idea.

A) "I like this too, though I'd build off it with allowing the board mod(s) alone to be the one(s) who decides if the thread gets brought back, as I see some minor potential for someone to literally keep reviving dead threads just to be an ass."

B) "I don't think it'd be that much of an issue. They can bump threads that should probably die while they're on the catalog as well, and when they're off, they only have a 10 minute window to do so. I mean, a thread that gets zero responses will probably last a few hours on the catalog still. So this change would only increase their ability to bump it from 180 minutes to 190 minutes. But at the same time, it limits the ability of spam to destroy threads."

A) "Actually, an issue I see with users bumping a dead thread is if the entire catalog is filled with threads, then that would mean if a thread gets bumped off page 10 to die, there would literally be no more room for it to be revived to. You'd have to swap that dead thread with another thread somewhere on the catalog because there's no more space. I think it would be better used specifically as an anti-spam tool, for mods to use only when spam has destroyed the catalog, otherwise somebody has to decide which live thread to swap the dead one with."

B) "I assumed it would just bump off the thread at the bottom of the catalog. Consider a catalog that holds 100 threads with this new system in place. Suppose a new thread is created, and the 100th thread is then removed from board. If, in the next ten minutes, that thread is bumped, it's moved back to the top of the catalog, and what was originally the 99th thread will be bumped off instead. While this may be said to "swap a live thread with a dead one," I'd say that the thread bumped at the end of its (arbitrarily determined by the size of the catalog) lifespan is more deserving of remaining in the catalog than one that didn't."
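B's version of idea (6) can be modelled roughly like this. The `Catalog` class, its method names, and the in-memory lists are all hypothetical; the sketch only shows the grace-period behaviour, where a thread pushed off the bottom can still be bumped back for 10 minutes and the current bottom thread takes its place in limbo.

```python
class Catalog:
    """Toy catalog where displaced threads linger for a grace period."""

    def __init__(self, capacity=100, grace_seconds=600):
        self.capacity = capacity
        self.grace = grace_seconds
        self.live = []   # thread ids, most recently bumped first
        self.limbo = {}  # thread id -> time it fell off the catalog

    def _overflow(self, now):
        # Move whatever no longer fits from the bottom into limbo.
        while len(self.live) > self.capacity:
            self.limbo[self.live.pop()] = now

    def purge(self, now):
        """Permanently delete threads whose grace period has run out."""
        expired = [t for t, fell in self.limbo.items() if now - fell >= self.grace]
        for t in expired:
            del self.limbo[t]
        return expired

    def create(self, thread_id, now):
        self.purge(now)
        self.live.insert(0, thread_id)
        self._overflow(now)

    def bump(self, thread_id, now):
        """Bump a live or limbo thread to the top; False if it is already gone."""
        self.purge(now)
        if thread_id in self.live:
            self.live.remove(thread_id)
        elif thread_id in self.limbo:
            del self.limbo[thread_id]
        else:
            return False
        self.live.insert(0, thread_id)
        self._overflow(now)
        return True
```

A's mod-only variant would keep the same structure and simply restrict who is allowed to call `bump` on a thread in limbo.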

Lemme know what you think of these when you get around to them. Lemme know if you want me to keep posting our brainstorm ideas as well, in case we get too off topic.

bui commented 10 years ago

I also think (2) is smart, but it drastically cuts down the number of threads allowed. I'd make it 7-8 pages instead of 5 (or better, make it configurable).

AmberMutt commented 10 years ago

Well, it doesn't necessarily limit the threads allowed; it simply limits empty or near-empty threads. If someone starts a thread and it only gets 3 replies before it reaches page 5, chances are it's not gonna see the first page again anyway. Even a thread with up to 10 posts isn't much of a loss if it gets bumped off early at page 6, as more popular threads are usually gonna have more than 10 replies. It also allows big threads to ride all the way to page 10 and the board to fill up like it normally would. The only thing that would be cut down is the number of garbage threads. And again, even if a full-on spam attack like the earlier one happened, we'd only end up losing half of the board (assuming the entire board was full), and even then only pages 6-10 would be lost. Often the last pages are full of threads that are either no longer relevant or have run out of steam. This keeps the damage at a tolerable level, since the 50 most relevant and recently updated threads can't be lost to thread spam attacks.

My question for you Bui would be just how difficult it would be for someone to create spam threads in addition to replying to them as many as 5-10 times just to be able to start pushing -every- thread off the board. How badly would it have slowed you down if you were forced to reply to every spam thread up to just 5 times (even while assuming that Captcha never resets)? How much more difficult if Captchas reset every minute as well?

bui commented 10 years ago

currently: 150 captchas to fully wipe the board

if this were to be implemented with a threshold of min. 5 posts past page 5: (15 threads x 5 pages) x 6 posts = 450 captchas, plus the other 75, so 525 in total

of course, this is assuming the board has captcha enabled -- if it doesn't, then only time is a factor, and not much of it is needed

so yes, this is a good idea
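The arithmetic above, written out so other thresholds can be tried. It assumes one captcha per post, 15 threads per page and 10 pages, as in the figures above; the function name `captchas_to_wipe` is made up for this sketch.

```python
def captchas_to_wipe(threads_per_page=15, pages=10,
                     early_prune_pages=5, min_replies=0):
    """Captchas an attacker must solve to push every real thread off the board.

    With min_replies == 0 the reply rule is off: one bare thread per slot.
    Otherwise the first early_prune_pages pages can hold bare threads, but
    every slot after that needs a thread plus min_replies replies.
    """
    if min_replies == 0:
        return threads_per_page * pages
    front = threads_per_page * early_prune_pages
    back = threads_per_page * (pages - early_prune_pages) * (1 + min_replies)
    return front + back


print(captchas_to_wipe())               # 150
print(captchas_to_wipe(min_replies=5))  # 525
```

With a 5-reply threshold the attacker's cost goes from 150 to 525 captchas, 3.5 times as many.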

AmberMutt commented 10 years ago

Some more ideas from /furry/, since no one else is posting them:

(7) "Would it be possible to alter the way that new threads are created so that they enter a queue to be made based on a cap on active threads. If the board isn't at the cap then new threads are instantly made. But if the board is capped then new thread requests go into a queue to be created when an older thread falls off rather than bumping it off. It might be a little inconvenient until more complex code allows some nuance but it would seem to completely fix the problem we have with new thread floods. I mean if it's possible you may have code that makes unpopular threads die quickly while there's new threads in the queue so new threads simply made while the board is abnormally full don't get held up too long. And of course limit the queue and reject new requests after a certain limit. If you could put a notice to the board that it's at the cap and new topics may take a moment to show up it shouldn't inconvenience actual threads at all."
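A sketch of the queueing behaviour in (7), with hypothetical names (`ThreadQueue`, `submit`, `thread_expired`) and arbitrary default limits. New threads are created immediately while the board is under its cap, held in a bounded queue when it is full, and rejected once the queue itself is full.

```python
from collections import deque


class ThreadQueue:
    """Toy model of a board with a cap on active threads and a pending queue."""

    def __init__(self, cap=150, max_pending=20):
        self.cap = cap
        self.max_pending = max_pending
        self.active = []        # ids of threads currently on the board
        self.pending = deque()  # ids waiting for a free slot, oldest first

    def submit(self, thread_id):
        """Return 'created', 'queued' or 'rejected' for a new thread request."""
        if len(self.active) < self.cap:
            self.active.append(thread_id)
            return "created"
        if len(self.pending) < self.max_pending:
            self.pending.append(thread_id)
            return "queued"
        return "rejected"

    def thread_expired(self, thread_id):
        """An old thread fell off on its own; promote the oldest pending one."""
        self.active.remove(thread_id)
        if self.pending:
            self.active.append(self.pending.popleft())
```

The "queued" result is where the board notice mentioned in the quote would be shown to the poster.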

I think this one is along the same lines, but I don't really understand it admittedly. "Hold Y different copies of the board at once in your database system. To make the comparison to regular data structures easier, let's say these copies are arranged as a double-ended queue (deque). Images are only ever stored once across all the deque's entries. They can be referenced from any entry, however. This makes it so you don't have to save the same image Y times, just a reference to the image. Designate the tail of the deque as the currently active segment. Every X minutes, if the deque has Y or more entries, pop it until it has Y-1 entries. Inject a new entry into the deque, which has the same contents as the previous tail of the deque. This becomes the new active segment. If spam occurs, first wait until the spam ends, then eject entries from the deque until you reach the first non-tainted entry, or until the queue is empty. Every time the queue is updated, search for images that are no longer referenced and remove them."
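One way to read the quoted scheme, as a toy in-memory model. The class name `SnapshotRing`, the dict-per-snapshot layout, and the `is_tainted` callback are assumptions made for the sketch; a real board would do this at the database level.

```python
from collections import deque


class SnapshotRing:
    """Keep up to max_snapshots copies of the board; the tail is the live one."""

    def __init__(self, max_snapshots):
        self.max = max_snapshots                # Y in the quote
        self.snapshots = deque([{}])            # each maps thread id -> image name
        self.images = set()                     # image files stored once, shared

    @property
    def active(self):
        return self.snapshots[-1]

    def post(self, thread_id, image):
        self.active[thread_id] = image
        self.images.add(image)

    def tick(self):
        """Run every X minutes: trim to Y-1 entries, then copy the tail."""
        while len(self.snapshots) >= self.max:
            self.snapshots.popleft()
        self.snapshots.append(dict(self.active))
        self._collect_images()

    def rollback(self, is_tainted):
        """After spam ends, drop entries from the tail until one is clean."""
        while self.snapshots and is_tainted(self.snapshots[-1]):
            self.snapshots.pop()
        if not self.snapshots:
            self.snapshots.append({})  # everything was tainted: start empty
        self._collect_images()

    def _collect_images(self):
        # Remove images that no remaining snapshot references.
        referenced = {img for snap in self.snapshots for img in snap.values()}
        self.images &= referenced
```

Rolling back loses any legitimate posts made after the last clean snapshot, so the interval X trades storage against how much a rollback costs.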

(8) "The second step is to implement a system which takes snapshots of the board at regular intervals (storing only the last two), so if the flooder does find a hole in the new captcha the mods can just roll the board back to yesterday. (If this takes up too much server space we can use something similar to Fuuka's image archival system, which seems to be pretty efficient.)"