flarum / framework

Simple forum software for building great communities.
http://flarum.org/

Anti-spam #271

Closed tobyzerner closed 9 years ago

tobyzerner commented 9 years ago

Need to work out what anti-spam strategies to implement.

Felli commented 9 years ago

As of right now (on esoTalk) a plugin called Akismet works rather well, so maybe an extension of that?

pcrumm commented 9 years ago

@tobscure is Akismet integration (or something similar) something you'd like to see in core, or is it better off as an extension?

phpBB also supports a pluggable CAPTCHA API. Maybe a middle ground is shipping an anti-spam oriented add-on to the extensions system and one or two stock anti-spam plugins.

tobyzerner commented 9 years ago

@pcrumm Extension, but one we would definitely consider bundling with the download.

I think there are a number of other strategies worth considering (don't allow new users to post links, etc.). I'll sit down at some point this coming week and nut out some ideas.

franzliedke commented 9 years ago

Antispam features should all be extensions, though certainly some can be bundled.

pcrumm commented 9 years ago

@tobscure eh, I've always favored doing as much heuristically as possible. I spent a lot of time on this once upon a time, and most other strategies just end up being continually cat-and-moused.

franzliedke commented 9 years ago

Since we're so JavaScript-based, I think we can employ some tricks like checking for certain browser features to make sure a human is browsing our site - there's not much we can do against those anyway...

tobyzerner commented 9 years ago

But spam bots will just interact with the API directly, presumably?

pcrumm commented 9 years ago

XRumer (super common spambot software) tries to look like a normal user, so it'd use the frontend. If Flarum takes off, I wouldn't be surprised if someone gets lazy and makes a tool that targets the API, though.

IMHO, that suggests that the most effective solution will be pretty client-agnostic, though that doesn't mean we shouldn't take advantage of additional layers in places where we can (e.g. the web client).

geoclicks commented 9 years ago

Here's what we do on our forum which has close to 200K users and 1M posts.

Depending on how the API is architected (i.e. plugins get executed on API calls), this will take care of PhantomJS-type bots and API spammers.

1) Block At Network: Should support Country, CIDR or Individual IP address. Would be nice if it took threat lists from places like StopForumSpam (but unless you run a super huge forum, I doubt you need this).

We do this at the firewall level, so that PHP process time is not consumed. We also do this at the application level in PHP as it is sometimes just easier to add the ban to just the forum and not globally.

Can also block with user agent strings. We know which ones are badly configured bots!

2) Block on Registration: Several things are done here. Measure time on page: if it is below a certain threshold, we're looking at a bot. Ask a very, very simple human question (what color is next: red, white...). On submission, check for banned emails (wild-carded), send the registration IP and email to StopForumSpam, and accept registration only if everything passes. On rejection, give the user a contact which they can email if they're legit. Hardly anyone ever does.

3) Block / Sandbox after registration. Check for registrations from a given IP address and old user cookies (stashed away in a cookie that does not get deleted, or in localStorage). If the user's old logins have been banned, send the new user to a silent / moderated group.

4) New users have a configurable threshold of what they can post (0 links, for example). All links are generally nofollowed.
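Steps 1 and 4 above could look roughly like the sketch below. This is only an illustration, not what the forum described here actually runs: the blocklist entries, the user-agent fragment, the link regex and all function names are made-up assumptions.

```python
import ipaddress
import re

# Hypothetical blocklist entries; a real forum would load these from config
# or from a threat list. The addresses are documentation-only ranges.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),   # a whole CIDR range
    ipaddress.ip_network("198.51.100.7/32"),  # a single IP address
]
BLOCKED_AGENT_FRAGMENTS = ["badbot"]          # assumed user-agent fragment
MAX_LINKS_FOR_NEW_USERS = 0                   # the "0 links" example threshold

# Deliberately simple link detector; real posts need something more careful.
LINK_PATTERN = re.compile(r"https?://\S+", re.IGNORECASE)


def is_blocked_address(ip: str) -> bool:
    """Step 1: True if the IP falls inside any blocked network."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in BLOCKED_NETWORKS)


def is_blocked_agent(user_agent: str) -> bool:
    """Step 1: True if the user agent contains a known-bad fragment."""
    lowered = user_agent.lower()
    return any(fragment in lowered for fragment in BLOCKED_AGENT_FRAGMENTS)


def exceeds_link_limit(post_text: str, is_new_user: bool) -> bool:
    """Step 4: True if a new user posts more links than the threshold allows."""
    if not is_new_user:
        return False
    return len(LINK_PATTERN.findall(post_text)) > MAX_LINKS_FOR_NEW_USERS
```

Doing the same CIDR check at the firewall is cheaper, as noted above; the application-level version is just easier to change per forum.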

Other than that, we do get human-crafted spam, which is hard to detect and very "bespoke". The users / community generally report this.

We also use Cloudflare to throw up human challenges to dodgy browsers etc.

ghost commented 9 years ago

"2) Block on Registration: Several things are done here. Measure time on page: if it is below a certain threshold, we're looking at a bot."

If it's something that's going to be implemented, please consider adding some sort of recovery mechanism, like giving them another chance with a different CAPTCHA or something if they fill out the signup form too fast. A few weeks back I tried to join a vBulletin forum that had this in place. I've joined so many forums in my life that I use autocomplete, so I filled out the form quickly and it permanently banned me with no option to go "no, I'm not a spambot"...
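A minimal sketch of that recovery idea, assuming a made-up threshold and function name: a too-fast submission is sent to an extra challenge instead of being banned outright.

```python
# Hypothetical threshold; the right value would need tuning per forum.
MIN_SECONDS_ON_FORM = 5.0


def registration_decision(seconds_on_form: float, passed_challenge: bool = False) -> str:
    """Return "accept" or "challenge"; a fast signup is never banned outright."""
    if seconds_on_form >= MIN_SECONDS_ON_FORM:
        return "accept"
    # Too fast: could be a bot, could be a human using autocomplete.
    # Offer a second chance (e.g. a different CAPTCHA) rather than a ban.
    if passed_challenge:
        return "accept"
    return "challenge"
```

The point of the design is that the timing check only ever escalates to another test, so someone using autocomplete still has a way in.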

The easiest thing would probably be to have individual plugins for 3rd-party CAPTCHAs like those from Google, SolveMedia, KeyCaptcha, etc. Those tend to be quite effective.

A traditional helper method, like asking a special question unique to the forum, is another approach that seems to be relatively useful. Perhaps on a Chevy/GMC truck forum one would ask "how many letters in GMC", to which the obvious answer would be "3" or "three".
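Checking such an answer could be sketched as follows. The names are hypothetical, and the accepted answers are just the two from the GMC example; an admin would configure their own.

```python
# Accepted answers for the example question "how many letters in GMC".
ACCEPTED_ANSWERS = frozenset({"3", "three"})


def is_correct_answer(raw_answer: str, accepted=ACCEPTED_ANSWERS) -> bool:
    """Compare the answer case-insensitively, ignoring surrounding whitespace."""
    return raw_answer.strip().lower() in accepted
```

Accepting both the digit and the word keeps the question friendly for real members while still being forum-specific.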

The hard part is to make this all as user friendly as possible for both the member signing up and the forum admin.

As the person above mentioned, being able to report posts to forum staff is critical, not just for anti-spam but for general moderation as well.

Relying on something like CloudFlare is probably not a good idea.

geoclicks commented 9 years ago

I agree with the fall back. The time to registration is just one thing that adds to the list of measures.

Cloudflare is very good for very obvious detection. It mucks up API requests though, so that may be an issue.

Again, we need two or three measures to stop spammers: before registration, on submission of any content (including profiles, private messages, etc.), and a good reporting mechanism for users.

tobyzerner commented 9 years ago

Gonna start with the low-hanging fruit:

mtotheikle commented 9 years ago

As a way to get into Flarum and see how extensions work, I've taken an initial stab at the reCAPTCHA implementation. You can see it at https://github.com/mtotheikle/flarum-recaptcha and I'll have some more updates tomorrow / this week such as support for setting the keys in admin panel.

tobyzerner commented 9 years ago

@mtotheikle This is fantastic! Would you be interested in working on this extension as part of the github.com/flarum organisation? reCAPTCHA is an extension we'd like to officially support and include by default.

mtotheikle commented 9 years ago

@tobscure That sounds great! I should be able to get the admin stuff done tonight, which I think is the only main piece really missing. If I complete that and don't find any issues, then it'd be great to have this packaged with the next beta.

ghost commented 9 years ago

I noticed this was closed. Have you added the reCAPTCHA? I ask because the 3rd party extension is not functional and the developer is MIA (missing in action) since September.

tobyzerner commented 9 years ago

Oh sorry, I forgot to update this. Decided against a first-party reCAPTCHA extension for now because:

Sorry for the confusion. Hopefully @mtotheikle will update for beta 3 when it's out (there will be updated docs!) :)

ghost commented 9 years ago

I'm disappointed to hear that. At the moment none of the features would be suitable for a large community.

The Akismet Enterprise plan only allows a maximum of 100,000 monthly checks. In the past we have used that up in the first few days.

Flood control only goes so far.

The general idea would be to add a layer to keep the nasties from joining.

johnhearfield commented 8 years ago

@Code-Name-Debian Good point. One idea is to hook into the user's IP and check it against Stop Forum Spam and Project Honeypot, but this can cause false positives.
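One way to soften the false-positive problem, sketched under assumptions: only reject when at least two sources agree, and send single hits to moderation. The lookup callables are hypothetical stand-ins; no real Stop Forum Spam or Project Honeypot API is shown here.

```python
from typing import Callable, Iterable


def classify_ip(ip: str, lookups: Iterable[Callable[[str], bool]]) -> str:
    """Return "allow", "moderate" or "reject" based on how many lists flag the IP.

    Each lookup is a caller-supplied function returning True if its list
    flags the address.
    """
    hits = sum(1 for lookup in lookups if lookup(ip))
    if hits == 0:
        return "allow"
    if hits >= 2:
        return "reject"
    # A single list flagging the IP may be a false positive, so hold the
    # user for moderation instead of rejecting outright.
    return "moderate"
```

For example, with two in-memory sets standing in for the two services:

```python
list_a = {"203.0.113.9", "203.0.113.10"}
list_b = {"203.0.113.9"}
checks = [lambda ip: ip in list_a, lambda ip: ip in list_b]
classify_ip("203.0.113.10", checks)  # flagged by one list only
```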

lZzozZl commented 7 years ago

[Suggestion] patternLock.js

It has a CAPTCHA option which is fun. It's like the Android phone unlock. :+1:

luceos commented 7 years ago

@lZzozZl please post feature requests on the forum; the GitHub bug tracker has been reserved for bug reports and approved feature conceptualisation for a while now. Thanks for sharing your thoughts, and I do hope you share them through the correct channel.