Spam accounts on backdropcms.org

bugfolder commented 3 years ago

There's been a lot of recent discussion on Zulip about the thousands of spam accounts on backdropcms.org. Although we've made some changes to help mitigate the problem, it's looking like a multi-layered defense will be needed. Opening this issue to capture some of the discussion, things that have been done, and things that might still be done.

At present (and for quite some time in the past) we get 10–20 spam account submissions per day. The vast majority of all recent new user accounts have been spam accounts.

Currently, it looks like the approach that most spammers are using is to put their spam content into the Biography field of the user account (and sometimes in other fields, like the profile Website field) so that it shows up on the user profile page.

Bio field spam comes in a variety of languages: Vietnamese, Ukrainian, and Bengali are common, but there are quite a few English spammers and a smattering of other languages. In some cases, the bios are innocuous, but the website fields are links to online gambling sites, so still spammy.

We require email validation of new accounts, so this means a spammer needs to create the account, then respond to the confirmation email. Looking at creation and login dates, the confirmation happens anywhere from immediately to about an hour later.

Previously, putting spam into the Biography field meant that the person had to edit their profile after validating their new account. Now the Biography is on the user registration form, a change we made to try to make it easier to detect spammers at account-creation time.

We currently have Akismet on the site. Akismet (in principle) blocks form submissions based on spammy content. In order to let Akismet see the Bio field and block the user account creation, we moved the Bio field onto the user registration form. Although this lets Akismet have a crack at the Bio field now, it appears that it's passing most "bios". Many of them are not obviously spam (porn, gambling), but are merely advertisements for random businesses.

There are various tools to consider, some mentioned in the Zulip thread.

A nice preamble to all of this is this blog post by @BWPanda.

Other tools:

IP Address Manager — a Drupal (only) module that tracks IP addresses on a per-user basis and makes it easy to see where a user is from. I'm porting this to Backdrop.

GeoIP — H/T @indigoxela. On Backdrop. Detects country by IP and allows one to quarantine accounts from iffy countries. Needs Rules.

Spambot — A Drupal (only) module that checks IP addresses against the StopForumSpam.com database. A little spot-checking suggests this would block about 50% of the current spammers. I'm looking into porting this one to Backdrop.

There is also a View on backdropcms.org at Admin > User Accounts > Spam check, that shows recent user accounts and their fields, making it relatively easy to scan recent accounts for spam and delete them all via Views Bulk Operations. A useful tool, but requires daily attention to "mow the grass", so only a stop-gap measure.

docwilmot commented 3 years ago

There's also the Register Country module, though it's geared more toward inclusion than exclusion. It could be modified to do the trick without further ports, though.

docwilmot commented 3 years ago

https://github.com/backdrop-contrib/spambot

bugfolder commented 2 years ago

I'll note that we've now had Spambot and Akismet installed and active on the site, and while those both occasionally catch a spammer, we still get 10-15 new accounts per day (which I strive to delete each morning).

indigoxela commented 2 years ago

... we still get 10-15 new accounts per day (which I strive to delete each morning).

That's a lot. :disappointed: Any idea, what sort of spam accounts that is? Real humans searching for a "promo page"? Just trash from bots? From specific countries or from all over the globe? What are we dealing with?

indigoxela commented 2 years ago

Re Register Country - the same can be accomplished with GeoIP Tokens + Rules. Not sure if that's an option for us.

bugfolder commented 2 years ago

Any idea, what sort of spam accounts that is? ... What are we dealing with?

Here's a screen shot of a portion of the Spam check view for this morning. I deleted 32 accounts (everything you see here and more). Generally I use the biography as the main clue: if what they posted isn't a biography that plausibly has something to do with web development, it gets deleted.

I'd like to install IP Address Manager to see if the spammers are using the same IP for multiple account attempts, in which case we can start banning IPs. (I do that on a site I manage and it helps.)

[Screenshot: spammers]

Wylbur commented 2 years ago

Another option might be Cleantalk.org, a SAAS spam service.
https://cleantalk.org/

Currently there is no Backdrop module, but that could be easily resolved.
https://www.drupal.org/project/cleantalk

For a single site the price is $8/year. This solution was talked about a LOT when Mollom was finally discontinued.

indigoxela commented 2 years ago

I'd like to install IP Address Manager to see if the spammers are using the same IP for multiple account attempts...

That's worth a try.

Honestly, I don't see a realistic chance of catching this spam with automated text pattern checks. The entries are just too varied, and none of them screams obvious spam, either.

Another assumption: these look like they were provided by click workers, which means we could reduce spam by adding GeoIP checks. Click workers are usually from poor countries. It might be enough to just set their accounts to "disabled" at first. They get only a little money per item, so they won't waste time on reluctant sites.

indigoxela commented 2 years ago

The only other option I see is to disable automatic account activation. At least for some time (some weeks).

How many "real" accounts register per day/week? If we're talking about one serious account per week in contrast to 30+ spam accounts per day - guess what I'd choose. :stuck_out_tongue:

klonos commented 2 years ago

For a single site the price is $8/year.

Worth the try ^^

bugfolder commented 2 years ago

Update.

I've installed IP Address Manager to be able to see and record spammers' IP addresses;
I've installed Ban IP to be able to manually ban offenders.

There was a suggestion in the Zulip chat that since the only current reason people might need a b.org account is to list their services for hire, they (theoretically) already have some involvement in the Backdrop community, so we could ask them to describe their community involvement. (We already have checkboxes to that effect, but those are (and have been) easily gamed.) So I'm going to add a new field to the registration form asking for a human-created narrative of current involvement in Backdrop.

In the future, the assumption that user accounts must already have some involvement may change; once we get Civi up and running and have mailing lists, a b.org user might simply want to receive mailings (that Civi is providing). But we can deal with that change when the time comes.

@indigoxela said:

The only other option I see is to disable automatic account activation. At least for some time (some weeks).

Agreed. I think that's a next step, but let's wait a little bit to collect some further data on IPs and the "reason why I need an account" narrative.

findlabnet commented 2 years ago

Just out of curiosity, why did you choose "Ban IP" for manually banning offenders? As far as I remember, you have tested another module, haven't you?

bugfolder commented 2 years ago

Just out of curiosity, why did you choose "Ban IP" for manually banning offenders? As far as I remember, you have tested another module, haven't you?

At the moment, I'm using it because it's simple, familiar, and does the minimum needed. I'm also looking at IP Address Blocking, which has some additional functionality, and might replace it.

yorkshire-pudding commented 2 years ago

Honestly, I don't see a realistic chance of catching this spam with automated text pattern checks. The entries are just too varied, and none of them screams obvious spam, either.

I've added in Zulip but for the record, here is my suggestion:

Looking at it from the other angle, is there anything that genuine accounts have in common? Do they all mention website development and/or Backdrop CMS and/or Drupal? Could an allow list be built up of terms that are uncommon in spam, but are common in genuine entries? Could a service then check to see if any of the allow listed terms are present?

Following on from my suggestion above, here is a basic proof of concept using Rules that handles a registration differently when the bio doesn't mention Backdrop. The rule looks for the phrase "Backdrop CMS" (this could be extended with other keywords as OR conditions); if the phrase isn't found, the rule blocks the user and marks the account as pending approval:

{ "rules_user_register" : {
    "LABEL" : "User register",
    "PLUGIN" : "reaction rule",
    "OWNER" : "rules",
    "TAGS" : [ "user" ],
    "REQUIRES" : [ "rules" ],
    "ON" : { "user_insert" : [] },
    "IF" : [
      { "NOT text_matches" : { "text" : [ "account:field-bio" ], "match" : "Backdrop CMS" } }
    ],
    "DO" : [
      { "user_send_account_email" : { "account" : [ "account" ], "email_type" : "register_pending_approval" } },
      { "user_block" : { "account" : [ "account" ] } }
    ]
  }
}

The allow list might need to be wider, but this takes into account generic service providers that just want to be listed, rather than ensuring their service is good for Backdrop (as @BWPanda has flagged).
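For anyone who wants to try the allow-list idea against a batch of exported bios before touching Rules, here is a minimal Python sketch. It is not the rule that runs on backdropcms.org; the term list and the function name `needs_manual_approval` are hypothetical, and a real list would have to be tuned against genuine registrations.

```python
import re

# Hypothetical allow list: terms assumed to be common in genuine bios
# and rare in spam. These are illustrative guesses, not measured data.
ALLOW_TERMS = ("backdrop cms", "backdrop", "drupal", "web development")


def needs_manual_approval(bio, allow_terms=ALLOW_TERMS):
    """Return True when the bio mentions none of the allow-listed terms."""
    # Lower-case and collapse whitespace so "Backdrop  CMS" still matches.
    normalised = re.sub(r"\s+", " ", bio).strip().lower()
    return not any(term in normalised for term in allow_terms)


if __name__ == "__main__":
    for sample in ("I build sites with Backdrop CMS.", "Best online casino bonuses"):
        print(sample, "->", needs_manual_approval(sample))
```

This uses plain substring matching, so it is as crude as the single-phrase rule above; the only difference is that several terms are checked at once.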

oadaeh commented 2 years ago

I'm coming a little late to this, but I wanted to say (and I won't bring it up again) that I added the Antibot module to a few sites that were receiving a few contact and user form spam submissions per day (not as many as here, but still quite a bit), and the number dropped to just a couple over the last two years. I don't know if it would help in this situation, but it seems worth a shot.

bugfolder commented 2 years ago

I added the Antibot module to a few sites...

It seems worth trying out, if only to determine if our infestation is coming more from bots or click workers.

bugfolder commented 2 years ago

So, I just noticed that Antibot is already on the site (and has been there since 2018). And the user register form is listed in its settings. So probably most or all of the accounts we're seeing now are click workers.

bugfolder commented 2 years ago

Well, the new honeypot field ("What's your interest in Backdrop...") isn't adding a whole lot. Both real and spam account creators tend toward one-word answers that (for the spammers) are still plausible. The best detection still comes from the biography field, which is where the spammers put their spam.

Interestingly, since the bio field is no longer on the user registration form, the spammers now need to create an account and then go in and edit the bio to inject their dross. They are still doing that, at a rate of 5–20 per day.

docwilmot commented 2 years ago

To make cancelling and banning easier, could we get a View of new users in the last X weeks, and create a VBO action that includes cancelling and blocking IP and reporting to forum spam all at once?

bugfolder commented 2 years ago

There's a View that presents the latest accounts and the fields that are most likely to show whether they're spamming:

https://backdropcms.org/admin/people/spam

The VBO that does bulk deletion also reports to Akismet. We can check whether Stop Forum Spam has the appropriate deletion hook for reporting.

bugfolder commented 2 years ago

Gonna check whether IP Blocking or Ban IP has that hook as well.

Checking IP addresses over the last two weeks, there's a decent amount of repetition of IP addresses among known spammers, so blocking should have a discernible effect.
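A rough way to reproduce that kind of check is to count how often each address appears among the accounts already deleted as spam. The sketch below is only an illustration: the function name `repeated_ips` is hypothetical, and the addresses are made-up values from the documentation ranges, not data from backdropcms.org.

```python
from collections import Counter


def repeated_ips(spam_ips, min_count=2):
    """Return {ip: count} for addresses seen at least min_count times."""
    counts = Counter(spam_ips)
    return {ip: n for ip, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    # Made-up sample: one address per deleted spam account.
    sample = ["192.0.2.10", "198.51.100.7", "192.0.2.10", "203.0.113.5", "192.0.2.10"]
    print(repeated_ips(sample))
```

Addresses that come back from a check like this are the candidates for a manual ban; addresses seen only once are left alone.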

docwilmot commented 2 years ago

If Ban IP doesn't have a hook, I'll add one.

bugfolder commented 2 years ago

The two approaches raised at the 2022-03-31 outreach meeting:

Implement delayed publication of spammy fields in the user profile, requiring explicit approval before they are made visible.
Block known spammers' IP addresses.

The first will require some significant development, I think (though would be happy to be wrong about that). The second requires less development, so that's what I'll pursue next.
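To make the first approach concrete, here is a small Python sketch of the logic only, not Backdrop code. It assumes a hypothetical "approved" role that a moderator grants by hand and a hypothetical set of spam-prone field names; until the role is present, those fields are left out of what visitors see.

```python
from dataclasses import dataclass, field

# Hypothetical names: which fields count as spam-prone, and the role
# a moderator would grant after reviewing the account.
SPAM_PRONE_FIELDS = frozenset({"bio", "website"})
APPROVED_ROLE = "approved"


@dataclass
class Account:
    name: str
    roles: set = field(default_factory=set)
    profile: dict = field(default_factory=dict)


def public_profile(account):
    """Return only the profile fields that may be shown to visitors."""
    if APPROVED_ROLE in account.roles:
        return dict(account.profile)
    # Unapproved accounts: hide the fields spammers typically abuse.
    return {k: v for k, v in account.profile.items() if k not in SPAM_PRONE_FIELDS}
```

The same role check could also be used to keep unapproved accounts out of any public listing, so their profile pages are never linked.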

Neither Ban IP nor IP Blocking modules currently implement the integration with IP Address Manager module needed to add IP-banning to the user cancellation forms, but I've just submitted a PR to the latter that adds this. (Happy to do the same for Ban IP—the code would be nearly the same—but I thought we'd try out IP Blocking module for this, because it stores a reason for each blockage, which is a nice feature.)

For those that are interested, this morning's tally on b.org was 11 definite spammers, 2 plausible new accounts. Here's a screen shot of part of the Spam Check page prior to action (the blacked-out information is one of the plausible accounts).

yorkshire-pudding commented 2 years ago

@bugfolder - from what I can tell without seeing the view definition, the service provider list is a view of user profiles where they have ticked "Available for hire".

On a view of user accounts it is possible to add a filter by "User account: Roles". By adding a role that doesn't need to have any permissions but is simply added by moderator approval, you could filter this view.

By removing these from the view they wouldn't be findable, would they?

Perhaps I'm wrong in my assumptions. If so, is there any chance you can share the view config file so I can experiment further?

bugfolder commented 2 years ago

is there any chance you can share the view config file

Sure.

views.view.providers.json.txt

I will say that I'm not crazy about keeping around spammers' data long-term even if we're not letting it be displayed. It doesn't just clutter up the db (dbs, once CiviCRM is installed), but it also tends to add a layer of inconvenience to administration.

Incidentally, although we've had problems in the past with spammers in the Service Provider listing, it looks like most spammers don't get as far as adding an "Organization" to this listing; they content themselves with spamming the "Biography" field.

yorkshire-pudding commented 2 years ago

I think I was thinking about the Contractor for Hire block - that's where the bio is shown isn't it? Must have been a PEBCAM on my end.

I agree we still need to get rid of the content regularly but if we can prevent it from ever displaying, then:

it might in the long term reduce the incentive
it won't show up in search results
it won't annoy genuine visitors to the site

So yes, they would need deleting, but there would be less urgency, although of course, for genuine registrants, you want to approve reasonably promptly.

bugfolder commented 2 years ago

Here's the Contractors for Hire page. It doesn't display bios.

https://backdropcms.org/support/services/contractors

yorkshire-pudding commented 2 years ago

I know that page doesn't, and the block on the side of service providers doesn't either, but they both link to the user profile where the bio is displayed. If we can prevent them from showing in the view, then there won't be a link to their account anywhere, so they won't appear in search results or be visible to anyone looking for contractors.

backdrop-ops / backdropcms.org

Spam accounts on backdropcms.org #815