LemmyNet / lemmy
🐀 A link aggregator and forum for the fediverse
https://join-lemmy.org

How to defeat account creation spambots. #2355

Closed: dessalines closed this issue 1 year ago

dessalines commented 2 years ago

A few trolls have created Lemmy account creation spambots, which take advantage of federated servers that have open signups with all of the account creation protections turned off.

As a temporary fix until we address this properly, you can either disable federation, or switch to allowlist federation and make sure no server in your allowlist has open signups.

These are the only mitigations I can think of, and probably both of them should be used.

cc @Nutomic

LunaticHacker commented 2 years ago

Make at least one of the 4 account creation protection methods mandatory. Or maybe a subset: like either captcha or account registration must be enabled (I like that best).

We can't enforce this on malicious actors

Nutomic commented 2 years ago

I think what we need is essentially a system which limits posting from new, untrusted users. One example of such a system is the trust levels in Discourse. We can design something similar for Lemmy, with the difference that it can't be based on the amount of reading, because that isn't federated. So I'm thinking something like the following:

* new user needs to write at least 5 comments before being able to create posts
* user needs to have made at least 10 comments and 2 posts before being allowed to upload any images
* similar limits for other actions, like sending private messages or posting external links
* account age should also be taken into account, so a new account can't simply spam comments for an hour and be done

This type of system has the advantage of being almost entirely automated. However, it won't work perfectly with federation, because for remote users we rarely know about all of their posts/comments. So it's possible that a user makes a popular post on another instance, but our instance doesn't know about the user yet. Based on the above criteria, the post wouldn't be allowed on our local instance, even if the author was already very active on his own instance for a long time.

To solve this dilemma, it's probably necessary to have some amount of manual review for new accounts. This could be implemented with an admin setting to specify which posts/comments need manual admin review before being published, for example:

* user has less than x total posts/comments
* account is less than x days old
* posted an image
* account is from a low-activity remote instance

The ideal solution would probably be some kind of combination of the above two. But the topic is quite complex, so I wouldn't want to rush and spend a lot of work implementing something which later turns out not to be optimal. For this reason I think it would make sense to first spend some time investigating in depth how other platforms solve this problem, maybe even read some scientific papers. This research could be a separate NLnet milestone.

Of course all of this will take time. For a short-term solution, I think it could help a lot to limit image posting, especially in comments, because the troll is mostly busy posting shock images in comments and would have to find something else to post. In any case, images in comments don't generally add much value in my opinion.
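The trust-level criteria above amount to a set of per-action checks against a user's local history. A rough Rust sketch of such checks, using made-up struct and field names and the thresholds from the list above (illustrative only, not Lemmy's actual data model):

    // Illustrative only: hypothetical types, not Lemmy's actual data model.
    // Uses the `chrono` crate for time arithmetic.
    use chrono::{DateTime, Duration, Utc};

    struct UserActivity {
        account_created: DateTime<Utc>,
        comment_count: u32,
        post_count: u32,
    }

    enum Action {
        CreatePost,
        UploadImage,
        SendPrivateMessage,
    }

    fn is_allowed(user: &UserActivity, action: &Action, now: DateTime<Utc>) -> bool {
        let account_age = now - user.account_created;
        match action {
            // e.g. require a few comments before the first post
            Action::CreatePost => user.comment_count >= 5,
            // e.g. require comments, posts and a minimum account age before images
            Action::UploadImage => {
                user.comment_count >= 10
                    && user.post_count >= 2
                    && account_age >= Duration::days(3)
            }
            // e.g. a minimum account age before private messages
            Action::SendPrivateMessage => account_age >= Duration::days(1),
        }
    }

As noted above, the counts for remote users are usually incomplete, so checks like these would mainly make sense for local users, with manual review covering the federated cases.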

dessalines commented 2 years ago

We can't enforce this on malicious actors

That's true, which is why the 2nd solution I proposed could also be used: on server restart, delete all content from the malicious servers from the DB. And I also think that requiring either captcha or account registrations would still be a good thing to do, because then it'd take a manual fork, compile, and complete setup of Lemmy to get around. For all the promises from redditors and hackernews users that these forks would be "easy to do", there have been maybe 3 that have tried to do this over the course of 2+ years.

trust levels in discourse

I really dislike complicated systems of limiting interaction or abilities, rather than preventing spam at the door, because you are still allowing some interaction to get through (if allowed to comment, the new malicious accounts will just spam comments then). Most importantly, they don't prevent the issue in this post, which is mass spam account creation. Lemmy's current tools for removing and deleting data from small numbers of trolls are working fine... mass spam account creation is the only new thing that's appearing to be a problem.

We already have manual reviews in the form of registration applications. They can and do prevent mass spam account creation. The issue in the post is from servers where that (and all the other features to prevent malicious accounts) were turned off.

Nutomic commented 2 years ago

if allowed to comment, the new malicious accounts will just spam comments then

We can make it so that comments from new users are not visible to others until approved by an admin or mod.

The issue in the post is from servers where that (and all the other features to prevent malicious accounts) were turned off.

The limitations in my previous comment can also be applied to posts/comments from remote servers, particularly if we are federating with that server for the first time, or if there is suddenly a lot of activity from a server that previously had low activity.
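A rough sketch of that last heuristic, i.e. holding remote activity for review when an instance is new to us or suddenly much busier than usual; the thresholds and field names are arbitrary assumptions, not existing Lemmy behaviour:

    // Illustrative only; nothing like this exists in Lemmy today.
    struct InstanceActivity {
        // average accepted activities per hour over some trailing window
        baseline_per_hour: f64,
        // activities received in the last hour
        last_hour: u64,
    }

    fn needs_manual_review(first_contact: bool, activity: &InstanceActivity) -> bool {
        if first_contact {
            // we are federating with this server for the first time
            return true;
        }
        if activity.baseline_per_hour < 1.0 {
            // a previously (almost) silent instance that suddenly sends a lot
            return activity.last_hour > 10;
        }
        // a sudden spike, e.g. ten times the usual rate
        (activity.last_hour as f64) > activity.baseline_per_hour * 10.0
    }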

kaiyou commented 2 years ago

Automated distributed moderation is hard indeed, and I am pretty sure that any simple standard approach will either be too restrictive - and thus disabled on most instances - or too loose and hurt legitimate users more than spammers. This can be compared to real-world issues like transnational regulations, or to other existing problems like fighting email spam.

Solutions to these problems most often require a mix of common understanding of shared basic rules, common standards for communicating about these rules, and goodwill from most actors to abide by these rules, incentivized by cross-boundary rule enforcement that excludes explicitly malicious actors and penalizes laxity. Maintaining and governing such systems is expensive, and designing privacy-preserving solutions for automating part of it makes it even more challenging - in both the real and digital worlds.

I am not what you would call a libertarian, yet I like the theoretical self-regulating properties of free markets and free exchange, which translate pretty much to the current Fediverse approach imo: rely on local human appreciation and cross-boundary discussions when humans feel the need, and do not try to automate or regulate the whole thing too much, as people will regulate themselves. Much like the free market though, this discourages small actors from taking on large responsibilities, but that does sound legitimate to me (I would not trust a single admin running a large instance anyway).

Currently Mastodon is one of the most mature Fediverse projects when it comes to moderation, and still it provides only basic features regarding local moderation:

Additionally regarding global moderation:

These make for a pretty simple yet quite powerful toolset for moderating based on each instance's rules. It allows both for very libertarian instances and for proper safe spaces. It incentivizes instance admins to enforce some "common" decency without worrying about defining it precisely, since it emerges from the community. It does put pressure on instance admins, but one might argue that it makes their responsibilities more explicit.

There still are a couple issues imo, which have not been completely addressed by the Fediverse ecosystem at this point:

On a separate note, Matrix - which wrote some very speculative specifications about distributed reputation systems https://docs.google.com/document/d/1rijGLs9-RJ9Mz2Yk5wWycMbErrCDK1VJkACdUPMN0-M/ - built its moderation tools based on:

dessalines commented 2 years ago

This isn't an issue to discuss general moderation ideas in federated networks, most of which we've already dealt with and handled in the past. It's about how to counter a very specific case of trolling: account creation spambots.

Lemmy already has blocking, reporting, account registrations, captchas, banning, and deleting user content... almost everything you've referred to, we already have. The next release will also have image and content purging from the database. We currently don't have a way to easily stop mass spam account creation from connected servers, though.

Nutomic commented 2 years ago

There are two things in particular which @kaiyou mentioned that seem relevant:

entire remote servers may be silenced or suspended, which basically applies to any past, current and future server users.

This should be feasible once https://github.com/LemmyNet/lemmy/issues/2285 is implemented. So if an instance is blocked, it would also hide all posts, comments, users, communities from that instance.

local users may be restricted based on environmental criteria like email address or IP address,

This could make spamming a bit more difficult, as it would require the attacker to use proxies/VPNs. For example, when a local user is banned, their last IP could automatically be blocked from signing up again.
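A minimal sketch of that idea; in practice the set would be persisted in the database rather than held in memory, and the names here are assumptions rather than existing Lemmy code:

    use std::collections::HashSet;
    use std::net::IpAddr;

    #[derive(Default)]
    struct BannedIps {
        ips: HashSet<IpAddr>,
    }

    impl BannedIps {
        // called when an admin bans a local user, passing the user's last seen IP
        fn record_ban(&mut self, last_seen_ip: IpAddr) {
            self.ips.insert(last_seen_ip);
        }

        // called from the registration handler before creating an account
        fn signup_allowed(&self, ip: IpAddr) -> bool {
            !self.ips.contains(&ip)
        }
    }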

DMTryptamines commented 2 years ago

For example, when a local user is banned, their last IP could automatically be blocked from signing up again.

IP-based solutions are becoming less than ideal, since some ISPs use carrier-grade NAT, which puts many customers behind a single shared IP. Some cost-effective ISPs (like the one I use) have dynamic IPs by default, and you pay extra for a static one.

Kap2022 commented 2 years ago

"Make at least one of the 4 account creation protection methods mandatory. Or maybe a subset: like either captcha or account registration must be enabled. "

I like this solution the best - but please make it easy to turn captcha on by adding a checkbox to the settings page, like you have for "registration required".

[image]

Please limit the requirement to the signup page only - not every time the user logs in.

I prefer Captcha to account registration because it is automated.


While I do not like limits on "untrusted users", because they hurt the new user experience, I do like this idea, which could be implemented by itself without the limits on new users:

_"To solve this dilemma, its probably necessary to have some amount of manual reviews for new accounts. This could be implemented with an admin setting to specify which posts/comments need manual admin review before being published, for example:

Should there be another condition that the instance is less then x days/months old?


I would also suggest blocking the creation of names including "dessalines" and "Nutomic" (except on lemmy.ml), because either you two have a severe case of multiple personality disorder (with some very naughty personalities) or spammers like to impersonate you on other instances.


I agree with this:

"And I also do think that requiring either captcha or account registrations would still be a good thing to do, because then it'd take a manual fork, compile, and complete setup of lemmy to do."


Finally, do not let spammers spoil the concept of federation. Blocking, banning, silencing, etc. just breaks the federation into non-federated islands.

xximj commented 2 years ago

Hi, I think dynamic rate limits would be cool. These could live in a separate hjson file, and the site admin could create as many scenarios as needed as the spam threat changes.

Something like:

new_user_scenario: {
  triggers: {
    # account created less than 3 days ago
    created: <3d
    # matches any number of posts so far
    posts: >=0
  }
  rate_limit: {
    # while the triggers match, allow at most 1 post per 6000 seconds
    posts: 1
    posts_per_second: 6000
  }
}
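A rough Rust sketch of how such scenarios could be evaluated at request time, keeping the hypothetical trigger and limit names from the config above (this is not existing Lemmy behaviour):

    use chrono::{DateTime, Duration, Utc};

    struct RateLimit {
        // allowed number of posts ...
        posts: u32,
        // ... per this many seconds, as in the config sketch above
        posts_per_second: u32,
    }

    struct Scenario {
        // trigger: account created less than this long ago
        max_account_age: Duration,
        // trigger: at least this many posts so far
        min_posts: u32,
        // limit to apply while the triggers match
        rate_limit: RateLimit,
    }

    fn effective_limit<'a>(
        scenarios: &'a [Scenario],
        default: &'a RateLimit,
        account_created: DateTime<Utc>,
        post_count: u32,
        now: DateTime<Utc>,
    ) -> &'a RateLimit {
        scenarios
            .iter()
            .find(|s| now - account_created < s.max_account_age && post_count >= s.min_posts)
            .map(|s| &s.rate_limit)
            .unwrap_or(default)
    }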
PrincessMiki commented 2 years ago

> I think what we need is essentially a system which limits posting from new, untrusted users. One example of such a system is the trust levels in Discourse. We can design something similar for Lemmy, with the difference that it can't be based on the amount of reading, because that isn't federated. So I'm thinking something like the following:
>
> * new user needs to write at least 5 comments before being able to create posts
> * user needs to have made at least 10 comments and 2 posts before being allowed to upload any images
> * similar limits for other actions, like sending private messages or posting external links
> * account age should also be taken into account, so a new account can't simply spam comments for an hour and be done
>
> This type of system has the advantage of being almost entirely automated. However, it won't work perfectly with federation, because for remote users we rarely know about all of their posts/comments. So it's possible that a user makes a popular post on another instance, but our instance doesn't know about the user yet. Based on the above criteria, the post wouldn't be allowed on our local instance, even if the author was already very active on his own instance for a long time.
>
> To solve this dilemma, it's probably necessary to have some amount of manual review for new accounts. This could be implemented with an admin setting to specify which posts/comments need manual admin review before being published, for example:
>
> * user has less than x total posts/comments
> * account is less than x days old
> * posted an image
> * account is from a low-activity remote instance
>
> The ideal solution would probably be some kind of combination of the above two. But the topic is quite complex, so I wouldn't want to rush and spend a lot of work implementing something which later turns out not to be optimal. For this reason I think it would make sense to first spend some time investigating in depth how other platforms solve this problem, maybe even read some scientific papers. This research could be a separate NLnet milestone.
>
> Of course all of this will take time. For a short-term solution, I think it could help a lot to limit image posting, especially in comments, because the troll is mostly busy posting shock images in comments and would have to find something else to post. In any case, images in comments don't generally add much value in my opinion.

I strongly oppose this. One of the things I hate about Reddit is having to farm 1000 karma to post memes; it was one of the reasons I stopped using Reddit.


IMO, new instances should have some of these turned on by default: captcha, registration rate limits, email verification. And a warning popup should appear if an admin tries to disable all of these defenses.
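A sketch of the "warn before disabling every defense" check; the field names are assumptions, and Lemmy's real settings model differs:

    // Hypothetical settings shape, for illustration only.
    struct RegistrationSettings {
        captcha_enabled: bool,
        email_verification_required: bool,
        registration_application_required: bool,
        signup_rate_limit_enabled: bool,
    }

    impl RegistrationSettings {
        // If this returns true, the admin UI could show a warning (or refuse to
        // save) because every signup defense has been turned off.
        fn all_protections_disabled(&self) -> bool {
            !(self.captcha_enabled
                || self.email_verification_required
                || self.registration_application_required
                || self.signup_rate_limit_enabled)
        }
    }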

ross-spencer commented 1 year ago

Is there an appropriate place for folks to discuss the scale of this issue? I have set up a new instance of Lemmy and in the last five hours have had 1300 activation requests. Fortunately these all then need to verify, so I guess we won't see the problem manifest in the content of the site, but I am a little concerned and not sure what the next steps are.

NB. Are there any links to doc pages about tools to help with this problem? It's not clear to me that I have enough information to perform some of the activities above, e.g. identifying and blocking the instances we are being spammed from, or accessing user listings effectively to identify malicious actors in bulk.

ross-spencer commented 1 year ago

Further to the above, and connected to https://github.com/LemmyNet/lemmy/issues/691: is there a timeout for email activation? I had some success limiting user agents in nginx, but now the email_verification table is full of pending entries. The user count is ridiculously high:

[image: screenshot of the user count]

If there is currently no timeout for verification, perhaps adding a configurable one could be a partial way to tackle spambots?

Additionally, perhaps the person row could be created only after verification? Right now each row is about 2000 bytes, and the user may never access the instance.
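A sketch of what a configurable verification timeout could look like; the struct shape is an assumption, and in Lemmy itself this would be a scheduled database query against the email_verification table rather than in-memory filtering:

    use chrono::{DateTime, Duration, Utc};

    // One pending row, roughly mirroring an email_verification entry.
    struct PendingVerification {
        local_user_id: i32,
        published: DateTime<Utc>,
    }

    // Users whose pending verification is older than the configured timeout; a
    // periodic task could then delete the verification rows and the accounts
    // that never completed verification.
    fn expired_verifications(
        pending: &[PendingVerification],
        timeout: Duration,
        now: DateTime<Utc>,
    ) -> Vec<i32> {
        pending
            .iter()
            .filter(|v| now - v.published > timeout)
            .map(|v| v.local_user_id)
            .collect()
    }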

GeckoEidechse commented 1 year ago

With the current wave of spambot signups, it appears that they sign up primarily on instances run by good actors that simply have an open registration policy (whether on purpose or by accident).

An interesting approach pixelfed took, which targets the scenario where an inexperienced admin is running an open server (maybe they just wanted to run an instance for themselves and their friends and don't have much admin experience), is to have a default max user limit of 1000.

This way, an admin just setting up a server without much experience will run it with the default 1000-user cap, meaning that their instance will cap out at 1000 spam signups.

This of course assumes a good acting instance. A malicious instance can still arbitrarily raise the limit in which case the only solution is to defederate.
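A sketch of how such a default cap could gate registration; the setting name is an assumption:

    // Hypothetical site setting: a default of Some(1000) would mirror the
    // pixelfed behaviour described above, and an admin could raise or remove it.
    struct SiteLimits {
        max_local_users: Option<u64>,
    }

    fn registration_open(limits: &SiteLimits, current_local_users: u64) -> bool {
        match limits.max_local_users {
            Some(cap) => current_local_users < cap,
            None => true,
        }
    }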

GeckoEidechse commented 1 year ago

Another idea that apparently is also used by Matrix is to use one-time codes for sign-up.

By default, an instance would then only allow sign-ups if a valid one-time token is provided. Such a token can only be generated by the instance admin. This sort of setup is ideal in the scenario where an admin spins up an instance for just themselves and a few friends, since in order to sign up, one needs to have received a sign-up token from the admin.

Again, the main idea here is that the currently infested instances are simply ones that have been set up by inexperienced admins using default settings. As such, the default settings should be adjusted so that sign-ups are as restricted as possible, with maybe an exception for the initial account created by the instance admin themselves.
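A minimal sketch of one-time signup tokens as described above (illustrative only; Lemmy does not currently have this feature in this form):

    use std::collections::HashSet;

    #[derive(Default)]
    struct SignupTokens {
        unused: HashSet<String>,
    }

    impl SignupTokens {
        // The admin generates a token out of band and hands it to a prospective
        // user, e.g. a friend who should be allowed to register.
        fn issue(&mut self, token: String) {
            self.unused.insert(token);
        }

        // Called during registration; each token can be redeemed exactly once.
        fn redeem(&mut self, token: &str) -> bool {
            self.unused.remove(token)
        }
    }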

cloventt commented 1 year ago

Using the federated modlog as a way to share banned-user identifiers between instances, maybe with the option to automatically follow its recommendations, is also a good idea I think. This could work like:

  1. Bad actor creates spam account on instance A
  2. Instance A admin notices the spam account and bans them from instance A
  3. Instance B is federated with Instance A and sees the banned user's email address (or some other identifying info) appear in the federated modlog
  4. Instance B automatically takes some action to prevent the same user moving to their instance

This isn't a complete solution, but it would create work for the spammers, because they would be unable to re-use emails on multiple instances, and this would increase the number of fake email accounts they would need to generate.

There are problems that might need to be overcome:

Instances subscribing to the bans could react in different ways:

  1. Do not react at all
  2. Block future signups of the banned email on their instance
  3. Flag existing users with a matching email on their instance in the admin's UI as "banned on ABC for XYZ" so that the instance admin can more closely monitor their behaviour

This isn't only useful for spam - it could be handy for other ban reasons such as racism. Essentially this would use the federated modlog as a collective knowledge-base of bad actors.
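A sketch of the three reaction options listed above. The event shape, and the use of a salted email hash instead of the raw address, are assumptions, not something the federated modlog carries today:

    use std::collections::HashSet;

    // A ban record received from another instance via the federated modlog.
    struct FederatedBan {
        // e.g. a salted hash of the banned account's email, to avoid sharing
        // the raw address between instances
        email_hash: String,
    }

    enum BanReaction {
        Ignore,
        BlockSignup,
        FlagForAdmin,
    }

    struct BanPolicy {
        reaction: BanReaction,
        blocked_email_hashes: HashSet<String>,
        flagged_email_hashes: HashSet<String>,
    }

    impl BanPolicy {
        fn handle(&mut self, ban: &FederatedBan) {
            match self.reaction {
                // 1. do not react at all
                BanReaction::Ignore => {}
                // 2. block future signups that present the same identifier
                BanReaction::BlockSignup => {
                    self.blocked_email_hashes.insert(ban.email_hash.clone());
                }
                // 3. flag matching existing users in the admin UI for review
                BanReaction::FlagForAdmin => {
                    self.flagged_email_hashes.insert(ban.email_hash.clone());
                }
            }
        }
    }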

ross-spencer commented 1 year ago

Using the federated modlog as a way to share banned-user identifiers between instances, maybe with the option to automatically follow its recommendations, is also a good idea I think.

This could work, but it requires that the spam account names are not being generated randomly. The spam pattern I am seeing for the first 2000 or so looked like about 30 attempts from each IP address (which seemed to correspond to multiple potential usernames), with no guarantee that the usernames weren't random.

From the nginx logs (counting requests that mention "register", per log file and source IP):

     sudo find . -name \*.gz -print0 | sudo xargs -0 zgrep "register" | awk '{print $1}' | sort | uniq -c | sort -n

     ...600+ lines before here... 
     26 ./access.log.3.gz:185.195.233.163
     26 ./access.log.3.gz:185.65.135.245
     26 ./access.log.3.gz:193.32.127.235
     26 ./access.log.3.gz:193.32.248.168
     26 ./access.log.3.gz:198.96.89.195
     26 ./access.log.3.gz:37.19.200.131
     26 ./access.log.3.gz:37.19.221.170
     26 ./access.log.3.gz:66.115.189.190
     26 ./access.log.3.gz:87.249.134.2
     28 ./access.log.3.gz:146.70.117.227
     28 ./access.log.3.gz:146.70.188.131
     28 ./access.log.3.gz:146.70.188.195
     28 ./access.log.3.gz:169.150.196.3
     28 ./access.log.3.gz:173.44.63.67
     28 ./access.log.3.gz:206.217.205.123
     28 ./access.log.3.gz:45.134.212.93
     28 ./access.log.3.gz:68.235.44.35
     30 ./access.log.3.gz:146.70.116.99
     30 ./access.log.3.gz:169.150.203.16
     30 ./access.log.3.gz:206.217.205.121
     30 ./access.log.3.gz:206.217.205.125
     30 ./access.log.3.gz:217.138.252.243
     32 ./access.log.3.gz:103.108.231.51
techno156 commented 1 year ago

This of course assumes a good acting instance. A malicious instance can still arbitrarily raise the limit in which case the only solution is to defederate.

Would defederation be a viable solution in the case of a malicious instance? You'd expect that someone running a malicious instance could just shut down the instance, and fire up a new one if defederated, bypassing defederation altogether.

Nothing really prevents them from just replicating the database into a new instance, effectively copying it without any meaningful changes.

Nutomic commented 1 year ago

Closing this as the discussion is finished. If there is any concrete solution worth implementing, it should be described in a separate issue.