LemmyNet / lemmy
🐀 A link aggregator and forum for the fediverse
https://join-lemmy.org

How to defeat account creation spambots. #2355

Closed: dessalines closed this issue 1 year ago

dessalines commented 2 years ago

A few trolls have created Lemmy account creation spambots, which take advantage of federated servers that have open signups with all of the account creation protections turned off.

As a temporary fix until we address this properly, you can either disable federation, or switch to allowlist federation and make sure no server in your allowlist has open signups.

These are the only mitigations I can think of, and probably both of them should be used.

cc @Nutomic

LunaticHacker commented 2 years ago

Make at least one of the 4 account creation protection methods mandatory. Or maybe a subset: like either captcha or account registration must be enabled (I like that best).

We can't enforce this on malicious actors

Nutomic commented 2 years ago

I think what we need is essentially a system which limits posting from new, untrusted users. One example of such a system is the trust levels in Discourse. We can design something similar for Lemmy, with the difference that it can't be based on the amount of reading, because that isn't federated. So I'm thinking something like the following:

* new user needs to write at least 5 comments before being able to create posts
* user needs to have made at least 10 comments and 2 posts before being allowed to upload any images
* similar limits for other actions, like sending private messages or posting external links
* account age should also be taken into account, so a new account can't simply spam comments for an hour and be done

This type of system has the advantage of being almost entirely automated. However, it won't work perfectly with federation, because for remote users we rarely know about all of their posts/comments. So it's possible that a user makes a popular post on another instance, but our instance doesn't know about the user yet. Based on the above criteria, the post wouldn't be allowed on our local instance, even if the author was already very active on his own instance for a long time.

To solve this dilemma, it's probably necessary to have some amount of manual review for new accounts. This could be implemented with an admin setting to specify which posts/comments need manual admin review before being published, for example:

* user has less than x total posts/comments
* account is less than x days old
* posted an image
* account is from a low-activity remote instance

The ideal solution would probably be some kind of combination of the above two. But the topic is quite complex, so I wouldn't want to rush and spend a lot of work implementing something which later turns out not to be optimal. For this reason I think it would make sense to first spend some time investigating in depth how other platforms solve this problem, maybe even read some scientific papers. This research could be a separate NLnet milestone.

Of course all of this will take time. For a short-term solution, I think it could help a lot to limit image posting, especially in comments, because the troll is mostly busy posting shock images in comments and would have to find something else to post. In any case, images in comments don't generally add much value in my opinion.
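The trust-level criteria above amount to a set of per-action checks against a user's local history. A rough Rust sketch of such checks, using made-up struct and field names and the thresholds from the list above (illustrative only, not Lemmy's actual data model):

    // Illustrative only: hypothetical types, not Lemmy's actual data model.
    // Uses the `chrono` crate for time arithmetic.
    use chrono::{DateTime, Duration, Utc};

    struct UserActivity {
        account_created: DateTime<Utc>,
        comment_count: u32,
        post_count: u32,
    }

    enum Action {
        CreatePost,
        UploadImage,
        SendPrivateMessage,
    }

    fn is_allowed(user: &UserActivity, action: &Action, now: DateTime<Utc>) -> bool {
        let account_age = now - user.account_created;
        match action {
            // e.g. require a few comments before the first post
            Action::CreatePost => user.comment_count >= 5,
            // e.g. require comments, posts and a minimum account age before images
            Action::UploadImage => {
                user.comment_count >= 10
                    && user.post_count >= 2
                    && account_age >= Duration::days(3)
            }
            // e.g. a minimum account age before private messages
            Action::SendPrivateMessage => account_age >= Duration::days(1),
        }
    }

As noted above, the counts for remote users are usually incomplete, so checks like these would mainly make sense for local users, with manual review covering the federated cases.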

dessalines commented 2 years ago

We can't enforce this on malicious actors

That's true, which is why the 2nd solution I proposed could also be used: on server restart, delete all content from the malicious servers from the DB. And I also think that requiring either captcha or account registrations would still be a good thing to do, because then it'd take a manual fork, compile, and complete setup of Lemmy to get around. For all the promises from redditors and hackernews users that these forks would be "easy to do", there have been maybe 3 that have tried to do this over the course of 2+ years.

trust levels in discourse

I really dislike complicated systems of limiting interaction or abilities, rather than preventing spam at the door, because you are still allowing some interaction to get through (if allowed to comment, the new malicious accounts will just spam comments then). Most importantly, they don't prevent the issue in this post, which is mass spam account creation. Lemmy's current tools for removing and deleting data from small numbers of trolls are working fine... mass spam account creation is the only new thing that's appearing to be a problem.

We already have manual reviews in the form of registration applications. They can and do prevent mass spam account creation. The issue in the post is from servers where that (and all the other features to prevent malicious accounts) were turned off.

Nutomic commented 2 years ago

if allowed to comment, the new malicious accounts will just spam comments then

We can make it so that comments from new users are not visible to others until approved by an admin or mod.

The issue in the post is from servers where that (and all the other features to prevent malicious accounts) were turned off.

The limitations in my previous comment can also be applied to posts/comments from remote servers, particularly if we are federating with that server for the first time, or if there is suddenly a lot of activity from a server that previously had low activity.
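A rough sketch of that last heuristic, i.e. holding remote activity for review when an instance is new to us or suddenly much busier than usual; the thresholds and field names are arbitrary assumptions, not existing Lemmy behaviour:

    // Illustrative only; nothing like this exists in Lemmy today.
    struct InstanceActivity {
        // average accepted activities per hour over some trailing window
        baseline_per_hour: f64,
        // activities received in the last hour
        last_hour: u64,
    }

    fn needs_manual_review(first_contact: bool, activity: &InstanceActivity) -> bool {
        if first_contact {
            // we are federating with this server for the first time
            return true;
        }
        if activity.baseline_per_hour < 1.0 {
            // a previously (almost) silent instance that suddenly sends a lot
            return activity.last_hour > 10;
        }
        // a sudden spike, e.g. ten times the usual rate
        (activity.last_hour as f64) > activity.baseline_per_hour * 10.0
    }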

kaiyou commented 2 years ago

Automated distributed moderation is hard indeed, and I am pretty sure that any simple standard approach will either be too restrictive - and thus disabled on most instances - or too loose and hurt legitimate users more than spammers. This can be compared to real-world issues like transnational regulations, or to other existing problems like fighting email spam.

Solutions to these problems most often require a mix of common understanding of shared basic rules, common standards for communicating about these rules, and goodwill from most actors to abide by these rules, incentivized by cross-boundary rule enforcement that excludes explicitly malicious actors and penalizes laxity. Maintaining and governing such systems is expensive, and designing privacy-preserving solutions for automating part of it makes it even more challenging - in both the real and digital worlds.

I am not what you would call a libertarian, yet I like the theoretical self-regulating properties of free markets and free exchange, which translate pretty much to the current Fediverse approach imo: rely on local human appreciation and cross-boundary discussions when humans feel the need, and do not try to automate or regulate the whole thing too much, as people will regulate themselves. Much like the free market though, this discourages small actors from taking on large responsibilities, but that does sound legitimate to me (I would not trust a single admin running a large instance anyway).

Currently Mastodon is one of the most mature Fediverse projects when it comes to moderation, and still it provides only basic features regarding local moderation:

Additionally regarding global moderation:

These make for a pretty simple yet quite powerful toolset for moderating based on each instance's rules. It allows both for very libertarian instances and for proper safe spaces. It incentivizes instance admins to enforce some "common" decency without worrying about defining it precisely, since it emerges from the community. It does put pressure on instance admins, but one might argue that it makes their responsibilities more explicit.

There still are a couple issues imo, which have not been completely addressed by the Fediverse ecosystem at this point:

On a separate note, Matrix - which wrote some very speculative specifications about distributed reputation systems https://docs.google.com/document/d/1rijGLs9-RJ9Mz2Yk5wWycMbErrCDK1VJkACdUPMN0-M/ - built its moderation tools based on:

dessalines commented 2 years ago

This isn't an issue to discuss general moderation ideas in federated networks, most of which we've already dealt with and handled in the past. It's about how to counter a very specific case of trolling: account creation spambots.

Lemmy already has blocking, reporting, account registrations, captchas, banning, and deleting user content... almost everything you've referred to, we already have. The next release will also have image and content purging from the database. We currently don't have a way to easily stop mass spam account creation from connected servers, though.

Nutomic commented 2 years ago

There are two things in particular which @kaiyou mentioned that seem relevant:

entire remote servers may be silenced or suspended, which basically applies to any past, current and future server users.

This should be feasible once https://github.com/LemmyNet/lemmy/issues/2285 is implemented. So if an instance is blocked, it would also hide all posts, comments, users, communities from that instance.

local users may be restricted based on environmental criteria like email address or IP address,

This could make spamming a bit more difficult, as it would require the attacker to use proxies/VPNs. For example, when a local user is banned, their last IP could automatically be blocked from signing up again.
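A minimal sketch of that idea; in practice the set would be persisted in the database rather than held in memory, and the names here are assumptions rather than existing Lemmy code:

    use std::collections::HashSet;
    use std::net::IpAddr;

    #[derive(Default)]
    struct BannedIps {
        ips: HashSet<IpAddr>,
    }

    impl BannedIps {
        // called when an admin bans a local user, passing the user's last seen IP
        fn record_ban(&mut self, last_seen_ip: IpAddr) {
            self.ips.insert(last_seen_ip);
        }

        // called from the registration handler before creating an account
        fn signup_allowed(&self, ip: IpAddr) -> bool {
            !self.ips.contains(&ip)
        }
    }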

DMTryptamines commented 2 years ago

For example, when a local user is banned, their last IP could automatically be blocked from signing up again.

IP-based solutions are becoming less than ideal, since some ISPs use carrier-grade NAT, which puts many customers behind a single shared IP. Some cost-effective ISPs (like the one I use) have dynamic IPs by default, and you pay extra for a static one.

Kap2022 commented 2 years ago

"Make at least one of the 4 account creation protection methods mandatory. Or maybe a subset: like either captcha or account registration must be enabled. "

I like this solution the best - but please make it easy to turn captcha on by adding a checkbox to the settings page, like you have for "registration required".

[image]

Please limit the requirement to the signup page only - not every time the user logs in.

I prefer Captcha to account registration because it is automated.


While I do not like limits on "untrusted users", because they hurt the new user experience, I do like this idea, which could be implemented by itself without the limits on new users:

_"To solve this dilemma, its probably necessary to have some amount of manual reviews for new accounts. This could be implemented with an admin setting to specify which posts/comments need manual admin review before being published, for example:

Should there be another condition that the instance is less then x days/months old?


I would also suggest blocking the creation of names including "dessalines" and "Nutomic" (except on lemmy.ml), because either you two have a severe case of multiple personality disorder (with some very naughty personalities) or spammers like to impersonate you on other instances.


I agree with this:

"And I also do think that requiring either captcha or account registrations would still be a good thing to do, because then it'd take a manual fork, compile, and complete setup of lemmy to do."


Finally, do not let spammers spoil the concept of federation. Blocking, banning, silencing, etc. just breaks the federation into non-federated islands.

xximj commented 2 years ago

Hi, I think dynamic rate limits would be cool. These could live in a separate hjson file, and the site admin could create as many scenarios as needed as the spam threat changes.

Something like:

new_user_scenario: {
  triggers: {
    # account created less than 3 days ago
    created: <3d
    # matches any number of posts so far
    posts: >=0
  }
  rate_limit: {
    # while the triggers match, allow at most 1 post per 6000 seconds
    posts: 1
    posts_per_second: 6000
  }
}
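A rough Rust sketch of how such scenarios could be evaluated at request time, keeping the hypothetical trigger and limit names from the config above (this is not existing Lemmy behaviour):

    use chrono::{DateTime, Duration, Utc};

    struct RateLimit {
        // allowed number of posts ...
        posts: u32,
        // ... per this many seconds, as in the config sketch above
        posts_per_second: u32,
    }

    struct Scenario {
        // trigger: account created less than this long ago
        max_account_age: Duration,
        // trigger: at least this many posts so far
        min_posts: u32,
        // limit to apply while the triggers match
        rate_limit: RateLimit,
    }

    fn effective_limit<'a>(
        scenarios: &'a [Scenario],
        default: &'a RateLimit,
        account_created: DateTime<Utc>,
        post_count: u32,
        now: DateTime<Utc>,
    ) -> &'a RateLimit {
        scenarios
            .iter()
            .find(|s| now - account_created < s.max_account_age && post_count >= s.min_posts)
            .map(|s| &s.rate_limit)
            .unwrap_or(default)
    }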
PrincessMiki commented 2 years ago

> I think what we need is essentially a system which limits posting from new, untrusted users. One example of such a system is the trust levels in Discourse. We can design something similar for Lemmy, with the difference that it can't be based on the amount of reading, because that isn't federated. So I'm thinking something like the following:
>
> * new user needs to write at least 5 comments before being able to create posts
> * user needs to have made at least 10 comments and 2 posts before being allowed to upload any images
> * similar limits for other actions, like sending private messages or posting external links
> * account age should also be taken into account, so a new account can't simply spam comments for an hour and be done
>
> This type of system has the advantage of being almost entirely automated. However, it won't work perfectly with federation, because for remote users we rarely know about all of their posts/comments. So it's possible that a user makes a popular post on another instance, but our instance doesn't know about the user yet. Based on the above criteria, the post wouldn't be allowed on our local instance, even if the author was already very active on his own instance for a long time.
>
> To solve this dilemma, it's probably necessary to have some amount of manual review for new accounts. This could be implemented with an admin setting to specify which posts/comments need manual admin review before being published, for example:
>
> * user has less than x total posts/comments
> * account is less than x days old
> * posted an image
> * account is from a low-activity remote instance
>
> The ideal solution would probably be some kind of combination of the above two. But the topic is quite complex, so I wouldn't want to rush and spend a lot of work implementing something which later turns out not to be optimal. For this reason I think it would make sense to first spend some time investigating in depth how other platforms solve this problem, maybe even read some scientific papers. This research could be a separate NLnet milestone.
>
> Of course all of this will take time. For a short-term solution, I think it could help a lot to limit image posting, especially in comments, because the troll is mostly busy posting shock images in comments and would have to find something else to post. In any case, images in comments don't generally add much value in my opinion.

I strongly oppose this. One of the things I hate about Reddit is having to farm 1000 karma to post memes; it was one of the reasons I stopped using Reddit.


IMO, new instances should have some of these turned on by default: captcha, registration rate limits, email verification. And a warning popup should appear if an admin tries to disable all of these defenses.
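A sketch of the "warn before disabling every defense" check; the field names are assumptions, and Lemmy's real settings model differs:

    // Hypothetical settings shape, for illustration only.
    struct RegistrationSettings {
        captcha_enabled: bool,
        email_verification_required: bool,
        registration_application_required: bool,
        signup_rate_limit_enabled: bool,
    }

    impl RegistrationSettings {
        // If this returns true, the admin UI could show a warning (or refuse to
        // save) because every signup defense has been turned off.
        fn all_protections_disabled(&self) -> bool {
            !(self.captcha_enabled
                || self.email_verification_required
                || self.registration_application_required
                || self.signup_rate_limit_enabled)
        }
    }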

ross-spencer commented 1 year ago

Is there an appropriate place for folks to discuss the scale of this issue? I have set up a new instance of Lemmy and in the last five hours have had 1300 activation requests. Fortunately these all then need to verify, so I guess we won't see the problem manifest in the content of the site, but I am a little concerned and not sure what the next steps are.

NB. Are there any links to doc pages about tools to help with this problem? It's not clear to me that I have enough information to perform some of the activities above, e.g. identifying and blocking the instances we are being spammed from, or accessing user listings effectively to identify malicious actors in bulk.

ross-spencer commented 1 year ago

Further to the above, and connected to https://github.com/LemmyNet/lemmy/issues/691: is there a timeout for email activation? I had some success limiting user agents in nginx, but now the email_verification table is full of pending entries. The user count is ridiculously high:

[image: screenshot of the user count]

If there is currently no timeout for verification, perhaps adding a configurable one could be a partial way to tackle spambots?

Additionally, perhaps the person row could be created only after verification? Right now each row is about 2000 bytes, and the user may never access the instance.
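A sketch of what a configurable verification timeout could look like; the struct shape is an assumption, and in Lemmy itself this would be a scheduled database query against the email_verification table rather than in-memory filtering:

    use chrono::{DateTime, Duration, Utc};

    // One pending row, roughly mirroring an email_verification entry.
    struct PendingVerification {
        local_user_id: i32,
        published: DateTime<Utc>,
    }

    // Users whose pending verification is older than the configured timeout; a
    // periodic task could then delete the verification rows and the accounts
    // that never completed verification.
    fn expired_verifications(
        pending: &[PendingVerification],
        timeout: Duration,
        now: DateTime<Utc>,
    ) -> Vec<i32> {
        pending
            .iter()
            .filter(|v| now - v.published > timeout)
            .map(|v| v.local_user_id)
            .collect()
    }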

GeckoEidechse commented 1 year ago

With the current wave of spambot signups, it appears that they sign up primarily on instances run by good actors that simply have an open registration policy (whether on purpose or by accident).

An interesting approach pixelfed took, which targets the scenario where an inexperienced admin is running an open server (maybe they just wanted to run an instance for themselves and their friends and don't have much admin experience), is to have a default max user limit of 1000.

This way, an admin just setting up a server without much experience will run it with the default 1000-user cap, meaning that their instance will cap out at 1000 spam signups.

This of course assumes a good acting instance. A malicious instance can still arbitrarily raise the limit in which case the only solution is to defederate.
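A sketch of how such a default cap could gate registration; the setting name is an assumption:

    // Hypothetical site setting: a default of Some(1000) would mirror the
    // pixelfed behaviour described above, and an admin could raise or remove it.
    struct SiteLimits {
        max_local_users: Option<u64>,
    }

    fn registration_open(limits: &SiteLimits, current_local_users: u64) -> bool {
        match limits.max_local_users {
            Some(cap) => current_local_users < cap,
            None => true,
        }
    }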

GeckoEidechse commented 1 year ago

Another idea that apparently is also used by Matrix is to use one-time codes for sign-up.

By default, an instance would then only allow sign-ups if a valid one-time token is provided. Such a token can only be generated by the instance admin. This sort of setup is ideal in the scenario where an admin spins up an instance for just themselves and a few friends, since in order to sign up, one needs to have received a sign-up token from the admin.

Again, the main idea here is that the currently infested instances are simply ones that have been set up by inexperienced admins using default settings. As such, the default settings should be adjusted so that sign-ups are as restricted as possible, with maybe an exception for the initial account created by the instance admin themselves.
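A minimal sketch of one-time signup tokens as described above (illustrative only; Lemmy does not currently have this feature in this form):

    use std::collections::HashSet;

    #[derive(Default)]
    struct SignupTokens {
        unused: HashSet<String>,
    }

    impl SignupTokens {
        // The admin generates a token out of band and hands it to a prospective
        // user, e.g. a friend who should be allowed to register.
        fn issue(&mut self, token: String) {
            self.unused.insert(token);
        }

        // Called during registration; each token can be redeemed exactly once.
        fn redeem(&mut self, token: &str) -> bool {
            self.unused.remove(token)
        }
    }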

cloventt commented 1 year ago

Using the federated modlog as a way to share banned-user identifiers between instances, maybe with the option to automatically follow its recommendations, is also a good idea I think. This could work like:

  1. Bad actor creates spam account on instance A
  2. Instance A admin notices the spam account and bans them from instance A
  3. Instance B is federated with Instance A and sees the banned user's email address (or some other identifying info) appear in the federated modlog
  4. Instance B automatically takes some action to prevent the same user moving to their instance

This isn't a complete solution, but it would create work for the spammers, because they would be unable to re-use emails on multiple instances, and this would increase the number of fake email accounts they would need to generate.

There are problems that might need to be overcome:

Instances subscribing to the bans could react in different ways:

  1. Do not react at all
  2. Block future signups of the banned email on their instance
  3. Flag existing users with a matching email on their instance in the admin's UI as "banned on ABC for XYZ" so that the instance admin can more closely monitor their behaviour

This isn't only useful for spam - it could be handy for other ban reasons such as racism. Essentially this would use the federated modlog as a collective knowledge-base of bad actors.
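A sketch of the three reaction options listed above. The event shape, and the use of a salted email hash instead of the raw address, are assumptions, not something the federated modlog carries today:

    use std::collections::HashSet;

    // A ban record received from another instance via the federated modlog.
    struct FederatedBan {
        // e.g. a salted hash of the banned account's email, to avoid sharing
        // the raw address between instances
        email_hash: String,
    }

    enum BanReaction {
        Ignore,
        BlockSignup,
        FlagForAdmin,
    }

    struct BanPolicy {
        reaction: BanReaction,
        blocked_email_hashes: HashSet<String>,
        flagged_email_hashes: HashSet<String>,
    }

    impl BanPolicy {
        fn handle(&mut self, ban: &FederatedBan) {
            match self.reaction {
                // 1. do not react at all
                BanReaction::Ignore => {}
                // 2. block future signups that present the same identifier
                BanReaction::BlockSignup => {
                    self.blocked_email_hashes.insert(ban.email_hash.clone());
                }
                // 3. flag matching existing users in the admin UI for review
                BanReaction::FlagForAdmin => {
                    self.flagged_email_hashes.insert(ban.email_hash.clone());
                }
            }
        }
    }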

ross-spencer commented 1 year ago

Using the federated modlog as a way to share banned-user identifiers between instances, maybe with the option to automatically follow its recommendations, is also a good idea I think.

This could work, but it requires that the spam account names are not being generated randomly. The spam pattern I am seeing for the first 2000 or so looked like about 30 attempts from each IP address (which seemed to correspond to multiple potential usernames), with no guarantee that the usernames weren't random.

From the nginx logs (counting requests that mention "register", per log file and source IP):

     sudo find . -name \*.gz -print0 | sudo xargs -0 zgrep "register" | awk '{print $1}' | sort | uniq -c | sort -n

     ...600+ lines before here... 
     26 ./access.log.3.gz:185.195.233.163
     26 ./access.log.3.gz:185.65.135.245
     26 ./access.log.3.gz:193.32.127.235
     26 ./access.log.3.gz:193.32.248.168
     26 ./access.log.3.gz:198.96.89.195
     26 ./access.log.3.gz:37.19.200.131
     26 ./access.log.3.gz:37.19.221.170
     26 ./access.log.3.gz:66.115.189.190
     26 ./access.log.3.gz:87.249.134.2
     28 ./access.log.3.gz:146.70.117.227
     28 ./access.log.3.gz:146.70.188.131
     28 ./access.log.3.gz:146.70.188.195
     28 ./access.log.3.gz:169.150.196.3
     28 ./access.log.3.gz:173.44.63.67
     28 ./access.log.3.gz:206.217.205.123
     28 ./access.log.3.gz:45.134.212.93
     28 ./access.log.3.gz:68.235.44.35
     30 ./access.log.3.gz:146.70.116.99
     30 ./access.log.3.gz:169.150.203.16
     30 ./access.log.3.gz:206.217.205.121
     30 ./access.log.3.gz:206.217.205.125
     30 ./access.log.3.gz:217.138.252.243
     32 ./access.log.3.gz:103.108.231.51
techno156 commented 1 year ago

This of course assumes a good acting instance. A malicious instance can still arbitrarily raise the limit in which case the only solution is to defederate.

Would defederation be a viable solution in the case of a malicious instance? You'd expect that someone running a malicious instance could just shut down the instance, and fire up a new one if defederated, bypassing defederation altogether.

Nothing really prevents them from just replicating the database into a new instance, effectively copying it without any meaningful changes.

Nutomic commented 1 year ago

Closing this as the discussion is finished. If there is any concrete solution worth implementing, it should be described in a separate issue.