LemmyNet / lemmy

🐀 A link aggregator and forum for the fediverse
https://join-lemmy.org

Alternative approval method for new users #2601

Closed. Nutomic closed this issue 4 months ago.

Nutomic commented 1 year ago

The Lemmy software is efficient enough that it could easily scale to 10x the current user count. But a major bottleneck stands in the way: onboarding new users is so complex that many potential users are discouraged.

Currently the steps for a new user are something like this:

  • Sign up and fill out the registration application (answer the instance's questions)
  • Wait for an admin to review and approve the application
  • Get notified by email once the application is approved
  • Log in and start posting

Here's a different way new users could be onboarded, which is a much smoother process:

  • Sign up without any application
  • Immediately start browsing and writing comments
  • Comments are not publicly visible until an admin has approved them
  • After a certain number of approved comments, the account itself gets approved and normal permissions are unlocked


The advantage of this approach is that the new user doesn't have to worry about the approval process, but can start interacting right away. At the same time, admins can review new accounts and effectively stop bots. New users would only have permission to create comments, so abuse would be almost impossible.

The implementation would be like this: the site setting require_application would be changed to an enum user_approval_mode: application|content_review. The tables local_user and comment each need a new column approved. If both local_user.approved and comment.approved are false, the comment is not publicly visible; otherwise it is visible. There is a new endpoint ListPendingCommentApprovals, with a format identical to ListCommentReports, so that the frontend can be reused. Most API calls get checks so that they can only be used by approved users (upload image, create community, send private message etc.).
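
A minimal sketch of what those changes could look like at the database level; this assumes the site-wide setting lives in a local_site table, uses the names from the proposal above, and is only illustrative rather than an actual Lemmy migration:

```sql
-- Illustrative only, not Lemmy's real schema.
CREATE TYPE user_approval_mode AS ENUM ('application', 'content_review');

ALTER TABLE local_site
  ADD COLUMN user_approval_mode user_approval_mode NOT NULL DEFAULT 'application';
-- (would replace the existing require_application flag)

-- New approval flags on users and comments.
ALTER TABLE local_user ADD COLUMN approved boolean NOT NULL DEFAULT false;
ALTER TABLE comment    ADD COLUMN approved boolean NOT NULL DEFAULT false;

-- Visibility rule for comments by local users: hide a comment only when
-- neither the author nor the comment itself has been approved, e.g.
--   ... WHERE local_user.approved OR comment.approved
```

Existing users and comments would presumably be backfilled with approved = true so the change is invisible to accounts that already exist.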

dessalines commented 1 year ago

I'm not sure if I agree with this, for several reasons:

  • It still requires manual admin approval, and would probably be even more work for admins, since now instead of reading through explicit questions and answers, they have to gauge someone's unapproved comment and post history.

  • Let's say we want to make sure someone isn't homophobic. If their first few comments are on unrelated topics, we don't have enough info to approve them. Contrast that with questionnaires, where we can have their opinion on record on any given number of questions before approving.

  • What if there aren't enough comments to tell whether a user is genuine or not? At what point do admins decide to approve someone? If their first X comments look fine, there's no way to tell whether they are trolls or not, since they could just wait until they're approved.

  • This is a completely new and untested way of onboarding afaik. Masto, discord servers, and other platforms do it the application-questionnaire way. We settled on this after trying different things, and it has proven itself to be more effective than other methods, and has stood the test of time.

  • If one reason for this is to filter out bots, as opposed to trolls, this would be worse in that regard: now instead of one obvious bot application, which couldn't read the application questions, there are probably dozens of bot posts and comments.

  • The "approval notification via email" process would work the same.

  • Federation might get complicated, because we'd have to build and persist a queue of "newly approved but not yet sent" content. I.e. after an account gets set to "approve", their content needs to get added to a queue, instead of being federated directly when the API endpoints are called.

  • It'd require a whole new queue and UI for "unapproved content".

A much better solution IMO would be to have the default state where people after applying are "logged in", but not yet approved, with an indicator somewhere in the UI showing them that their status is pending. When they get approved, they get emailed, and the pending status goes away, but they don't have to log in again.

Nutomic commented 1 year ago

I'm not sure if I agree with this, for several reasons:

  • It still requires manual admin approval, and would probably be even more work for admins, since now instead of reading through explicit questions and answers, they have to gauge someone's unapproved comment and post history.

  • Let's say we want to make sure someone isn't homophobic. If their first few comments are on unrelated topics, we don't have enough info to approve them. Contrast that with questionnaires, where we can have their opinion on record on any given number of questions before approving.

  • What if there aren't enough comments to tell whether a user is genuine or not? At what point do admins decide to approve someone? If their first X comments look fine, there's no way to tell whether they are trolls or not, since they could just wait until they're approved.

This is meant as an alternative to the existing registration application functionality, not a replacement for it. So instances can keep using that if they want a more thorough vetting of users, or want to ask certain questions like in your example.

The advantage of this approach is mainly for public servers, where admins don't really need to know anything about the user, except that it's not a bot. And what they do is only approve each individual comment, not the user account. The account gets approved automatically after a certain number of comments has been approved (see the SQL sketch at the end of this comment).

  • This is a completely new and untested way of onboarding afaik. Masto, discord servers, and other platforms do it the application-questionnaire way. We settled on this after trying different things, and it has proven itself to be more effective than other methods, and has stood the test of time.

While it's true that this can be bypassed, so can any other check. But the average spammer will not bother, and that's who we are trying to catch. It is also very similar to the way Discourse handles new users.

  • If one reason for this is to filter out bots, as opposed to trolls, this would be worse in that regard: now instead of one obvious bot application, which couldn't read the application questions, there are probably dozens of bot posts and comments.

If this turns out to be a problem, we could apply stricter rate limits to unapproved users, or show only one comment per user for review.

  • The "approval notification via email" process would work the same.

  • Federation might get complicated, because we'd have to build and persist a queue of "newly approved but not yet sent" content. I.e. after an account gets set to "approve", their content needs to get added to a queue, instead of being federated directly when the API endpoints are called.

Just need to federate the comment after it's approved, not difficult.

  • It'd require a whole new queue and UI for "unapproved content".

I would just write an SQL query which builds this queue on the fly from all unapproved comments (see the sketch at the end of this comment). And the UI can be copied/abstracted directly from reports.

A much better solution IMO would be to have the default state where people after applying are "logged in", but not yet approved, with an indicator somewhere in the UI showing them that their status is pending. When they get approved, they get emailed, and the pending status goes away, but they don't have to log in again.

It still means the user has to answer some questions during signup. Look at Reddit, Facebook or Twitter, none of them need that. They get you signed up and posting as soon as possible. That's what I'm trying to enable for Lemmy.
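
To make the two pieces referenced above concrete (the on-the-fly approval queue and the automatic account approval), here is a rough sketch of what the queries could look like. The join columns, the removed filter, and the threshold of 3 are assumptions for illustration, not anything Lemmy actually ships:

```sql
-- Column names follow the sketch in the issue description.

-- 1. Build the "pending approvals" queue on the fly, analogous to reports.
SELECT comment.id, comment.creator_id, comment.content, comment.published
FROM comment
JOIN local_user ON local_user.person_id = comment.creator_id
WHERE NOT local_user.approved
  AND NOT comment.approved
  AND NOT comment.removed
ORDER BY comment.published;
-- (SELECT DISTINCT ON (comment.creator_id) ... would show only one pending
-- comment per user, as suggested further up.)

-- 2. Auto-approve an account once e.g. 3 of its comments have been approved.
UPDATE local_user
SET approved = true
WHERE NOT approved
  AND (SELECT count(*)
       FROM comment
       WHERE comment.creator_id = local_user.person_id
         AND comment.approved) >= 3;
```

The auto-approval update could run in the same transaction that approves a comment, so an account gets unlocked as soon as it crosses the threshold.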

dessalines commented 1 year ago

Look at Reddit, Facebook or Twitter, none of them need that.

All of them now require verified emails, which we also have, but it's optional. I'd rather lemmy.ml have a policy of federating to either require_verified_email, or require_application_questionnaire servers.

I'm still mostly opposed to this because of

Nutomic commented 1 year ago

Email verification isn't really useful against spam; I remember from hosting peertube.social that most spam bots had verified Gmail addresses. With other providers it's even easier for a bot to verify email automatically.

The application questionnaire is useful, but doesn't really work for instances which are meant to be public (again, look how easy signup is on Reddit, no need to answer any questions). I don't think it is complicated at all; I already finished half the backend. It can also hardly be considered untested, for example Discourse uses a similar system (though more complex). Admins who are concerned about increased workload can simply keep using one of the existing registration modes.

By the way this kind of feature is also being requested by users: https://lemmy.ml/post/632344

dessalines commented 1 year ago

Discourse unfortunately seems to be following the Stack Overflow model, where they create arbitrarily complicated systems of trust / reputation, which then affect what abilities you have. I'd bet they also couple that with captchas and verified emails, otherwise they would get a ton of spam signups and content.

You're underestimating how much work this will be. It would need:

There are so many simpler solutions to this problem. Like:

* Allow unverified users to appear logged in, and show an indicator that they're not approved yet, that goes away when they get approved.

* Allow unverified users to subscribe / unsubscribe only.

Shifting manual approval from before to after isn't going to affect any server's ability to reach mass adoption IMO, especially if in the latter case their comments are hidden.

poVoq commented 1 year ago

The proposed method is very common on classic forums (I use it on phpBB, Flarum and Discourse) and is pretty much the only way to reliably stop spam accounts.

Verifying by email seems to be fully automated these days for spam bots and does not help at all.

The admin approval of accounts also works, but spam users can lie more easily in the application form than they can on the actual posts. A spam-bot can easily put in random seemingly good answers in the application form, but it can not not post spam in the actual comments/posts as that would defeat the purpose of spamming. And all in all, I think it "hurts" legitimate users more than it does spammers.

poVoq commented 1 year ago

There are so many simpler solutions to this problem. Like:

* Allow unverified users to appear logged in, and show an indicator that they're not approved yet, that goes away when they get approved.

* Allow unverified users to subscribe / unsubscribe only.

That seems like a good step already if people can customize their profile and subscribe even if not approved yet.

Maybe there could also be a "draft" feature that doesn't actually post comments, but with which any user can write and schedule posts/comments? I think that feature was also previously requested, and I think it exists on Mastodon as well. One of the scheduling options could then be "post when account gets approved".

Edit: of course, that would work better if approving admins are able to see these drafts, which isn't ideal, as people probably consider unpublished drafts to be private.

kromonos commented 1 year ago

I know I may be making myself a little unpopular, but what would be wrong with Akismet, for example?
Another thing could be disabling attachment uploads directly after registration, either for a short amount of time or until a certain amount of "reputation" has been earned. Same with the profile header image. We could see it as something like a reward system for participation?

Also, an idea could be a "need for approval" for image-only posts, since most of the spammers I've seen only post image spam. This could be automatically unlocked with a certain amount of "constructive" participation in other threads (see the rough sketch below).

Edit: I love the idea of "Allow unverified users to subscribe / unsubscribe only". Maybe it's possible to do this based on communities?
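
A possible sketch of the "reputation" gate described above, assuming a simple rule of minimum account age plus a minimum number of non-removed comments; the threshold, the interval, and the table/column names are illustrative only:

```sql
-- Hypothetical check: allow image posts without approval only once the
-- account is at least a week old and has 10 non-removed comments.
SELECT (p.published < now() - interval '7 days'
        AND count(c.id) >= 10) AS can_post_images
FROM person p
LEFT JOIN comment c ON c.creator_id = p.id AND NOT c.removed
WHERE p.id = $1
GROUP BY p.id, p.published;
```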

GitOffMyBack commented 1 year ago

I am one of the admins at beehaw.org and I'm looking forward to this feature. As @poVoq stated above:

The admin approval of accounts also works, but spam users can lie more easily in the application form than they can on the actual posts.

I just wanted to leave this general feedback for the sake of support/solidarity.

dessalines commented 1 year ago

The proposed method is very common on classic forums (I use it on phpBB, Flarum and Discourse) and is pretty much the only way to reliably stop spam accounts.

phpBB also uses an application questionnaire, just like masto, lemmy, and discord servers. They also have optional captcha and email verification like lemmy.

A spam-bot can easily put in random seemingly good answers in the application form, but it can not not post spam in the actual comments/posts as that would defeat the purpose of spamming.

Really confusing... the entire purpose of a spambot is to spam their content all over your site.

The admin approval of accounts also works, but spam users can lie more easily in the application form than they can on the actual posts.

Then they'll just make a few innocuous posts at first. No different than making an innocuous registration application, except now you've added more work for admins by having to go through a bunch of posts instead of one application.

I'm very against adding rules around reputation, and limiting abilities based on that. Every person here so far came up with a different set of rules for what makes sense to them, and all of them are circumventable by malicious humans with time.

dessalines commented 1 year ago

Breaking this down, there are three types of bad actors:

  1. Spambots
  2. Humans with no time (trolls who want to post racist spam quickly)
  3. Humans with a lot of time

A questionnaire stops 1 and 2. It doesn't stop 3.

The "post approval method" stops 1 and 2, and also doesn't stop 3. It also creates a lot more work for admins, and is far more complicated.

poVoq commented 1 year ago

Sorry, the double negation was a bit confusing indeed.

Interestingly enough, on all the classic forums I have very rarely seen a spammer that first tries to post innocuous posts. They usually go for the spam directly. What I did see, though, was a seemingly innocuous user starting a discussion to make later spam posts by other users seem more relevant and less spammy (I think that is some sort of semi-automation).

As for the 3 scenarios: a questionnaire does not stop 1 and 2. Spambots can be easily automated to fill these questions with semi-random, plausible-sounding answers that are difficult for admins to spot, and a troll even without time can easily think of some answer similar in quality to what I have seen as answers from legitimate users on my instance.

dessalines commented 1 year ago

Spambots are easily stopped by application questions; it's why we and Masto use them. We get plenty of bot registrations, none of them can read the questions, parse meaning, and attempt to answer each of them individually in a coherent way.

You can also add things inside the text that bots can't easily do, like "Follow the instructions from this page" (where that page tells them to type something in all caps in their response). We haven't found this necessary because bots are simple to spot, but I've seen lots of Discord servers do it.

troll even without time can easily think of some answer similar in quality to what I have seen as answers from legitimate users on my instance.

The point is that they don't get to instant-post, so even if they go through the trouble of making an application, they have to wait some time before they get approved. Which puts them in 3, not 2.

poVoq commented 1 year ago

Sure, however the point of this entire thread is that the current method might somewhat work, but it hurts legitimate users (and admins), and thus adoption, more than it does the spammers.

But if you can implement what you proposed above, to allow non-approved users to subscribe to communities and so on, I think that would help a lot already.

SorteKanin commented 1 year ago

We get plenty of bot registrations, none of them can read the questions, parse meaning, and attempt to answer each of them individually in a coherent way.

With the increasing coherence of chatbots (ChatGPT etc.), this sounds like something that will very soon be a losing battle.

FruityWelsh commented 8 months ago

We get plenty of bot registrations, none of them can read the questions, parse meaning, and attempt to answer each of them individually in a coherent way.

With the increasing coherence of chatbots (ChatGPT etc.), this sounds like something that will very soon be a losing battle.

Still increases the costs. So for now it stops unsophisticated spambots and slows down sophisticated ones.

dessalines commented 4 months ago

Closing this due to the huge amount of complication it would add, but can be re-opened if someone wants to work on it.

dullbananas commented 4 months ago

This should be done in plugins.