go-gitea / gitea

Git with a cup of tea! Painless self-hosted all-in-one software development service, including Git hosting, code review, team collaboration, package registry and CI/CD
https://gitea.com
MIT License

Spam reporting / flagging #19283

Open fnetX opened 2 years ago

fnetX commented 2 years ago

Feature Description

One of the biggest issues for larger Gitea installs like Codeberg is that there is no easy way to report misbehaviour (like spam and abusive comments) directly in the UI. For Codeberg, we rely on workarounds: people contact us via third-party channels (email, Mastodon, Matrix, etc.), and we are working on a moderation system that simplifies some of our workflows: https://codeberg.org/Codeberg/moderation

With upcoming federation, this problem will likely spread to many more instances. An instance that doesn't allow registration but still receives activity (e.g. issues, pull requests) from other instances will likely face spam sooner or later. I'd even go so far as to say that we can't enable federation until this issue is solved.

Back to our moderation toolbox: we don't think Gitea should provide the full-fledged moderation features we require (including user warnings, maybe quota enforcement, quarantining, a public log, etc.). To keep that responsibility out of the Gitea codebase, we are developing this as a separate service that hooks into the Gitea database. On the other hand, some rudimentary tasks should probably be covered by Gitea itself, including, of course, the basic API calls such a tool needs (e.g. #15588). Now I'd like to discuss what else needs to be implemented in Gitea.

My proposal is the following:

I'd say this shouldn't be too hard to implement for Gitea in its current form (e.g. a new "reports" table with issue_id, comment_id, user_id etc., an optional comment, and a dismissed=0|1 state), and in fact we could likely provide such a solution. However, I wonder whether this system should rather be built with federation in mind, so that content can be reported to its origin instance automatically and moderation decisions can be propagated. The way Mastodon (and other fediverse apps) handle this works fine, but it sounds too complex for us to "quickly implement".
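To make the non-federated variant concrete, here is a minimal sketch of the "reports" table as a Go struct plus an in-memory store. Every name here (Report, ReportStore, File, Dismiss, Open) is hypothetical and not part of Gitea's models; a real implementation would persist rows through Gitea's ORM instead of a slice.

```go
package main

import (
	"fmt"
	"time"
)

// Report is a hypothetical row of the proposed "reports" table.
// IssueID/CommentID are 0 when the report targets something else,
// e.g. a user profile. None of these names exist in Gitea today.
type Report struct {
	ID         int64
	ReporterID int64  // user who filed the report
	UserID     int64  // reported user
	IssueID    int64  // reported issue, 0 if none
	CommentID  int64  // reported comment, 0 if none
	Comment    string // optional free-text reason
	Dismissed  bool   // the proposed dismissed=0|1 state
	CreatedAt  time.Time
}

// ReportStore is an in-memory stand-in for the database table.
type ReportStore struct {
	nextID  int64
	reports []*Report
}

// File stores a new, undismissed report and returns its ID.
func (s *ReportStore) File(r Report) int64 {
	s.nextID++
	r.ID = s.nextID
	r.Dismissed = false
	if r.CreatedAt.IsZero() {
		r.CreatedAt = time.Now()
	}
	s.reports = append(s.reports, &r)
	return r.ID
}

// Dismiss marks a report as handled and reports whether the ID existed.
func (s *ReportStore) Dismiss(id int64) bool {
	for _, r := range s.reports {
		if r.ID == id {
			r.Dismissed = true
			return true
		}
	}
	return false
}

// Open returns the reports an admin still has to review.
func (s *ReportStore) Open() []*Report {
	var open []*Report
	for _, r := range s.reports {
		if !r.Dismissed {
			open = append(open, r)
		}
	}
	return open
}

func main() {
	var s ReportStore
	id := s.File(Report{ReporterID: 1, UserID: 42, IssueID: 7, Comment: "spam link"})
	s.File(Report{ReporterID: 2, UserID: 42, CommentID: 99})
	s.Dismiss(id)
	fmt.Println(len(s.Open())) // 1
}
```

Federation would add at least an origin-instance field and a way to forward reports and decisions, which this sketch deliberately leaves out.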

What do you think? What is the way to go here? Should this be built with federation in mind, and thus probably wait until the codebase is better prepared for it? Currently I can't find enough references on what this could look like.

Thank you very much for the consideration.

Screenshots

No response

techknowlogick commented 2 years ago

consolidating with https://github.com/go-gitea/gitea/issues/12

fnetX commented 2 years ago

I don't think closing is a good idea, as I see a huge difference between the two issues: this one is a discussion about how to collaborate on, and implement, instance-wide or cross-instance fighting of all kinds of spam.

I read the other issue before opening this one, and it's more of a concept for moderating your own content on another instance, e.g. by blocking users from your projects. Judging by other platforms, letting users block others according to their own preferences and providing a moderation backend for the platform itself are different features that can't both be solved in one issue, IMO.

Mikaela commented 2 years ago

I think this also needs a user/organization-wide blocking option (https://github.com/go-gitea/gitea/issues/17453) to use while waiting for an admin to review the report.

yozachar commented 11 months ago


@go-gitea how do I report this user https://gitea.com/ishadeshpande for spamming my repository's issue tracker?

Link to my repository: https://gitea.com/joe733/linkeeper/issues

I'm thinking about deleting those "issues", but I'd rather report the user first.

techknowlogick commented 11 months ago

@joe733 thanks for the report. I've removed the user.

richmahn commented 9 months ago

I notice that many spam users sign up just to put a website in their profile, and their description usually claims they are an escort service, consultant, web developer, or one of a few other "professions". Here is how I find my spam users:

They last logged in on the same day they created their account (they have to log in to set the website and description), they have set those two fields, and they don't have any repos (usually; some do create a repo just to add a website URL to it as well).

As a first step, we could discourage this behavior by not allowing the website to be set within 24 hours of creating the account.

Then we could search the description and website for keywords like "escort" and "consulting", with the keyword list customizable in the config file.
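A minimal sketch of the keyword check, assuming the list arrives as a comma-separated config value (no such setting exists in Gitea's config today; the function names are hypothetical). Plain substring matching is crude: "consulting" will also match legitimate profiles, so a hit is best used to queue the account for review rather than to delete it.

```go
package main

import (
	"fmt"
	"strings"
)

// ParseKeywordList splits a comma-separated config value such as
// "escort, consulting"; surrounding spaces are trimmed during matching.
func ParseKeywordList(value string) []string {
	return strings.Split(value, ",")
}

// MatchSpamKeywords returns the first configured keyword found
// (case-insensitively) in the description or website, or "" if none.
func MatchSpamKeywords(description, website string, keywords []string) string {
	text := strings.ToLower(description + " " + website)
	for _, kw := range keywords {
		kw = strings.ToLower(strings.TrimSpace(kw))
		if kw != "" && strings.Contains(text, kw) {
			return kw
		}
	}
	return ""
}

func main() {
	keywords := ParseKeywordList("escort, consulting")
	fmt.Println(MatchSpamKeywords("Best Consulting services", "https://example.com", keywords)) // consulting
}
```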

lunny commented 9 months ago

Have you enabled login captcha?

techknowlogick commented 9 months ago

@lunny Sadly, captchas have limitations when real humans sign up to post spam, and some AI can bypass captchas as well.

@richmahn What I've been doing for Gitea.com, and have shared with many other instances (Blender, CB, etc.), is setting up a database row trigger that calls an HTTP endpoint when certain user information is updated. I did it this way because webhooks for user information changes haven't been implemented in the application yet.

I also set up a webhook for issue creation/modification that runs the content through SpamAssassin; if anything triggers, the account is temporarily restricted and an alert is sent to the admin.
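The decision step of that flow (score, compare to a threshold, restrict, alert) can be sketched independently of the spam checker. The Scorer interface, the Actions callbacks and the toy link-counting scorer below are all hypothetical stand-ins; with SpamAssassin, the threshold would typically mirror its required_score (5.0 by default).

```go
package main

import (
	"fmt"
	"strings"
)

// Scorer abstracts the spam checker (e.g. SpamAssassin behind spamc).
// The interface is hypothetical.
type Scorer interface {
	Score(text string) (float64, error)
}

// Actions are the side effects taken when content looks like spam;
// both callbacks are placeholders for instance-specific code.
type Actions struct {
	Restrict func(userID int64) error
	Alert    func(msg string)
}

// CheckIssue scores an issue body and, if it reaches threshold,
// temporarily restricts the author and alerts the admin.
func CheckIssue(s Scorer, a Actions, userID int64, body string, threshold float64) (bool, error) {
	score, err := s.Score(body)
	if err != nil {
		return false, err
	}
	if score < threshold {
		return false, nil
	}
	if err := a.Restrict(userID); err != nil {
		return true, err
	}
	a.Alert(fmt.Sprintf("user %d restricted, spam score %.1f", userID, score))
	return true, nil
}

// linkScorer is a toy Scorer for the demo: one point per "http".
type linkScorer struct{}

func (linkScorer) Score(text string) (float64, error) {
	return float64(strings.Count(text, "http")), nil
}

func main() {
	a := Actions{
		Restrict: func(id int64) error { fmt.Println("restricting", id); return nil },
		Alert:    func(msg string) { fmt.Println("ALERT:", msg) },
	}
	spam, _ := CheckIssue(linkScorer{}, a, 42, "buy http://a http://b http://c", 3)
	fmt.Println(spam) // true
}
```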

I also created the concept of placeholder users/orgs: instead of deleting a spam user and freeing the username/email, I mark the account as a placeholder/reserved, so once detected, the spammer can't reuse that information to sign up again.
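The placeholder idea amounts to keeping a reservation list that sign-up consults. A minimal in-memory sketch with hypothetical names; it normalizes only case and surrounding whitespace, so provider-specific email aliases (dots, "+tag" suffixes) would still get through.

```go
package main

import (
	"fmt"
	"strings"
)

// Reservations keeps the usernames and emails of detected spam accounts
// so they cannot be used to sign up again. The type is hypothetical.
type Reservations struct {
	names  map[string]bool
	emails map[string]bool
}

func NewReservations() *Reservations {
	return &Reservations{names: map[string]bool{}, emails: map[string]bool{}}
}

// normalize lower-cases and trims so "Spammer1 " and "spammer1" collide.
func normalize(s string) string { return strings.ToLower(strings.TrimSpace(s)) }

// Reserve turns a detected spam account into a placeholder instead of freeing it.
func (r *Reservations) Reserve(name, email string) {
	r.names[normalize(name)] = true
	r.emails[normalize(email)] = true
}

// AllowSignup reports whether neither the username nor the email is reserved.
func (r *Reservations) AllowSignup(name, email string) bool {
	return !r.names[normalize(name)] && !r.emails[normalize(email)]
}

func main() {
	r := NewReservations()
	r.Reserve("spammer1", "spam@example.com")
	fmt.Println(r.AllowSignup("Spammer1", "new@example.com"))  // false
	fmt.Println(r.AllowSignup("alice", "alice@example.com")) // true
}
```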

karolyi commented 7 months ago

@techknowlogick How are you able to feed plain text (not in email format) into SpamAssassin? Your idea sounds feasible to me, but the best I could gather is that you need to embed the text in an emulated email so SpamAssassin can evaluate it. It relies heavily on the email headers, so it probably won't give a reliable score.

I've been getting random registrations on my Gitea instance, even with a captcha and required email confirmation. Today I got four new registrations linking to sites where you can download cracked programs; probably SEO spam, which has happened many times before. The registrations came from IP addresses in Pakistan and India, and someone went through the motions by hand, so it wasn't automated. Needless to say, I purged them manually hours later when I woke up.

A "profile changed" callback would be a really useful mechanism, as users may eventually register first and change their profiles to spam ones later on.

I think this will become an increasingly bigger issue over time.

mscherer commented 5 months ago

Since the spam targets SEO (most of the time), maybe it is worth checking every new URI against URIBL? That way, spammer URIs would be detected and blocked whenever they appear, rather than only at registration time.

I am not sure whether URIBL still works and whether this is still worthwhile.

karolyi commented 2 months ago

Just an idea here, having deleted countless spamming users:

Gitea should put users who enter anything URL-like into their description and/or website into a must-be-approved-manually status. That's why they register.
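A sketch of that rule, assuming it runs on every profile save (not only at registration, since profiles can be changed to spam later). The regular expression is a rough heuristic for "URL-like" text, not a URL parser: it catches http(s):// links, "www." prefixes and bare domains such as "example.org", and it will also hit harmless tokens like "node.js". The Status values are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// urlLike matches http(s):// links, "www." prefixes and bare
// domain-looking tokens such as "example.org" (rough heuristic).
var urlLike = regexp.MustCompile(`(?i)(?:https?://|www\.)|\b[a-z0-9-]+\.[a-z]{2,}\b`)

// Status is a hypothetical account state.
type Status int

const (
	Active Status = iota
	PendingApproval
)

// InitialStatus puts accounts whose description or website contains
// anything URL-like into manual approval.
func InitialStatus(description, website string) Status {
	if urlLike.MatchString(website) || urlLike.MatchString(description) {
		return PendingApproval
	}
	return Active
}

func main() {
	fmt.Println(InitialStatus("Best cracked software at www.example.com", "") == PendingApproval) // true
	fmt.Println(InitialStatus("Go developer from Berlin", "") == Active)                          // true
}
```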

MichaelHinrichs commented 2 days ago

Since a report button still hasn't been added, here are some instances of spam, to show just how bad this problem is. Look at how absurdly long this list is. Some entire servers are filled with nothing but spam. Hopefully this will motivate someone to give this issue priority.

https://git.deuxfleurs.fr/Tasconnectlogistics https://git.deuxfleurs.fr/willidea https://git.deuxfleurs.fr/james7088 https://git.deuxfleurs.fr/pawlaneau https://git.deuxfleurs.fr/RavanSeo1 https://git.deuxfleurs.fr/mailsdaddy

https://git.ourworld.tf/accidentinjurylawyers6718 https://git.ourworld.tf/frydge4446 https://git.ourworld.tf/bunkbedsstore8841 https://git.ourworld.tf/g28carkeys3626 https://git.ourworld.tf/mymobilityscooters7339

https://code.antopie.org/FuriaS https://code.antopie.org/nikhilofficialtour https://code.antopie.org/linzalamba215 https://code.antopie.org/tuffgear https://code.antopie.org/kumkum https://code.antopie.org/kerry765 https://code.antopie.org/clintonjavery https://code.antopie.org/wellnesscounselingseo https://code.antopie.org/nevastechbc https://code.antopie.org/Nirmala

https://nusaeiwyj.com/gitea/explore/users https://gitjh.fun/explore/users https://gitr.pro/explore/users

http://www.bitcoincrashkurs.de/explore/repos https://git.fuwafuwa.moe/explore/repos https://git.rfnull.com/explore/repos https://git.tg/explore/repos https://gitcrypt.com/explore/repos https://gitdev.ru/explore/repos https://gitea.mkgtu.ru/explore/repos https://gittea.dev/explore/repos https://junzimu.com/explore/repos https://git.jzmoon.com/explore/repos https://git.laser.di.unimi.it/explore/repos https://maroon.ee.ncku.edu.tw:3001/explore/repos https://cusdis.linkown.com/explore/repos https://gitea.kotyczka.ch/explore/repos https://git.monocul.us/explore/repos https://www.neptune.builders/explore/repos https://git.pack.house/explore/repos https://www.rell.ru/explore/repos https://git.ritualsong.works/explore/repos https://git.semeikasite.ru/explore/repos https://sha2git.com/explore/repos http://gitea.shengjunfeng.tech/explore/repos https://shiningon.top/explore/repos https://www.strawberryelk.com/explore/repos https://git.tbaer.de/explore/repos http://togy.top/explore/repos http://www.yfgame.store/explore/repos https://git.mzhang.io/explore/repos http://185.87.111.46:3000/explore/repos https://alice-coders.net/explore/repos

karolyi commented 2 days ago

@MichaelHinrichs, this is why I moved away from Gitea to Forgejo, where you can at least get emails about new registrations so you can delete them promptly.

Since it annoyed the hell out of me, I started investigating, and I think I now have a working solution that stops the spammers. It combines IP range banning and user-agent checks with fail2ban IP-level banning (with increasing ban times) and AbuseIPDB reporting. It works for now, but I'm sure they'll change their tactics over time, and I'll adjust mine accordingly.

A big part of the spammers are Indians/Pakistanis, who are probably paid per registration (they have to solve a captcha and then confirm their email), whereas others seem to use bots for discovery and then register manually (again because of the captcha). According to the AbuseIPDB reports, the bots not run from India/Pakistan use VPN services (not Tor).

From what I've seen, I'm starting to suspect that there's a bigger operation behind these spammers. Then again, they're clearly not the sharpest tools in the shed.

Fingers crossed: since I set up my hand-crafted system, I haven't had any new spammer registrations, for about a week now.

This should be done by everybody who maintains a Gitea/Forgejo (or similar) instance with open registrations. Too bad some people just resell the service and don't care about keeping it clean.