feature: add a GitHub action to quell spam PRs

CBID2 commented 9 months ago

Problem

I was scrolling through Twitter and someone posted about the spam PR.

Possible Solution

Implement this GitHub action. It'll lessen the workload for maintainers.

wesleytodd commented 9 months ago

I am not sure we need an action for this unless there is an action which does more than that one (like spam detection with ai or something). It is easy to delete/close/lock with as many clicks as it is to comment and we don't need an additional third party action (added security risk and maintenance) to do it right?

iammohan01 commented 9 months ago

It's kinda scary that, Those idiots spamming PR's

wesleytodd commented 9 months ago

I agree and we are actively removing their comments and blocking them. I wish that work was not necessary, and I appreciate y'all working to help with suggestions and feedback. We used to have a pretty active triage team in place, maybe we can revive that to be more active, and if we do you all are welcome to help!

krzysdz commented 9 months ago

unless there is an action which does more than that one (like spam detection with ai or something)

Not sure about AI spam detection, but almost all of these PRs update the readme, have default name (Update filename), change only a single line and have no description, so it should be rather easy to close them automatically.

Most authors of these PRs have a repsitory called "localrepo", so that's another rule that could be used to detect spam from this source (Apna College).

wesleytodd commented 9 months ago

Yeah this is a one-off issue of the day. Spam PRs have been a problem for years and they come in different varieties. This is why I am hesitant do add an action for this as I would rather ask GitHub for better spam management tooling.

TimGels commented 9 months ago

I reported a couple users already for spam. But this is just mopping with the tap wide open. I hate how this impacts contributors their time.

CBID2 commented 9 months ago

Hey @wesleytodd. I found this on X: https://twitter.com/github/status/1311772722234560517 Maybe that could work?

UlisesGascon commented 9 months ago

This is why I am hesitant do add an action for this as I would rather ask GitHub for better spam management tooling.

Agree with @wesleytodd. The issue is quite complex because the moderation in GitHub is not an easy thing:

Many people (those that are watching the repo) will receive a notification in their email or in the app/web once a new PR is created. This will occur even before a Github Action is triggered.
Closing PRs is an easy action (1 click), so that is not a big time consumer
Detecting invalid PRs is not the big issue because you can easily to spot them with the practice.
Reporting issues/users is complex because the UI requires many steps to do it for each user, so that is a blocker for many maintainers.
Nuke button like @CBID2 suggested is great when the community needs to slow down due a discussion/flame in certain moments, but is not the long term solution as it will prevent other users from contributing (PRs) that are legit or need help (issues) while using Express.

So, I think that we are quite limited on how much we can do with GitHub actions in this case. There are other scenarios that are less frequent but more prone to use GitHub Actions to moderate, for example when people do comments that include offensive content or heavily language. In most projects that I am involved the moderation is done by the humans behind the project or a specific team that volunteer to do it, it is a heavy job. The same way as it is hard to keep a slack/discord/gitter community channel a safe space by moderating content.

wesleytodd commented 9 months ago

In most projects that I am involved the moderation is done by the humans behind the project or a specific team that volunteer to do it, it is a heavy job. The same way as it is hard to keep a slack/discord/gitter community channel a safe space by moderating content.

I think historically the express project has not needed the same style of moderation as things like Node.js does. There have been less contentious discussions and mostly the CoC violations have come from folks outside the project so it was relatively simple to ban and move on. I don't think we need to immediately spin up a moderation team but I do think that the Triage team and TC should have the tools to properly moderate. Right now I don't think we have that well in hand. I believe that we can add this to the list of TODOs to address after we can get next weeks TC meeting organized and finished.

Are we in agreement that a GH Action is most likely not the direction we would want to take to solve this problem?

UlisesGascon commented 9 months ago

I believe that we can add this to the list of TODOs to address after we can get next weeks TC meeting organized and finished.

Yes, we can add it to the TODO list and start to work on it in few weeks

Are we in agreement that a GH Action is most likely not the direction we would want to take to solve this problem?

+1 from me

krzysdz commented 9 months ago

Maybe a pull reqest template would make some people think before creating a PR? On the other hand, it doesn't look like those who spam with pull requests will even bother to read it and they'll probably just click "Create pull reqest" with unedited description.

QuantGeekDev commented 9 months ago

I agree and we are actively removing their comments and blocking them. I wish that work was not necessary, and I appreciate y'all working to help with suggestions and feedback. We used to have a pretty active triage team in place, maybe we can revive that to be more active, and if we do you all are welcome to help!

How can we join the triage team?

krzysdz commented 9 months ago

How can we join the triage team?

https://github.com/expressjs/express/blob/master/Contributing.md#becoming-a-triager

wesleytodd commented 9 months ago

Doesn't need to stop the conversation, but since we specifically don't want to use a GHA to do this then this issue is complete. There are more threads to follow up on, but I think we can do that in other discussions more specific to those.

gaurishhs commented 9 months ago

A workflow could be an option too, These spam PRs generally don't have more than 2-3 words. Closing PRs with less than 3 words sounds reasonable. GitHub too does something similar in it's documentation repository. GitHub's workflow

CBID2 commented 9 months ago

A workflow could be an option too, These spam PRs generally don't have more than 2-3 words. Closing PRs with less than 3 words sounds reasonable. GitHub too does something similar in it's documentation repository. GitHub's workflow

@wesleytodd, I think you should reopen this issue. @gaurishhs made another point about using GitHub actions

wesleytodd commented 9 months ago

I do agree that workflow is a bit more well suited IMO. I am still hesitant and I would like to also look at other ways but yeah in the mean time lets re-open the issue so we don't end up having multiple on the same topic or miss out on good ideas like @gaurishhs'.

deepak4566 commented 9 months ago

even one more better thing is maximum spam pr revolve around README.md ,if we can specify to block those pr updating readme for now (upto the spam pr's get less) we can handle this through github actions for this so that good pr's may not effect.

TimGels commented 9 months ago

Is it possible to have the workflow also incorporate a check for new contributors when doing the x words check? I feel like that people who contributed, and already have code merged, in the past do not pose much of a threat in regards to spam. Or am I overthinking it?

CBID2 commented 9 months ago

Is it possible to have the workflow also incorporate a check for new contributors when doing the x words check? I feel like that people who contributed, and already have code merged, in the past do not pose much of a threat in regards to spam. Or am I overthinking it?

I don't think you're overthinking it @TimGels. Previous contributors should be spared in some way.

gaurishhs commented 9 months ago

I think the workflow should be something like this:

The expressjs member check can be omitted / replaced with a collaborator check

Pratik-Kumar-621 commented 9 months ago

I think the workflow should be something like this:

The expressjs member check can be omitted / replaced with a collaborator check

I don't think word count matters because some bugs only requires a minimal changes.

Pratik-Kumar-621 commented 9 months ago

And if word count check is added, they must find some other way to open spam prs. The problem here is those who make spam prs, they just started their development journey and when they got introduced to github (by some youtubers or some articles), they tried to test it by themselves. They didn't have any idea that the thing they are doing is a headace of someone else.

The only way we can stop this is by creating awarness.

CBID2 commented 9 months ago

And if word count check is added, they must find some other way to open spam prs. The problem here is those who make spam prs, they just started their development journey and when they got introduced to github (by some youtubers or some articles), they tried to test it by themselves. They didn't have any idea that the thing they are doing is a headace of someone else.

The only way we can stop this is by creating awarness.

True, but knowing that there those who create spam PRs out of malice, raising awareness is not enough. Protective measures are a must too.

Pratik-Kumar-621 commented 9 months ago

And if word count check is added, they must find some other way to open spam prs. The problem here is those who make spam prs, they just started their development journey and when they got introduced to github (by some youtubers or some articles), they tried to test it by themselves. They didn't have any idea that the thing they are doing is a headace of someone else. The only way we can stop this is by creating awarness.

True, but knowing that there those who create spam PRs out of malice, raising awareness is not enough. Protective measures are a must too.

There are Rest APIs for getting the info of prs and also for actions. Making a bot which can automate spam pr closing using those rest apis can help. But not sure how to detect spam.

JavidSumra commented 9 months ago

Hello! While I may not possess your level of genius, I've got an idea. Can we utilize NLP in Python code to detect significant changes in a Readme file solely through a GitHub Action?

CBID2 commented 9 months ago

I think we should try my suggestion for now and then do more afterwards. We must take some form of action quickly

Romakita commented 9 months ago

Hello team! A quick action could be to limit who can submit a PR while there is no action to filter PR spamming.

github provide a settings to limit temporarly the submitted PR for a determined duration (maybe one week) for an external contributor: https://twitter.com/github/status/1311772722234560517 https://github.com/orgs/community/discussions/22804 (in this case only express team member can submit a PR)

I know isn’t aligned with open source phylosophy but it can stop this PR spamming cycle quickly while waiting for a real solution. Maybe when they see that it no longer works, the followers of the video will ask the YouTuber for explanations :)

Another solution, is to check if a PR have a commit related to an open issue on the same repo. If not the github action xill close automatically the PR.

see you

Kishlay-notabot commented 9 months ago

I really feel pensive seeing all these PRs come from literally "tech youtubers" who don't use basic common senses before uploading a video, I apologize on their behalf 🙏 .

CompeyDev commented 9 months ago

spam detection with ai

This seems like a pretty cool project idea, someone should DEFINITELY put something like this into place to help with spam hell for open-source maintainers.

animesh-chouhan commented 9 months ago

Hello team,

Spent few hours learning about GitHub actions and came up with this rule-based spam PR detection action which automatically closes the PR based on user-configurable rules.

I have setup few rules like detecting the default commit message for README.md and also if the pull request template has been updated with the details. It can be easily extended to prevent this PR havoc and future misuses.

Check it out here: https://github.com/animesh-chouhan/github-spamstop

CompeyDev commented 9 months ago

@animesh-chouhan The action certainly is a decent idea, however I don't think a simple check of whether a default README.md change commit exists in a PR would be enough to prevent such spam. I feel like we should have a greater set of rules to check against, such as how active the said user is on GitHub, whether they've contributed to different repositories, etc.

animesh-chouhan commented 9 months ago

@animesh-chouhan The action certainly is a decent idea, however I don't think a simple check of whether a default README.md change commit exists in a PR would be enough to prevent such spam. I feel like we should have a greater set of rules to check against, such as how active the said user is on GitHub, whether they've contributed to different repositories, etc.

@CompeyDev Yeah it is upto the repository owner to configure rules. I have just provide a template and added basic rules just to give an idea. More complex rules can be added depending on the requirement.

CBID2 commented 9 months ago

So can we get started?

wesleytodd commented 9 months ago

Hey Everyone, lots going on in here but I suggest we take a beat and pause here. I don't think anyone currently on the contributor, triager, or TC roles really want's an action for this (if anyone in those groups disagrees please correct me). Closing PRs from legitimate users is a bad experience, and avoiding that should be a goal which is hard to do with any automated tooling. This idea is more of a "last resort" than an ideal solution.

To update folks on the status:

We added Issue templates and will be adding PR templates which will hopefully warn folks against opening spam prs
We have reached out through more official ways to the video creators about editing the video to reduce the likelyhood of causing accidental spam
Medium term we will likely attempt to get the Triage Team's help for this, but we are not going to do much of that until we sort out a few other things
We are continuing to close and locking the spam PRs

All that said, if folks could please STOP directly mentioning maintainers in these PRs (or in any other fashion) that would be great. We are watching the repos and are well aware without the direct ping's, they are only causing more notification spam.

CBID2 commented 9 months ago

Hey Everyone, lots going on in here but I suggest we take a beat and pause here. I don't think anyone currently on the contributor, triager, or TC roles really want's an action for this (if anyone in those groups disagrees please correct me). Closing PRs from legitimate users is a bad experience, and avoiding that should be a goal which is hard to do with any automated tooling. This idea is more of a "last resort" than an ideal solution.

To update folks on the status:

We added Issue templates and will be adding PR templates which will hopefully warn folks against opening spam prs

We have reached out through more official ways to the video creators about editing the video to reduce the likelyhood of causing accidental spam

Medium term we will likely attempt to get the Triage Team's help for this, but we are not going to do much of that until we sort out a few other things

We are continuing to close and locking the spam PRs

All that said, if folks could please STOP directly mentioning maintainers in these PRs (or in any other fashion) that would be great. We are watching the repos and are well aware without the direct ping's, they are only causing more notification spam.

Sounds wonderful @wesleytodd! :) I'm going to join the triage team to offer more ideas

singhmeharjeet commented 9 months ago

Could Github not implement something like stack overflow? (like you need some amount of rating before you could file a PR on a Repo which is a little old, maybe 10-20 commits)

mydeveloperday commented 9 months ago

I'm an open source contributor on other projects, I stumbled upon a video explaining what was going on. Unfortunately this project has become part of a video to teach people how to make github pull requests to open source projects (https://www.youtube.com/watch?v=Ez8F0nW6S-w), Do as I did and report this video to try and get it taken down. Whilst not harmful content its damaging "misinformation" after which I gave a description why. (if enough people complain then maybe it can be forced to remove it and that might help, but this video has 1.4M views so be prepared for the number of PRs to keep coming in (pretty much forever and fast)

I'm not a github expert maybe there is a way to prevent submission of the PR ahead of time if it doesn't contain a github issue, or maybe a "bug/enhancement/contributor label" , other projects I've worked on you need to be a member of a "virtual" organization (e.g. expressjs community), but you can easily join that organization by raising a ticket, a moderator can adds people to that orginization. Nothing would stop contributors creating their own commits in their own forks, but they couldn't then send you their PRs unless they were part of your organization.

Sorry just trying to help, feel bad for you all.


name: Prevent PR submission from non-organization members

on:
  pull_request:
    types:
      - opened
      - synchronize

jobs:
  validate_pr:
    runs-on: ubuntu-latest
    steps:
      - name: Check if PR author is organization member
        uses: actions/github-script@v4
        with:
          script: |
            const { owner, repo } = context.repo;
            const { login } = context.payload.pull_request.user;

            const { data: orgMembers } = await github.rest.orgs.listMembers({
              org: owner,
            });

            const isMember = orgMembers.some((member) => member.login === login);

            if (!isMember) {
              core.setFailed(`PR author is not an organization member: ${login}`);
            }
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}```

QuantGeekDev commented 9 months ago

I keep getting email notifications so I figured I would contribute something to the conversation. From my reading of the spam PR's, a lot of them add their own name to the commit. Perhaps a regex to detect if the diff consists of only any variation of username, along with the title "Update README.md" would filter at least 60% of the spam PRs. You don't need to use AI for this

CompeyDev commented 9 months ago

Heya! An update on my last comment: I think I'll be working on an AI based spam detector for use-cases like this, which would analyze diff files of PRs and conclude whether it is worthwhile to keep or not. Will update with progress on that here if y'all would like that.

CBID2 commented 9 months ago

Heya! An update on my last comment: I think I'll be working on an AI based spam detector for use-cases like this, which would analyze diff files of PRs and conclude whether it is worthwhile to keep or not. Will update with progress on that here if y'all would like that.

I'd like that

QuantGeekDev commented 9 months ago

Heya! An update on my last comment: I think I'll be working on an AI based spam detector for use-cases like this, which would analyze diff files of PRs and conclude whether it is worthwhile to keep or not. Will update with progress on that here if y'all would like that.

You don't need AI for this. How are you going to approach this?

animesh-chouhan commented 9 months ago

Heya! An update on my last comment: I think I'll be working on an AI based spam detector for use-cases like this, which would analyze diff files of PRs and conclude whether it is worthwhile to keep or not. Will update with progress on that here if y'all would like that.

You don't need AI for this. How are you going to approach this?

I agree with @QuantGeekDev, detecting this kind of spam is relatively easy and using AI would be an overkill. A simple rule-based approach will work well in this case: https://github.com/animesh-chouhan/github-spamstop/blob/main/index.js#L48-L66

This uses a pull-request template which needs to be edited with the details of the change which would easily deter a large amount of such PRs.

PR template: https://github.com/animesh-chouhan/github-spamstop/blob/main/.github/pull_request_template.md

ServerDeveloper9447 commented 9 months ago

Yep. I agree, too. A pull request template would be best to fight these spam pull requests.

Usmanzahidcode commented 9 months ago

I just stumbled upon a tweet about a problem that seems identical to the Binod virus that surfaced a few years ago. It's high time that GitHub takes notice of this new issue. Repositories like these should have a strict set of criteria or credibility requirements for contributors. It's imperative to understand that libraries like these are fundamental building blocks of the web ecosystem, and such issues only hinder progress.

ayush7 commented 9 months ago

Honestly, the only way to qwell this issue is to ask the person who originally posted a github tutorial on youtube to pull down her video and reupload it without the ExpressJS bits in the very end. Otherwise, it may keep happening in the future. Honestly not a big ask. I think the express team should get in contact and ask them to do so. They are a medium size coding tutor platform so it should not be a crazy ask. Their contact and other details are available on their website.

matthewelmer commented 9 months ago

You could probably block 99% of the spam by checking the IP address from which it came. Dunno how to do that, though.

prakhar-pal commented 9 months ago

You could probably block 99% of the spam by checking the IP address from which it came. Dunno how to do that, though.

Blocking PRs based on IP address isn't a solution because 1. Different people are making these spam PRs so IP addresses are different, 2. If you meant IP range that targets some region, then it can result in legit PR getting blocked.

sudo-Mystic commented 9 months ago

A PR template with some automation seems good to me . We can check for the change in description . Most of these student dont care to change description , so if a PR is opened with the template as it is without any change then most probably it is spam and a action or any automation could be used to close it .

neu-ma-tic commented 9 months ago

A way to do this, I think, would be to check diffs and auto-close one without substantial changes

checking the IP address from which it came.

Don't think seeing ips is possible thru github

Feb 11, 2024 1:27:48 AM Rishabh Bohra @.***>:

A PR template with some automation seems good to me . We can check for the change in description . Most of these student dont care to change description , so if a PR is opened with the template as it is without any change then most probably it is spam and a action or any automation could be used to close it .

— Reply to this email directly, view it on GitHub[https://github.com/expressjs/express/issues/5449#issuecomment-1937489364], or unsubscribe[https://github.com/notifications/unsubscribe-auth/AV2UYYYQLESIVVNMLT6SGJTYTCFILAVCNFSM6AAAAABC4M772OVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSMZXGQ4DSMZWGQ]. You are receiving this because you are subscribed to this thread. [Tracking image][https://github.com/notifications/beacon/AV2UYY4TQ2OHBPWHV55C77DYTCFILA5CNFSM6AAAAABC4M772OWGG33NNVSW45C7OR4XAZNMJFZXG5LFINXW23LFNZ2KUY3PNVWWK3TUL5UWJTTTPO65I.gif]

expressjs / express

feature: add a GitHub action to quell spam PRs #5449

Problem

Possible Solution