go-gitea / gitea

Git with a cup of tea! Painless self-hosted all-in-one software development service, including Git hosting, code review, team collaboration, package registry and CI/CD
https://gitea.com
MIT License

Allow "safe" mirroring #14076

Open berezovskyi opened 3 years ago

berezovskyi commented 3 years ago

Description

https://github.com/go-gitea/gitea/issues/6783 introduced a change to mirror repositories "exactly", so that rebased branches and deleted tags are synchronized. However, I want to mirror some repos in case they get removed, and in many cases the removal is done by force-pushing an orphan master branch with a README saying that some "open core" product is no longer open source. Is there a chance for Gitea to add an "archival" mirror option that refuses to synchronize destructive master/main branch changes and tag removals?

Screenshots

n/a

immanuelfodor commented 3 years ago

This is a real threat even when you use the mirror as a submodule in another repo of your own: the mirrored repo loses the commit the submodule references, which breaks the build, so you end up with nothing even though you have a mirror :sob:

I have also always wondered what happens to a Gitea mirror if the source GitHub repo is deleted. Does anyone know? Is it better in that case? Would the mirroring operation fail and leave the mirror in place?

NunoSempere commented 2 years ago

I'm also interested in this.

lafriks commented 2 years ago

As it's a common use case for Gitea, this could be added as an option.

42wim commented 2 years ago

I've also been looking into a way to do this and found two options: detecting forced updates with git rev-list, or keeping every old reference via the reflog.

For now I'm testing the git rev-list method by adding it to the reference-transaction hook (the only hook that seems to run when git remote update is executed). The hook below makes the mirror update fail when a forced update from upstream is detected.

With the other option, enabling logAllRefUpdates and setting gc.pruneExpire and gc.reflogExpireUnreachable so that gc never removes anything, we could keep mirroring even when forced updates happen, because all old references are kept (at the cost of more storage overhead).
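For the reflog option, a minimal sketch of the per-repository settings, assuming the mirror is a bare repository whose path you pass in (the function name keep_all_ref_history is hypothetical). core.logAllRefUpdates=always makes Git write reflogs for all refs, including tags, even in a bare repository, and setting gc.reflogExpireUnreachable and gc.pruneExpire to never stops git gc from discarding old tips. One caveat: Git deletes a ref's reflog together with the ref, so a branch or tag that is removed upstream and pruned from the mirror is not kept this way; only its objects survive, unreferenced.

```shell
#!/bin/sh
# Hypothetical helper: keep every old ref value of a bare mirror repository.
keep_all_ref_history() {
  repo=$1
  # write reflog entries for every ref (branches and tags), even in a bare repo
  git -C "$repo" config core.logAllRefUpdates always &&
  # never expire reflog entries whose commits are no longer reachable from the ref
  git -C "$repo" config gc.reflogExpireUnreachable never &&
  # never prune unreachable objects during git gc
  git -C "$repo" config gc.pruneExpire never
}

# usage (the path is only an example):
# keep_all_ref_history /var/lib/gitea/data/gitea-repositories/someuser/somerepo.git
```

Since only config keys change, existing mirrors can be converted in place; the trade-off is that the repository only ever grows.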

My current reference-transaction hook below:

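#!/bin/sh
# reference-transaction hook (Git 2.28+): git passes the transaction state as $1
# and one "<old-oid> <new-oid> <refname>" line per ref on stdin.
if [ "$1" = "prepared" ]
then
  while read -r old new ref
  do
    # only protect the master / main branch (list every branch to protect here)
    case "$ref" in
      refs/heads/master|refs/heads/main) ;;
      *) continue ;;
    esac
    case "$old" in
      *[!0]*) ;;            # the branch already exists
      *) continue ;;        # all-zero old id: the branch is being created
    esac
    case "$new" in
      # commits reachable from the old tip but not the new one would be lost
      *[!0]*) count=$(git rev-list --count "$old" "^$new") || count=1 ;;
      *) count=1 ;;         # all-zero new id: the branch is being deleted
    esac
    if [ "$count" -ne 0 ]
    then
      echo "$(date) forced update of $ref detected in $PWD" >> ~/forcedupdates.log
      # a non-zero exit in the "prepared" state aborts the whole transaction
      exit 1
    fi
  done
fi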

Any feedback on which option to implement, or better ideas for handling this issue?

Update 20220123: updated the reference-transaction hook above to match only master/main, since other branches may legitimately be rebased or force-pushed.
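One way to check a hook like the one above before relying on it is a throwaway test: build a local "upstream", mirror-clone it, install the hook, replace upstream main with an orphan commit, run git remote update, and confirm that the mirror's main did not move. This is a sketch assuming Git 2.28 or later (needed for both the reference-transaction hook and git init -b); the function name hook_blocks_force_update and the temporary layout are hypothetical.

```shell
#!/bin/sh
# Hypothetical test harness: returns 0 if the given reference-transaction hook
# keeps a mirror's main branch from being overwritten by an upstream force push.
hook_blocks_force_update() {
  hook=$1
  work=$(mktemp -d)
  # throwaway identity so commits work even without a global git config
  g() { git -c user.name=test -c user.email=test@example.com "$@"; }

  # upstream with one commit on main
  g init -q -b main "$work/upstream"
  g -C "$work/upstream" commit -q --allow-empty -m "original history"

  # mirror it and install the hook under test
  g clone -q --mirror "$work/upstream" "$work/mirror.git"
  cp "$hook" "$work/mirror.git/hooks/reference-transaction"
  chmod +x "$work/mirror.git/hooks/reference-transaction"
  kept=$(g -C "$work/mirror.git" rev-parse refs/heads/main)

  # simulate the "orphan README" rewrite: an unrelated commit replaces main
  g -C "$work/upstream" checkout -q --orphan rewrite
  g -C "$work/upstream" commit -q --allow-empty -m "no longer open source"
  g -C "$work/upstream" branch -q -M rewrite main

  # sync the mirror; a blocking hook makes this fail, so ignore its status
  g -C "$work/mirror.git" remote update >/dev/null 2>&1

  now=$(g -C "$work/mirror.git" rev-parse refs/heads/main)
  rm -rf "$work"
  [ "$now" = "$kept" ]
}
```

For example, hook_blocks_force_update ./reference-transaction && echo "force push blocked" runs the check against a hook saved in the current directory.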

richex-cn commented 2 years ago

I think we need these features to keep mirrored repositories safe, as the recent faker.js incident showed: https://fakerjs.dev/update.html#i-heard-something-happened-what-s-the-tldr

As @6543 mentioned regarding the "eavel atack", @42wim gave a good solution. It would be even better if Gitea had an option to enable/disable force updates on mirrors.

6543 commented 2 years ago

Uff my misspelling created a new name 😆

James-E-A commented 2 years ago

Yes, the "submodules" feature would be far more useful if the option to mirror them were added.

Gitea already archives every file referred-to by a commit (since the codebase would be nonfunctional without them), so it seems a bit strange that repos referred-to by commits aren't also archived; the codebase would also be nonfunctional without them.

Suwmlee commented 11 months ago

Any progress? #19165 has been closed. @42wim's hook solution seems feasible, but I haven't figured out how to set it up.
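A rough sketch of one way to install the hook by hand, assuming shell access to the Gitea server, the hook saved as a file, and repositories stored as ROOT/owner/repo.git under the ROOT setting of the [repository] section in app.ini. Selecting mirrors by remote.origin.mirror=true is an assumption (git clone --mirror sets it, but check that your Gitea mirrors have it), and the function name install_ref_hook is hypothetical. Run it as the user Gitea runs as so the files get the right owner, and try it on a single mirror first, since whether a given Gitea version's sync triggers the hook is worth verifying.

```shell
#!/bin/sh
# Hypothetical installer: copy a reference-transaction hook into every bare
# repository under ROOT (layout ROOT/<owner>/<repo>.git) that is a mirror.
install_ref_hook() {
  hook=$1 root=$2
  for repo in "$root"/*/*.git; do
    [ -d "$repo" ] || continue    # the glob matched nothing
    # assumption: pull mirrors carry remote.origin.mirror=true (set by clone --mirror)
    [ "$(git -C "$repo" config --bool remote.origin.mirror)" = true ] || continue
    mkdir -p "$repo/hooks" &&
      cp "$hook" "$repo/hooks/reference-transaction" &&
      chmod 755 "$repo/hooks/reference-transaction" || return 1
  done
}

# usage (both paths are examples only):
# install_ref_hook ./reference-transaction /var/lib/gitea/data/gitea-repositories
```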

Matthieu-LAURENT39 commented 11 months ago

I use Gitea to mirror a lot of repos in case they disappear, and not having force-push prevention defeats the whole purpose. I really hope this gets resolved.

Suwmlee commented 11 months ago

I have an idea. When synchronizing a local repository with a GitHub repository, we can use the method suggested by @42wim to determine whether it is a force update. If it is, we can create a backup branch from the local branch and then update it from the remote branch. This way we stay synchronized with the GitHub repository while preserving the content from before the force update.

Zipdox2 commented 5 months ago

Force pushes are sometimes used to revert a few commits, which isn't necessarily malicious. I propose having a threshold option for how much a force push can change before it gets blocked. Maybe send out an email or notification if the threshold gets exceeded, asking the user whether they want to pull the repo.
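A minimal sketch of the threshold idea, assuming "how much a force push changes" is measured as the number of commits that the old tip has and the new tip lacks. Both that metric and the function name within_force_push_threshold are assumptions; a real option might measure something else. Inside the hook above, the same effect would come from comparing count against such a limit instead of against 0.

```shell
#!/bin/sh
# Hypothetical check: succeed if replacing OLD with NEW discards at most MAX
# commits, i.e. commits reachable from OLD but not from NEW.
within_force_push_threshold() {
  repo=$1 old=$2 new=$3 max=$4
  # fail (treat as over the threshold) if git cannot compare the two commits
  lost=$(git -C "$repo" rev-list --count "$old" "^$new") || return 1
  [ "$lost" -le "$max" ]
}
```

A fast-forward always passes (nothing is lost), while rolling main back by two commits passes only with a threshold of 2 or more.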

Estus-Dev commented 1 month ago

> Force pushes are sometimes used to revert a few commits, which isn't necessarily malicious. I propose having a threshold option for how much a force push can change before it gets blocked. Maybe send out an email or notification if the threshold gets exceeded, asking the user whether they want to pull the repo.

I would definitely prefer to create new branches whenever a force push would delete some commits, while still tracking existing branches through the force-pushes.

For example:

  1. Someone force pushes to remove commits from main on the remote server, while the local server still contains the removed commits.
  2. The mirror on local fetches changes from remote, and determines that a force push has occurred.
  3. The current state of local/main is copied to local/main-conflict-2024-07-06T12:34:56.
  4. Then local/main pulls the changes from remote/main.
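The steps above, which are close to the backup-branch idea suggested earlier in the thread, can be sketched as a wrapper that updates a single branch of a bare mirror. The function name, the refs/incoming/ scratch namespace and the remote name are hypothetical, and this is not how Gitea's mirror sync works today. Two caveats: Git ref names may not contain ":", so a name like main-conflict-2024-07-06T12:34:56 would be rejected and the sketch uses "-" in the time part instead; and if the regular mirror sync runs with --prune, it would delete these backup branches again, because they do not exist upstream.

```shell
#!/bin/sh
# Hypothetical wrapper: update BRANCH of a bare mirror from REMOTE, first copying
# the current tip to BRANCH-conflict-<UTC timestamp> if the update is a force push.
update_branch_keeping_history() {
  repo=$1 remote=$2 branch=$3
  scratch="refs/incoming/$branch"
  old=$(git -C "$repo" rev-parse -q --verify "refs/heads/$branch")
  # step 2: fetch the upstream tip into a scratch ref only; the empty --refmap
  # stops the mirror's configured +refs/*:refs/* refspec from moving the branch too
  git -C "$repo" fetch -q --refmap= "$remote" "+refs/heads/$branch:$scratch" || return 1
  new=$(git -C "$repo" rev-parse "$scratch") || return 1
  # a force push happened if the old tip is not an ancestor of the new one
  if [ -n "$old" ] && ! git -C "$repo" merge-base --is-ancestor "$old" "$new"; then
    # step 3: keep the old tip under a timestamped name (no ':' allowed in refs)
    stamp=$(date -u +%Y-%m-%dT%H-%M-%S)
    git -C "$repo" update-ref "refs/heads/$branch-conflict-$stamp" "$old" || return 1
  fi
  # step 4: move the branch to the upstream tip and drop the scratch ref
  git -C "$repo" update-ref "refs/heads/$branch" "$new" &&
    git -C "$repo" update-ref -d "$scratch"
}
```

For example, update_branch_keeping_history /path/to/mirror.git origin main keeps main tracking upstream while an overwritten tip survives as main-conflict-<timestamp>.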

My use case is that I want main to continue to reflect the upstream main, even if someone force-pushes to overwrite a bad commit here or there. But I also don't want to lose commits to force-pushed deletions by bad actors.

My internal git server uses pull mirrors for all my dependencies and references. It doesn't stop being version control and become a backup just because I want a safe copy locally. Git is distributed version control for a reason.