dtarb / TauDEM

Terrain Analysis Using Digital Elevation Models (TauDEM) software for hydrologic terrain analysis and channel network extraction.
http://hydrology.usu.edu/taudem

Remove the big files from git history #225

Closed ghost closed 3 years ago

ghost commented 3 years ago

The TestSuite files do not exist in the current branch, but they still exist in the Git history, so users download around 900 MB of data when they clone the repository. The suggestion is to run the bfg tool to prune them permanently from the TauDEM Git history. The bfg tool is suggested by GitHub for this purpose: https://docs.github.com/en/github/authenticating-to-github/removing-sensitive-data-from-a-repository.

In my case, this reduced the repository size to around 2 MB in my forked copy of TauDEM. Note that the bfg tool rewrites the Git history in place rather than adding a new commit, so the IDs of the rewritten commits change.
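To illustrate why a deleted file still costs download size, here is a minimal sketch that lists the largest blobs reachable from any ref. The helper name `largest_blobs` and the throwaway demo repository (with a stand-in file `TestSuite.bin`) are assumptions made for illustration; they are not part of TauDEM or bfg.

```shell
#!/bin/sh
# Sketch: a file removed from the working tree is still stored in history.
set -eu

# Hypothetical helper: print "size path" for the N largest blobs in history.
largest_blobs() {
    repo=$1; count=${2:-10}
    git -C "$repo" rev-list --objects --all |
        git -C "$repo" cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        sed -n 's/^blob //p' |
        sort -rn | head -n "$count"
}

# Demo in a throwaway repository: commit a 200 KiB file, then delete it.
demo=$(mktemp -d)
git init -q "$demo"
head -c 204800 /dev/urandom > "$demo/TestSuite.bin"
git -C "$demo" add TestSuite.bin
git -C "$demo" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "add large test data"
git -C "$demo" rm -q TestSuite.bin
git -C "$demo" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "remove large test data"

# The blob is gone from the working tree but still reachable in history.
top=$(largest_blobs "$demo" 1)
echo "$top"
```

Running the same helper against a clone of the real repository (for example `largest_blobs . 20`) should show which paths account for most of the size.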

Steps to reproduce:

  1. Download the bfg tool (https://rtyley.github.io/bfg-repo-cleaner/). The tool requires a Java runtime:
     wget https://repo1.maven.org/maven2/com/madgag/bfg/1.14.0/bfg-1.14.0.jar
  2. Clone the TauDEM repository with the --mirror option. This is important because it downloads a bare copy of the repository, including all of its refs:
     git clone --mirror https://github.com/dtarb/TauDEM.git
  3. Change into the cloned directory (TauDEM.git) and delete the TestSuite folders from the Git history using the tool. In my case, the bfg jar was downloaded to ~/Downloads:
     cd TauDEM.git
     java -jar ~/Downloads/bfg-1.14.0.jar --delete-folders TestSuite
     java -jar ~/Downloads/bfg-1.14.0.jar --delete-folders TestSuite_Geographic
  4. Expire the reflog and run garbage collection so that Git physically removes the data that the bfg tool dropped from the history:
     git reflog expire --expire=now --all && git gc --prune=now --aggressive
  5. Push the changes to the remote TauDEM repository. This rewrites the Git history of the remote repository:
     git push
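
Before pushing, it may help to confirm that the cleanup actually shrank the repository. The sketch below reads the `size-pack` field of `git count-objects -v`, which reports the disk space used by packs in KiB. The helper name `pack_size_kib` and the throwaway demo repository are assumptions for illustration only.

```shell
#!/bin/sh
# Sketch: measure the packed size of a repository in KiB.
set -eu

# Hypothetical helper: print the "size-pack" value from git count-objects.
pack_size_kib() {
    git -C "$1" count-objects -v | awk '/^size-pack:/ { print $2 }'
}

# Demo in a throwaway repository holding 200 KiB of incompressible data.
demo=$(mktemp -d)
git init -q "$demo"
head -c 204800 /dev/urandom > "$demo/big.bin"
git -C "$demo" add big.bin
git -C "$demo" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "add large file"

# Objects are loose until they are packed, so run gc before measuring.
git -C "$demo" gc -q
size=$(pack_size_kib "$demo")
echo "packed size: ${size} KiB"
```

In the mirror clone, calling `pack_size_kib TauDEM.git` once before step 3 and once after step 4 gives the two numbers to compare.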

dtarb commented 3 years ago

@ayild I tried to do this, but without success. At the git push step (step 5) I got a lot of errors like:

 ! [remote rejected] master -> master (protected branch hook declined)
 ! [remote rejected] refs/pull/100/head -> refs/pull/100/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/103/head -> refs/pull/103/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/105/head -> refs/pull/105/head (deny updating a hidden ref)

There were some updates that occurred and now the repository wants to create pull requests based on the pushes that did occur. I am not sure how to sort this out. I think the size may not have been reduced. If you are able to help, please do so.

artulab commented 3 years ago

@dtarb I got the same error on your repository. It is probably because your master branch is protected, so GitHub rejects the force push. You probably need to turn off the protection on that branch. Maybe this would fix your issue?

Steps: Settings --> Branches --> tick "Allow force pushes" (Permit force pushes for all users with push access).

You can disable it again after this operation.

By the way, for your information: it seems that, after this operation, everyone who has cloned your repository will need to make a fresh clone of TauDEM.
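
As an alternative to a fresh clone, an existing clone can usually be realigned with the rewritten remote by fetching and hard-resetting. This is only a sketch: the helper name `resync_clone` and the local demo repositories are assumptions, and the reset discards any local commits and uncommitted changes on that branch.

```shell
#!/bin/sh
# Sketch: realign an existing clone after the remote history was rewritten.
set -eu

# Hypothetical helper: make a local branch match origin (destructive).
resync_clone() {
    repo=$1; branch=$2
    git -C "$repo" fetch -q origin
    git -C "$repo" checkout -q "$branch"
    git -C "$repo" reset -q --hard "origin/$branch"
}

# Run git with a throwaway identity for the demo commits.
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

# Demo: a maintainer repo, a bare "remote", and a user's older clone.
work=$(mktemp -d)
git init -q "$work/maintainer"
git -C "$work/maintainer" symbolic-ref HEAD refs/heads/master
echo data > "$work/maintainer/file.txt"
g -C "$work/maintainer" add file.txt
g -C "$work/maintainer" commit -q -m "original"
git clone -q --bare "$work/maintainer" "$work/remote.git"
git clone -q "$work/remote.git" "$work/user"

# Simulate a history rewrite: amend the commit and force-push it.
g -C "$work/maintainer" commit -q --amend -m "rewritten"
g -C "$work/maintainer" push -q --force "$work/remote.git" master

# The user's clone now points at a commit the remote no longer has.
resync_clone "$work/user" master
git -C "$work/user" log --oneline -1
```

Because the old large objects stay in an existing clone's object store until they are pruned, a fresh clone remains the simplest way to actually get the smaller download.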

dtarb commented 3 years ago

I did the push with force pushes allowed. Git reported the same errors, but the push seems to have had an effect, as the repository size is now smaller. I am going to call this done, but @ahmetartu if you find any problems, please let me know.