EcohydrologyTeam / ClearWater-riverine

A 2D water quality transporter model to calculate conservative advection and diffusion of constituents from an unstructured grid of flows
MIT License

Clean up LFS #59

Closed sjordan29 closed 5 months ago

sjordan29 commented 6 months ago

When a new user tries to clone the entire repository onto their machine, we run into a git lfs bandwidth error. We need to remove any HDF file that is not used for unit testing.
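
One way to see which HDF files are candidates for removal is to list every .hdf path that was ever added and compare it with what is tracked now. This is only a sketch using plain git in a throwaway repository; the file names (old_run.hdf, fixture.hdf) are hypothetical and not taken from this repo.

```shell
#!/bin/sh
set -eu

# Throwaway repo so the sketch is safe to run anywhere.
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo User"

# Hypothetical layout: one example dataset and one unit-test fixture.
mkdir -p examples/data tests/data
echo "big" > examples/data/old_run.hdf
echo "small" > tests/data/fixture.hdf
git add .
git commit -q -m "Add HDF files"

# Delete the example dataset in a later commit.
git rm -q examples/data/old_run.hdf
git commit -q -m "Remove old run"

# Every .hdf path ever added on any branch, including ones since deleted.
hdf_in_history=$(git log --all --diff-filter=A --name-only --format= -- '*.hdf' \
  | sed '/^$/d' | sort -u)
echo "$hdf_in_history"

# .hdf paths tracked in the current checkout.
hdf_tracked_now=$(git ls-files -- '*.hdf')
echo "$hdf_tracked_now"
```

Paths that appear in the first list but not the second are gone from the current checkout but still present in history.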

@aufdenkampe - can you link to the steps you took to delete certain large files from the entire git history when splitting this repository from the original Clearwater repo?

aufdenkampe commented 5 months ago

@sjordan29, to split this repo from the original ClearWater repo that contained everything, I followed the approach outlined in:

That resulted in a new repo (this one) in which I deleted everything else.

Do we want to separate out the Ohio River example and its git history?

If not, a more streamlined approach might be to use the git rm command to delete everything we don't want. @ptomasula used this approach recently on another project.

aufdenkampe commented 5 months ago

@sjordan29, I just ran the following steps and commands locally, then committed ec7b797 back to origin. Let's see if this solves the problem.

From @ptomasula.

A nice little trick I came across to untrack files after you've already committed them to a repo and before you added them to your .gitignore file. For example, in the allocations update, I had committed a bunch of binaries and machine-specific build files to the repository. As I slowly shift the repo towards a more finalized version, I wanted to remove these files to make it easier for others to work on it in the future.

  1. Develop your .gitignore file and commit all changes.
  2. Clean the repo on your local machine by untracking everything (the files stay on disk) using the following command:
    git rm -r --cached .
  3. Add everything back to the repo
    git add .
  4. Commit the cleaned repo.
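
The four steps above can be tried end to end in a throwaway repository. This is a minimal sketch, not the exact commands run on this repo; the file names (model.py, results.hdf) and the *.hdf ignore pattern are assumptions for illustration.

```shell
#!/bin/sh
set -eu

# Throwaway repo so the sketch is safe to run anywhere.
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo User"

# Simulate the mistake: a data file gets committed before it is ignored.
echo "model code" > model.py
echo "pretend binary" > results.hdf
git add .
git commit -q -m "Initial commit, including a file that should not be tracked"

# Step 1: write the .gitignore and commit it.
echo "*.hdf" > .gitignore
git add .gitignore
git commit -q -m "Add .gitignore"

# Step 2: drop everything from the index (the working tree is untouched).
git rm -r -q --cached .

# Step 3: re-add everything; ignored files are skipped this time.
git add .

# Step 4: commit the cleaned index.
git commit -q -m "Stop tracking ignored files"

# results.hdf is still on disk but no longer tracked.
tracked_files=$(git ls-files)
echo "$tracked_files"
```

Note that this only stops tracking the files from the new commit onward. Earlier commits still contain them, so it does not shrink the repository history by itself.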

aufdenkampe commented 5 months ago

@sjordan29, when trying to reclone, the above mostly appeared to work, but I'm still getting a "Clone failed" response at the very end because:

Error downloading object: examples/data/ohio_river/OhioRiver_m.p22.hdf (90a9d86)
This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.

So the good news is that the repo is no longer tracking all the other old HDF files that were committed.

The bad news is that the Git LFS Objects still exist in our remote storage and are counting against our quota. Unfortunately, these can't be deleted without deleting and recreating the repo, which will delete issues, stars, and forks. https://docs.github.com/en/repositories/working-with-files/managing-large-files/removing-files-from-git-large-file-storage#git-lfs-objects-in-your-repository :(

aufdenkampe commented 5 months ago

I just removed LFS by deleting .gitattributes in commit 053dd819200236e20cc5076ec0fe95a40abcbefa and then by running:

git lfs uninstall
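
To confirm that no path is still routed through the LFS filter after .gitattributes is deleted, git check-attr can be used. The sketch below runs in a throwaway repository with plain git; the tracking rule mirrors what git lfs track "*.hdf" typically writes, and the path examples/data/demo.hdf is hypothetical.

```shell
#!/bin/sh
set -eu

# Throwaway repo so the sketch is safe to run anywhere.
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo User"

# A typical LFS tracking rule for HDF files (assumed, not copied from this repo).
echo '*.hdf filter=lfs diff=lfs merge=lfs -text' > .gitattributes
git add .gitattributes
git commit -q -m "Track HDF files with LFS"

# Hypothetical path; check-attr does not need the file to exist.
before=$(git check-attr filter -- examples/data/demo.hdf)
echo "$before"

# Delete the tracking rules and commit the change.
git rm -q .gitattributes
git commit -q -m "Remove LFS tracking rules"

# With the rule gone, the filter attribute is no longer set for the path.
after=$(git check-attr filter -- examples/data/demo.hdf)
echo "$after"
```

The first check reports the lfs filter for the path and the second reports it as unspecified, which is the state expected once the tracking rules are gone.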

aufdenkampe commented 5 months ago

@sjordan29 and @jrutyna, this should now all be working, with LFS turned off. The one issue is that the large files for the Ohio River and Sumwere Creek examples need to be fetched from our servers (as a temporary solution). I mentioned these in new readme files in commit 532e86cbe15ac5cf96b1832be9909fd2e936c56d