GIScience / oshdb

OpenStreetMap History Data Analysis Framework
https://ohsome.org
GNU Lesser General Public License v3.0

Removing old test-files from git-history #347

Closed rtroilo closed 1 year ago

rtroilo commented 3 years ago

We have some very old and large test-data files in our git history, which have let our repository grow to its current size of 120 MB. Files in the history like the following could be wiped from the history to reduce our repository size:

I used this command from Stack Overflow [1] to find those files:

git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
  sed -n 's/^blob //p' |
  sort --numeric-sort --key=2 |
  cut -c 1-12,41- |
  $(command -v gnumfmt || echo numfmt) --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest
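As a hedged variation on the command above, the same pipeline can be wrapped in a small function that prints only the N largest blobs. The function name `largest_blobs` and the default of 10 are illustrative assumptions, not part of the Stack Overflow answer; sizes are left as raw byte counts so the sketch does not depend on numfmt.

```shell
# Hypothetical helper: print the N largest blobs reachable from any ref
# of the repository in the current directory (default N = 10).
# Output format per line: <object id> <size in bytes> <path>
largest_blobs() {
  n="${1:-10}"
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    sed -n 's/^blob //p' |
    sort --numeric-sort --key=2 |
    tail -n "$n"
}
```

Calling `largest_blobs 20` inside a clone would list the 20 largest blobs in ascending order, so the largest one comes last.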

A good tool for wiping files from the git history could be

What do you think about this?

[1] https://stackoverflow.com/questions/10622179/how-to-find-identify-large-commits-in-git-history

joker234 commented 3 years ago

To remove them from master you have to rewrite the whole master history. For a public repo (with releases and forks) this is strongly discouraged, even though I would like to remove them. I'm pretty torn.

tyrasd commented 3 years ago

I believe the following 3 could be removed without (big) history-rewriting troubles, since they were not (yet) merged into master:

c41cd0fa27d3 5,5MiB oshdb-api/src/test/resources/update-test-data.mv.db
c997c5c33936 5,8MiB oshdb-api/src/test/resources/test-update-data.mv.db
6683c395170b 6,0MiB oshdb-api/src/test/resources/test-update-data.mv.db
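To double-check which branches would be affected, a sketch like the following lists every branch whose history adds or removes a given blob. It assumes a Git version that supports `git log --find-object` (2.16 or newer); the helper name `branches_with_blob` is hypothetical.

```shell
# Hypothetical helper: list all branches (local and remote-tracking)
# that contain a commit adding or removing the given blob.
branches_with_blob() {
  blob="$1"
  # Find every commit, on any ref, in which the blob appears or disappears.
  git log --all --format=%H --find-object="$blob" |
    while read -r commit; do
      # Print the short names of all branches that contain that commit.
      git branch --all --contains "$commit" --format='%(refname:short)'
    done |
    sort -u
}
```

For example, `branches_with_blob c41cd0fa27d3` could be run inside a full clone; if no master branch appears in the output, that suggests the blob is not part of the master history.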

For the rest… I don't know. The 100MB+ repo size is not great, but rewriting history of the whole project (incl. all branches) is also quite troublesome.

We could just recommend that people create shallow clones when disk usage or slow connections are an issue (e.g. git clone --depth=1 https://github.com/GIScience/oshdb)?

$ git clone --depth=1 https://github.com/GIScience/oshdb
…
$ du -hs oshdb
7.9M    oshdb
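If the full history is needed later, a shallow clone can be deepened after the fact. The sketch below wraps both steps; the helper names `shallow_clone` and `unshallow` are hypothetical. Note that Git ignores `--depth` for plain local paths, so a local source has to be given as a `file://` URL.

```shell
# Hypothetical helper: clone only the newest commit of a repository.
# $1 = repository URL, $2 = target directory
shallow_clone() {
  git clone --quiet --depth=1 "$1" "$2"
}

# Hypothetical helper: fetch the rest of the history into a shallow clone.
# $1 = directory of the shallow clone
unshallow() {
  git -C "$1" fetch --quiet --unshallow
}
```

A shallow clone made this way keeps the small footprint shown above until `unshallow` is called, at which point the full history of the cloned branch is downloaded.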