bitcoin-dev-project / sim-ln

Payment activity generator for the lightning network
MIT License
63 stars 27 forks

Repo is unnecessarily large #158

Open m3dwards opened 10 months ago

m3dwards commented 10 months ago

I noticed while creating branches that it was taking a while, and after a quick look it seems that's because the repo is 247 MB.
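One way to check a figure like this is `git count-objects`, which reports how much space the object database takes. The sketch below is only an illustration: it builds a throwaway repository so it runs anywhere (the temp directory and the empty commit are assumptions for the demo). Inside a real clone, only the `git count-objects -vH` line is needed.

```shell
set -eu
# Throwaway repository so the sketch is self-contained (hypothetical demo setup).
demo=$(mktemp -d)
git init --quiet "$demo"
cd "$demo"
git -c user.name=demo -c user.email=demo@example.com \
  commit --quiet --allow-empty -m "initial"
# -v prints a per-field breakdown; -H prints sizes in human-readable units.
# The "size-pack" field is the one that dominates in a large, packed repo.
size_report=$(git count-objects -vH)
echo "$size_report"
```

Running the same command before and after a history rewrite (followed by garbage collection) gives a simple way to confirm the cleanup had an effect.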

I ran the following commands to list the largest blobs and it looks like some builds were accidentally committed early on:

git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
  sed -n 's/^blob //p' |
  sort --numeric-sort --key=2
# for example this blob was added and deleted a minute later
git whatchanged --all --find-object=6507a7347f3b151262807d43af4114d287b0d446

The following is a SO comment and post that discusses techniques for removing blobs from history: https://stackoverflow.com/questions/2100907/how-to-remove-delete-a-large-file-from-commit-history-in-the-git-repository/61602985#61602985

As these files appeared to have been committed and pushed in error I would support their removal from the history.

carlaKC commented 10 months ago

cc @okjodom @sr-gi, I think it's worthwhile doing a once off cleanup?

sr-gi commented 10 months ago

I do agree. It's not worth having an unnecessarily big repo because of files that were pushed by accident.

okjodom commented 10 months ago

+1 on cleanup

okjodom commented 10 months ago

What do you think of an interactive rebase to drop PRs #9 and #62?

sr-gi commented 10 months ago

That goes over my head git-wise, but I'll be ok with doing so if possible

okjodom commented 10 months ago

having a go at it

okjodom commented 10 months ago

I just experimented with this on a fresh clone of the repo

My starting step was an interactive rebase to remove commits 72c4f11 and then b87a0ae .. 1a75d06, followed by a further rewrite to remove the associated blobs.

`git rebase --interactive 4086f94` to drop `b87a0ae` .. `1a75d06` and `72c4f11`
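For anyone wanting to try the idea without touching the real history, here is a minimal sketch on a throwaway repository. It uses `git rebase --onto`, a non-interactive alternative that has the same effect as marking a contiguous range as `drop` in the interactive todo list. It is not the exact procedure used above. The commit names (`keep-1`, `drop-1`, and so on) and the temp directory are hypothetical.

```shell
set -eu
demo=$(mktemp -d)
git init --quiet "$demo"
cd "$demo"
git config user.name demo
git config user.email demo@example.com
# Build a four-commit history: keep-1, drop-1, drop-2, keep-2 (hypothetical names).
for name in keep-1 drop-1 drop-2 keep-2; do
  echo "$name" > "$name.txt"
  git add "$name.txt"
  git commit --quiet -m "$name"
done
last_kept=$(git rev-parse HEAD~3)     # commit just before the range to drop
last_dropped=$(git rev-parse HEAD~1)  # final commit in the range to drop
# Replay everything after last_dropped onto last_kept,
# which leaves drop-1 and drop-2 out of the branch history.
git rebase --quiet --onto "$last_kept" "$last_dropped"
subjects=$(git log --reverse --format=%s | tr '\n' ' ')
echo "$subjects"
```

Note that a rebase only changes which commits the branch reaches. The dropped commits and their blobs stay in the object database until they are expired and pruned, which is why a further blob cleanup step is still relevant.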

For blob cleanup, git-filter-repo from the Stack Overflow thread worked effectively. According to the SO discussion, this tool provides the same capabilities as git filter-branch.
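The exact git-filter-repo invocation is not recorded in this comment, so the following is only a sketch of one plausible approach: stripping every blob above a size threshold with `--strip-blobs-bigger-than`. It assumes git-filter-repo is installed separately (it is not bundled with git) and runs on a throwaway repository. The file names, the 2 MB artifact, and the 1M threshold are all hypothetical.

```shell
set -eu
if ! command -v git-filter-repo >/dev/null 2>&1; then
  echo "git-filter-repo not found; skipping demo"
else
  demo=$(mktemp -d)
  git init --quiet "$demo"
  cd "$demo"
  git config user.name demo
  git config user.email demo@example.com
  # A normal commit that should survive the rewrite.
  echo "readme" > README.txt
  git add README.txt
  git commit --quiet -m "add readme"
  # A hypothetical 2 MB build artifact, committed and then deleted.
  head -c 2097152 /dev/urandom > build.bin
  git add build.bin
  git commit --quiet -m "add build artifact by mistake"
  git rm --quiet build.bin
  git commit --quiet -m "remove build artifact"
  # Rewrite history, dropping every blob larger than 1 MB.
  # --force is needed here only because this repo is not a fresh clone.
  git filter-repo --force --strip-blobs-bigger-than 1M
  # Count blobs over 1 MB that are still reachable from any ref.
  remaining=$(git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize)' |
    awk '$1 == "blob" && $2 > 1048576 {n++} END {print n+0}')
  echo "large blobs remaining: $remaining"
fi
```

On the real repository, git-filter-repo expects to be run in a fresh clone, so the `--force` flag should not be necessary there.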

This results in blob set 2.blobs.after.txt

whereas before, the list of blobs was 2.blobs.before.txt

I used the original rev-list command to list the blobs in the repo, redirecting the output to a file:

git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
  sed -n 's/^blob //p' |
  sort --numeric-sort --key=2 > file.txt

From here, I'm not sure how we'd push this revised history upstream and get forks and clones to receive the same.