breck7 / pldb

PLDB: a Programming Language DataBase
https://pldb.io

First 'git pull' is painfully slow #189

Closed breck7 closed 1 year ago

breck7 commented 2 years ago

Probably b/c of the github.pldb.com branch?

breck7 commented 2 years ago

onboarding in general is way too slow and painful

ghost commented 2 years ago

Trying to overhaul the repository as a relatively new contributor is too risky for me :) #EYE_cantCompetewithBreck

ghost commented 2 years ago

Duplicate of Issue #75

ghost commented 1 year ago

@breck7: The issue seems to be the one described in https://stackoverflow.com/questions/11050265/remove-large-pack-file-created-by-git

The culprit file seems to be .git/objects/pack/pack-b111d543e833814c47944a85bce47cd1221e870e.pack whose size is around 965 MB
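For anyone who wants to reproduce that measurement, here is a minimal sketch of how the pack sizes could be checked locally. The helper names `list_packs` and `pack_summary` are made up for illustration; only `du`, `sort`, and `git count-objects` are real commands.

```shell
# list_packs is a hypothetical helper name, not part of git or pldb.
# Prints each pack file's size in KiB and its path, largest first.
list_packs() {
    repo="${1:-.}"
    # If no packs exist the glob stays literal; du's error is silenced.
    du -k "$repo"/.git/objects/pack/*.pack 2>/dev/null | sort -rn
}

# pack_summary is also hypothetical; it wraps a real git command.
# "size-pack" in its output is the combined size of all pack files.
pack_summary() {
    git -C "${1:-.}" count-objects -vH
}
```

Running `list_packs .` from the repository root should show the large pack at the top, and `pack_summary .` gives the total.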

This link might also be helpful: https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-large-files-on-github#removing-files-from-a-repositorys-history

Since this involves meddling with the entire project database, you should probably do it.

ghost commented 1 year ago

@breck7: The following commands should have worked. In a local test they did decrease the time required to download, but not the repository size. Note: the test was conducted on git version 2.30. The following commands were executed:

`git filter-branch --index-filter 'git rm -r --cached --ignore-unmatch pack-b111d543e833814c47944a85bce47cd1221e870e.pack' --prune-empty

git for-each-ref --format='delete %(refname)' refs/original | git update-ref --stdin

git reflog expire --expire=now --all

git config pack.windowMemory 10m # only needed if error message error: pack-objects died of signal 9059 is printed before failing.

git config pack.packSizeLimit 20m # only needed if error message error: pack-objects died of signal 9059 is printed before failing.

git gc --aggressive --prune=now

git push --force --all`

The repository is here: https://github.com/SRS-WRKS/pldb_repo_minified

Please note: it is verified to pass local tests and build successfully.
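Since the `--index-filter 'git rm --cached ...'` step rewrites paths that are tracked in commits, it may help to first see which tracked files are actually the largest in history. The sketch below is one way to list them; `largest_blobs` is a hypothetical helper name, while `git rev-list --objects` and `git cat-file --batch-check` are standard git commands.

```shell
# largest_blobs is a hypothetical helper name.
# Prints "<size-in-bytes> <path>" for the N largest blobs reachable
# from any ref, largest first (N defaults to 10).
largest_blobs() {
    repo="${1:-.}"
    count="${2:-10}"
    # rev-list emits "<sha> <path>"; cat-file exposes the path as %(rest).
    git -C "$repo" rev-list --objects --all |
        git -C "$repo" cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        awk '$1 == "blob" { print substr($0, 6) }' |
        sort -rn |
        head -n "$count"
}
```

For example, `largest_blobs . 20` would print the twenty largest files ever committed, which are the natural candidates for a history rewrite.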

celtic-coder commented 1 year ago

Very good research Hari (@SRS-WRKS)! Well done! 👍

ghost commented 1 year ago

The best results seem to come from the following: periodically purging the github.pldb.com branch.

`git push origin --delete github.pldb.com

git branch --delete origin/github.pldb.com

git branch --delete github.pldb.com

git branch --delete --remotes origin/github.pldb.com

git fetch origin --prune

git filter-branch --index-filter 'git rm -r --cached --ignore-unmatch pack-0f5756d5527ecda27b539906a9a83974a0d2a213.pack' --prune-empty

git for-each-ref --format='delete %(refname)' refs/original | git update-ref --stdin

git reflog expire --expire=now --all

git config pack.windowMemory 10m

git config pack.packSizeLimit 20m

git gc --aggressive --prune=now

git push --force --all`

git pull can be completed in a few minutes and the repository size decreases too.

Link: https://github.com/SRS-WRKS/SRS-WRKS-pldb_repo_minified-__1

breck7 commented 1 year ago

Great research @SRS-WRKS ! I'm going to sleep on this one.

breck7 commented 1 year ago

Yikes! 191 seconds for a fresh git pull. Should be >10x faster.

[Screenshot: 2022-11-12, 6:38:03 AM]

Going to follow your leads now @SRS-WRKS .

breck7 commented 1 year ago

Ah well, I am on a relatively slow Internet connection right now! That's why mine took so long (I just deleted the github.pldb.com branch, so we should avoid those problems now).

Just tried on a fast cloud machine and got 4 seconds for the first pull! Case closed. Thanks @SRS-WRKS !

[Screenshot: 2022-11-12, 6:42:49 AM]
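To compare first-pull times across machines or connections, as in the 191-second and 4-second measurements above, a small timing wrapper could be used. This is only a sketch: `time_clone` is a hypothetical helper name, and `<repo-url>` in the usage note is a placeholder.

```shell
# time_clone is a hypothetical helper name.
# Clones SOURCE into a throwaway directory and prints the elapsed time
# in whole seconds. Extra arguments are passed through to `git clone`.
time_clone() {
    src="$1"; shift
    dest="$(mktemp -d)"
    start="$(date +%s)"
    git clone --quiet "$@" "$src" "$dest/repo" || { rm -rf "$dest"; return 1; }
    end="$(date +%s)"
    rm -rf "$dest"
    echo "$((end - start))"
}
```

Usage would look like `time_clone <repo-url>` for a full clone, or `time_clone <repo-url> --single-branch` to fetch only the default branch and see how much of the time other branches account for.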