git-lfs / git-lfs

Git extension for versioning large files
https://git-lfs.com

Repository size doubled after migrating an existing code repository to LFS #5786

oopses closed this issue 4 weeks ago

oopses commented 1 month ago

Some resource files in our existing code repository take up a large amount of space, so we migrated them to Git LFS, following these migration guides: https://github.com/git-lfs/git-lfs/wiki/Tutorial

https://confluence.atlassian.com/bbkb/moving-git-large-files-to-git-lfs-in-bitbucket-cloud-1236441468.html

The migration itself succeeded, but the size of the code repository has doubled. Looking inside .git, the previous object files still exist, and there is now a second copy of each file stored as an LFS object. Running the following git reflog expire and git gc commands and then pushing made no difference:

git reflog expire --expire-unreachable=now --all
git gc --prune=now

I have checked some blogs and Q&A threads, but I haven't found the cause yet: https://forum.gitlab.com/t/gitlab-lfs-migration-doubles-reposize/27151/2

https://community.atlassian.com/t5/Bitbucket-questions/Repository-not-shrinking-after-migration-to-LFS-What-do/qaq-p/908843

https://stackoverflow.com/questions/72641580/why-the-git-repository-size-stays-the-same-after-migration-to-lfs

Has anyone done this kind of migration before? Do you need to use BFG to clean up the Git commit history after the migration?

chrisd8088 commented 4 weeks ago

Hey, I'm sorry you're having trouble. Are you able to provide some information about how you are measuring the size of the repository? Do you mean the total disk space used, or something else?

If you can also provide the output of git lfs env (run from inside the repository's working tree) and the series of commands you used to perform the migration, that would be very helpful.
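As a rough illustration of the kind of measurement being asked about, the sketch below separates the Git object database from the Git LFS cache. The function name repo_size_report is hypothetical, and the sketch assumes the LFS cache is in its default location under the Git directory.

```shell
# Hypothetical helper: show where a repository's disk usage comes from.
# Assumes the Git LFS cache lives in the default <git-dir>/lfs/objects.
repo_size_report() {
    repo="${1:-.}"
    # Resolve the absolute path of the .git directory (needs Git 2.13+).
    git_dir=$(git -C "$repo" rev-parse --absolute-git-dir) || return 1

    echo "== Git object database (loose objects and packs) =="
    git -C "$repo" count-objects -vH

    echo "== Git LFS local cache =="
    if [ -d "$git_dir/lfs/objects" ]; then
        du -sh "$git_dir/lfs/objects"
    else
        echo "no LFS cache at $git_dir/lfs/objects"
    fi

    echo "== Whole Git directory =="
    du -sh "$git_dir"
}
```

Running repo_size_report from inside the working tree, together with git lfs env, should show whether the extra space sits in .git/objects or in .git/lfs/objects.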

One thing to note is that Git LFS typically caches a copy of recently-used Git LFS objects in the .git/lfs/objects directory, while also populating your working tree with copies of the Git LFS objects which correspond to any Git LFS pointer files found in the commit which you have currently checked out. It's therefore possible to see a rough doubling of on-disk total space usage.

You can manage what is stored in the Git LFS cache directory using the git-lfs-prune(1) command. Also, if your operating system supports it, you can deduplicate objects between the working tree and the cache using the git-lfs-dedup(1) command.
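A minimal sketch of how the cache could be trimmed, assuming Git LFS is installed and a remote is configured. The function name lfs_cache_cleanup is hypothetical, and the choice of flags is only one possible combination.

```shell
# Hypothetical helper: preview, then prune, the local Git LFS cache.
lfs_cache_cleanup() {
    # List what would be removed without deleting anything.
    git lfs prune --dry-run --verbose || return 1
    # Delete prunable objects, but only after checking that the remote
    # still has a copy of each one.
    git lfs prune --verify-remote
}
```

On file systems that support copy-on-write cloning, git lfs dedup can be run separately afterwards to share storage between the working tree and the cache.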

Another thing to pay attention to is whether your migration to Git LFS changed all the commits in your history where files matching the filename patterns you specified occur. For instance, if you ran git lfs migrate import --include="*.bin", that will migrate all the *.bin files in commits in your current branch's history, but not those in all branches. For that, you would want to add the --everything option.
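For example, a migration covering all refs rather than only the current branch might look like the sketch below. The function name migrate_all_refs is hypothetical, and the *.bin pattern is just the example from above. This rewrites history, so it is best tried on a fresh clone first.

```shell
# Hypothetical helper: convert files matching a pattern to Git LFS
# pointers across all refs, not just the current branch.
migrate_all_refs() {
    pattern="$1"
    if [ -z "$pattern" ]; then
        echo "usage: migrate_all_refs PATTERN" >&2
        return 2
    fi
    # Read-only report of which file types use the most space.
    git lfs migrate info --everything || return 1
    # Rewrite history so matching files become LFS pointers.
    git lfs migrate import --everything --include="$pattern"
}
```

A call such as migrate_all_refs "*.bin" would then apply the same pattern to every branch.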

oopses commented 4 weeks ago

I asked some colleagues, and they pointed out that this case is enabling LFS on an existing repository rather than initializing a new code repository. After LFS is enabled on the existing repository, the migration succeeds, but the files from historical commits still exist under .git/objects, which doubles the size of the code repository. The historical commits need to be cleaned up with a tool such as BFG.

chrisd8088 commented 4 weeks ago

> the files of historical commit records under .git/objects still exist

Ah, I see what you're referring to. You may want to refer to some guides, such as this StackOverflow question, on how to instruct Git to prune objects that are now unreachable after you rewrite your Git commit history with git lfs migrate.
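A common way to do this locally is sketched below. It is not necessarily what the linked guide recommends, and the function name drop_unreachable_objects is hypothetical. It only affects the local repository; space used on a hosting server is managed by that server.

```shell
# Hypothetical helper: delete objects that nothing references any more,
# for example the pre-migration blobs left behind by a history rewrite.
drop_unreachable_objects() {
    repo="${1:-.}"
    # Expire every reflog entry so old commits are no longer kept alive.
    git -C "$repo" reflog expire --expire=now --all || return 1
    # Repack and remove unreachable objects immediately instead of
    # waiting for the default grace period.
    git -C "$repo" gc --prune=now
}
```

Comparing git count-objects -vH before and after running it shows how much space .git/objects gave back. Any other ref that still points at the old history, such as a backup branch or tag, will keep those objects alive.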

oopses commented 4 weeks ago

Yes, I'm working on it.