Open KOLANICH opened 8 years ago
Since we have started to use GitHub releases, this can be closed.
Could you strip all the binaries from the history?
On the one hand this would make sense, but on the other hand it would change all commit IDs. That is not good behavior for an open-source repository, as it would require a force push. But I must admit it is a problem, so I am reopening the issue.
Maybe a separate branch?
Or we retire this repo and use a new one with rewritten history.
Just put up a message instructing users to rebase their patches manually using git format-patch and git am.
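For anyone carrying local patches, the format-patch / am route suggested above could look roughly like the sketch below. It builds two throwaway repositories in a temporary directory, so every path, file name, and the `upstream-base` tag are made up for the demo and not taken from BOLTS.

```shell
#!/bin/sh
# Sketch: carry a local patch from an old clone onto a rewritten history.
# All repositories, files and tags here are hypothetical demo data.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)

# "old" repo: one upstream commit plus one local patch on top of it
git init -q "$work/old"
cd "$work/old"
echo "upstream" > README
git add README
git commit -q -m "upstream commit"
git tag upstream-base
echo "my fix" > fix.txt
git add fix.txt
git commit -q -m "local patch"

# "new" repo: same content, but an unrelated (rewritten) history
git init -q "$work/new"
cd "$work/new"
echo "upstream" > README
git add README
git commit -q -m "upstream commit (rewritten)"

# export every commit after the upstream base as a mail-formatted patch
cd "$work/old"
git format-patch -q -o "$work/patches" upstream-base..HEAD

# replay the exported patches on top of the rewritten history
cd "$work/new"
git am -q "$work/patches"/*.patch

applied_subject=$(git log -1 --format=%s)
echo "$applied_subject"
```

Because the patches are applied as diffs, it does not matter that the commit IDs in the new repository differ from the old ones; only the file contents have to match closely enough for the patch to apply.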
Do you mean rewrite history and give the instructions you mentioned?
BTW: there are still lots of unneeded megabytes in the repo ... https://github.com/boltsparts/BOLTS/tree/b47ae5fb53b4975320867909cfd0de2641f6bf15/output This is even the website. The new one (which looks exactly like the old one) is generated as a BOLTS backend too. It can be found here https://github.com/boltsparts/boltsparts.github.io
Do you mean rewrite history and give the instructions you mentioned?
Yes, really. Rebasing some patches manually is a minor inconvenience; an overbloated repo is a major one.
I am involved in the FreeCAD project. In such a project I would never think for a second about rewriting the history of the main repo's master branch. But BOLTS is in a situation with no open PRs at the moment and little traffic. We do not have any development or release branches in the repo. You may be right. We will never get a better chance to do it.
I will keep you informed.
bernd
BTW: The cloned repo is 166 MB, of which the real code is still 94 MB and .git is 71 MB. That means we will not save very much.
Ahh OK, downloads contains 61 MB of binary data. I have done BOLTS development for years and never realized this. I must admit I only noticed it a few seconds ago, and it bothers me ...
@johannes: I would probably have done exactly the same 7 years ago with the knowledge I had at that time :-)
OK the code is 33 MB whereas the drawings are 9.5 MB and the website backend is 21.5 MB
Git is actually quite good at avoiding copies and merging similar objects. But yeah, keeping larger files out reduces clone and push times, which is great. Unfortunately, getting files out requires rewriting history, which means all existing clones become invalid ...
Anyways, I have no idea about this project and was probably highlighted by mistake :-) (unsubscribed now, so please don't @ me again)
gave it a try ...
# information
https://myopswork.com/how-remove-files-completely-from-git-repository-history-47ed3e0c4c35
https://stackoverflow.com/questions/6403601/purging-file-from-git-repo-failed-unable-to-create-new-backup
# command and test
git filter-branch --index-filter "git rm -rf --cached --ignore-unmatch path_to_file" HEAD
git filter-branch --index-filter "git rm -rf --cached --ignore-unmatch downloads/freecad/BOLTS_FreeCAD_0.2_gpl3.tar.gz" HEAD
# **************************************************************************************************
git filter-branch --index-filter "git rm -rf --cached --ignore-unmatch downloads/*" HEAD
rm -rf .git/refs/original
git filter-branch --index-filter "git rm -rf --cached --ignore-unmatch output/*" HEAD
rm -rf .git/refs/original
but .git still shows 75 MB in the file explorer; du even shows 100 MB ...
deleting unreferenced blobs with
git gc --aggressive --prune=all
shrinks .git to 65 MB, both in the file manager and according to du. That means the whole repo is still 99 MB = 33 MB code and 65 MB .git
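One likely reason .git stays this large locally: after filter-branch, the reflog (in addition to the refs/original backup) can still reference the old commits, so gc keeps their blobs. A minimal sketch of the full sequence on a throwaway repo follows. The repo, the file names, and the 1 MiB dummy archive are assumptions for the demo, and it uses `--prune=now` where the run above used `--prune=all`.

```shell
#!/bin/sh
# Sketch: strip a directory from history and actually reclaim the space.
# The repository and its contents are hypothetical demo data.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
# silences the deprecation notice that newer git versions print for filter-branch
export FILTER_BRANCH_SQUELCH_WARNING=1

repo=$(mktemp -d)
cd "$repo"
git init -q .
mkdir downloads
# 1 MiB of random data stands in for an incompressible release archive
dd if=/dev/urandom of=downloads/demo.tar.gz bs=1024 count=1024 2>/dev/null
echo "code" > main.py
git add .
git commit -q -m "code plus binary"

blob=$(git rev-parse HEAD:downloads/demo.tar.gz)
before_kb=$(du -sk .git | cut -f1)

# 1. rewrite history without downloads/
git filter-branch --index-filter \
    'git rm -r -q --cached --ignore-unmatch downloads' HEAD >/dev/null

# 2. drop the backup refs that filter-branch leaves under refs/original
git for-each-ref --format='%(refname)' refs/original |
    while read -r ref; do git update-ref -d "$ref"; done

# 3. expire the reflog, which otherwise still points at the old commits
git reflog expire --expire=now --all

# 4. repack and prune the now unreachable objects
git gc -q --aggressive --prune=now

after_kb=$(du -sk .git | cut -f1)
if git cat-file -e "$blob" 2>/dev/null; then blob_gone=no; else blob_gone=yes; fi
echo "before: ${before_kb} KB, after: ${after_kb} KB, blob gone: ${blob_gone}"
```

A fresh clone never receives unreachable objects or the original reflog, which would fit with the pushed copy being much smaller than the local .git.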
pushed it to a new repo on my GitHub ... https://github.com/berndhahnebach/stripedbolts
When I clone this one I still have 33.8 MB of code but only 18.1 MB of .git = 51.9 MB
LGTM. Maybe one of you guys can make it even smaller? We will probably never get the chance again.
Sorry johannes. Yes, you were highlighted by mistake. The right one would have been @jreinhardt. Sorry for the inconvenience.
BTW: We are aware of what you have said and we are discussing whether it is worth it.
cheers bernd
Hi,
yes, let's do this.
There might be even more to gain: when I check the largest blobs in the repo (https://stackoverflow.com/questions/10622179/how-to-find-identify-large-commits-in-git-history), many of them are js files with literal 3d models for 3d.js. This is about 40 MB (but it probably compresses quite well in the pack files).
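A check for the largest blobs along those lines can be sketched as a small shell function. `largest_blobs` is a made-up helper name, and the demo repo with `big.bin` and `small.txt` is invented to keep the example self-contained.

```shell
#!/bin/sh
# largest_blobs REPO [COUNT]: print "size path" for the biggest blobs
# reachable from any ref, largest first. (Hypothetical helper name.)
largest_blobs() {
    git -C "$1" rev-list --objects --all |
        git -C "$1" cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        sed -n 's/^blob //p' |
        sort -n -r |
        head -n "${2:-10}"
}

# Demo on a throwaway repository with one larger and one tiny file.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)
git init -q "$repo"
dd if=/dev/zero of="$repo/big.bin" bs=1000 count=2 2>/dev/null
echo "hi" > "$repo/small.txt"
git -C "$repo" add .
git -C "$repo" commit -q -m "demo content"

largest_blobs "$repo" 5
```

The sizes reported are uncompressed object sizes, so files that compress well (like the js models mentioned above) take up less room in the pack files than this listing suggests.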
Also, when using filter-branch, tags are unaffected and might still reference big blobs and keep them from being garbage collected. So I removed all tags and all branches except the main branch.
Anyway, my attempt with
git filter-branch --index-filter "git rm -rf --cached --ignore-unmatch downloads/ output/ html/3dviews/*" HEAD
gave me
19M ./.git 54M .
So I guess this is more or less the same as what Bernd achieved...
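As an alternative to deleting tags and extra branches first, filter-branch can rewrite them in the same pass with `--tag-name-filter cat -- --all`. A minimal sketch on a throwaway repo (the repo contents and the `v0.1` tag are invented for the demo):

```shell
#!/bin/sh
# Sketch: rewrite all branches and tags in one filter-branch pass.
# The repository, files and tag name are hypothetical demo data.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
# silences the deprecation notice that newer git versions print for filter-branch
export FILTER_BRANCH_SQUELCH_WARNING=1

repo=$(mktemp -d)
cd "$repo"
git init -q .
mkdir downloads
echo "binary stand-in" > downloads/release.tar.gz
echo "code" > main.py
git add .
git commit -q -m "initial"
git tag v0.1
echo "more code" >> main.py
git commit -q -a -m "second"

# "-- --all" rewrites every ref; "--tag-name-filter cat" keeps each tag's
# name and moves it to the rewritten commit
git filter-branch \
    --index-filter 'git rm -r -q --cached --ignore-unmatch downloads' \
    --tag-name-filter cat -- --all >/dev/null

tagged_files=$(git ls-tree -r --name-only v0.1)
echo "$tagged_files"
```

The backup refs under refs/original and the reflog still have to be cleaned up afterwards before gc can drop the old blobs.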
https://github.com/KOLANICH/strippedbolts
53M . (it was 41, but when I downloaded from the repo, it became 53, likely LFS files have just not been checked out) 18M ./.git
I have put all the images, fonts, compiled translations and FreeCAD zips into LFS. Fonts and zips don't give any noticeable gain, but they are binary, so that is where they belong.
We may want to remove some PNGs, since they contain the same drawings as the SVGs. Removing other file types or moving them to LFS gives no noticeable gain, probably because they were modified too few times.
LFS has an extremely large drawback, though: GH treats it as a driver to sell paid services, so there are quotas on it, and anything uploaded permanently eats the quota of the parent account until the repo is deleted, losing all the issues, PRs and forks.
So IMHO it isn't worth it, at least until M$ changes its LFS policy.
means we could go for the one on my github.
I'm also for keeping the size of the repository as small as possible. As those download files seem to not correspond with the published releases I'm not sure why they are kept in the first place. Sorry if I get something wrong here.
Also the gh-pages branch seems to be not needed anymore as the websites repository is boltsparts.github.io.
Just for reference this is the current repository master:
bump
found a problem ... I have some branches ... https://github.com/berndhahnebach/BOLTS/branches/all They are not part of the git tree anymore. But most of them have just a few commits, means cherry picking would work. At least not a problem.
Stripped this directory too, since I have to delete it after website generation anyway to get the web page up: backends/website/static/source/bootstrap-3.2.0/ This saves another 6.3 MB ... 46 MB
If I move the repository to an archive repo, all issues and PRs will be moved too. But we could recreate them if needed and set a link to the Archive repository.
I am curious if more regressions will come up.
To state clearly that it is another repo, the new repo could be named bolts instead of BOLTS. That also makes it easier to type on a keyboard. A link to an issue would then never point to the wrong issue, because the new repo will have new issues.
Since I am moving the repo, all forks will still work. After the move I will make a last commit. In the repo's README.md I will explain and link to this issue.
The master/main branch of the new repository will be main. This follows the new guidelines and signals that something has changed.
Just tried it: repo names are not case sensitive. So to avoid a mix-up we would need to use another repo name. I will use boltsparts for the new, stripped main BOLTS repo.
Links are not broken; somehow GitHub seems to know the repo name has changed.
a new BOLTS was born ... https://github.com/boltsparts/boltsparts
I do not close it ATM, see what will happen ...
awesome! Is it possible to transfer issues?
good question,
Don't store large binary files in a git repo. Every version of a binary file that was ever committed stays in the history, and such files usually compress poorly. That's why it is 184 MiB now. Use GitHub Releases to store downloads, and use git-lfs for large files and blobs if they are strictly needed.