MarlinFirmware / Marlin

Marlin is an optimized firmware for RepRap 3D printers based on the Arduino platform. Many commercial 3D printers come with Marlin installed. Check with your vendor if you need source code for your specific machine.
https://marlinfw.org
GNU General Public License v3.0
16.18k stars, 19.21k forks

[Maintenance] Reduce repository size #13625

Closed: p3p closed this issue 5 years ago

p3p commented 5 years ago

Marlin's git repository is getting a little on the beefy side. This is an example of how that could be mitigated, if @thinkyhead thinks it is worth it. (I'm not a git expert by any means, though.)

p3p@Zeus ~/workspace $ git clone git@github.com:MarlinFirmware/Marlin.git
Cloning into 'Marlin'...
remote: Enumerating objects: 124, done.
remote: Counting objects: 100% (124/124), done.
remote: Compressing objects: 100% (97/97), done.
remote: Total 171236 (delta 60), reused 38 (delta 25), pack-reused 171112
Receiving objects: 100% (171236/171236), 100.22 MiB | 4.10 MiB/s, done.
Resolving deltas: 100% (109847/109847), done.
Checking connectivity... done.
p3p@Zeus ~/workspace $ cd Marlin
p3p@Zeus ~/workspace/Marlin $ git count-objects -vH
count: 0
size: 0 bytes
in-pack: 171236
packs: 1
size-pack: 104.80 MiB
prune-packable: 0
garbage: 0
size-garbage: 0 bytes

This can be mitigated by using git's garbage collection tools:

p3p@Zeus ~/workspace/Marlin $ git reflog expire --expire=now --all && git gc --prune=now --aggressive
Counting objects: 171236, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (158543/158543), done.
Writing objects: 100% (171236/171236), done.
Total 171236 (delta 118068), reused 46777 (delta 0)
p3p@Zeus ~/workspace/Marlin $ git count-objects -vH
count: 0
size: 0 bytes
in-pack: 171236
packs: 1
size-pack: 48.24 MiB
prune-packable: 0
garbage: 0
size-garbage: 0 bytes
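The before/after comparison above can be wrapped in a small helper. This is a sketch: `pack_kib` and `gc_report` are hypothetical names, not git commands. It reads the `size-pack` line of `git count-objects -v`, which (without `-H`) is reported in KiB. `git reflog expire --expire=now --all` drops every reflog entry, so objects that were only reachable from reflogs become unreachable, and `--prune=now` then deletes them immediately instead of after the default grace period.

```sh
# Hypothetical helper: packed size in KiB, taken from the "size-pack" line of
# `git count-objects -v` (without -H the sizes are reported in KiB).
pack_kib() {
  git -C "${1:-.}" count-objects -v | awk '$1 == "size-pack:" { print $2 }'
}

# Hypothetical helper: run the same two commands as above on a repository
# (a normal clone or a bare --mirror clone) and report the pack size
# before and after.
gc_report() {
  repo=${1:-.}
  before=$(pack_kib "$repo")
  git -C "$repo" reflog expire --expire=now --all &&
    git -C "$repo" gc --quiet --prune=now --aggressive || return 1
  after=$(pack_kib "$repo")
  echo "size-pack: $before KiB -> $after KiB"
}
```

For example, `gc_report ~/workspace/Marlin` would print the two `size-pack` values for the clone above, in KiB rather than MiB.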

This reduces the local repository by about 50% (104.80 MiB to 48.24 MiB), but we really need to be cleaning the remote, so let's work on a mirror repository that, when pushed, will overwrite all references.

p3p@Zeus ~/workspace $ git clone --mirror git@github.com:MarlinFirmware/Marlin.git
Cloning into bare repository 'Marlin.git'...
remote: Enumerating objects: 304, done.
remote: Counting objects: 100% (304/304), done.
remote: Compressing objects: 100% (251/251), done.
remote: Total 276718 (delta 105), reused 53 (delta 34), pack-reused 276414
Receiving objects: 100% (276718/276718), 163.63 MiB | 4.06 MiB/s, done.
Resolving deltas: 100% (176812/176812), done.
Checking connectivity... done.
p3p@Zeus ~/workspace/Marlin.git $ git count-objects -vH
count: 0
size: 0 bytes
in-pack: 276718
packs: 1
size-pack: 171.02 MiB
prune-packable: 0
garbage: 0
size-garbage: 0 bytes
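The mirror pack (171.02 MiB) is noticeably larger than the normal clone (104.80 MiB). A likely reason is that `git clone --mirror` copies every ref on the server, including GitHub's `refs/pull/<n>/head` and `refs/pull/<n>/merge` refs (they show up in the BFG output below), whereas a normal clone only fetches branches and tags. One rough way to check is to count refs per namespace in each copy; `ref_namespaces` is a hypothetical helper, sketched here under that assumption.

```sh
# Hypothetical helper: count the refs in a repository per namespace
# (refs/heads, refs/tags, refs/pull, refs/remotes, ...).
ref_namespaces() {
  git -C "${1:-.}" for-each-ref --format='%(refname)' |
    awk -F/ '{ count[$1 "/" $2]++ } END { for (ns in count) print ns, count[ns] }' |
    sort
}
```

Running `ref_namespaces Marlin` and `ref_namespaces Marlin.git` from `~/workspace` would show whether the extra size comes with a large `refs/pull` namespace that only the mirror has.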

So the starting point is 171 MiB for a repository mirror. Running the same garbage collection on it:

p3p@Zeus ~/workspace/Marlin.git $ git reflog expire --expire=now --all && git gc --prune=now --aggressive
Counting objects: 276718, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (259315/259315), done.
Writing objects: 100% (276718/276718), done.
Total 276718 (delta 192035), reused 77507 (delta 0)
p3p@Zeus ~/workspace/Marlin.git $ git count-objects -vH
count: 0
size: 0 bytes
in-pack: 276718
packs: 1
size-pack: 90.93 MiB
prune-packable: 0
garbage: 0
size-garbage: 0 bytes

This simple step gets us down to 90 MiB for the mirror, or about 50 MiB for a normal clone from the remote.
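For reference, here is the arithmetic behind "about 50%", using the `size-pack` values from the two transcripts above. `pct_reduction` is just an illustrative awk one-liner: the normal clone shrinks by 54%, the mirror by a little under half.

```sh
# Illustrative helper: percentage reduction from a "before" size to an
# "after" size (same unit for both), printed with one decimal place.
pct_reduction() {
  awk -v b="$1" -v a="$2" 'BEGIN { printf "%.1f\n", (b - a) / b * 100 }'
}

pct_reduction 104.80 48.24   # normal clone, size-pack before/after gc: 54.0
pct_reduction 171.02 90.93   # mirror, size-pack before/after gc: 46.8
```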

But we can go a bit further. A quick investigation of the largest git blobs shows:

....
0dc909c5f4b4  4.4MiB buildroot/share/fonts/marlin-6x12-3.bdf
2f3403dc1092  4.4MiB buildroot/share/fonts/marlin-6x12-3.bdf
025d8fe446fd  4.7MiB .pioenvs/megaatmega2560/lib/libU8glib_ID7.a
72c19a7861b2  4.7MiB Marlin/.pioenvs/mega2560/lib/libU8glib_ID7.a
3fd39a62fd48  6.7MiB zTylers work/Marlin-Release.zip
206f25a8be8f   15MiB Marlin/Marlin.sdf
12025ee6a046   15MiB Marlin/Marlin.sdf
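The post doesn't show the command that produced this list. One common way to get a similar listing is to feed `git rev-list --objects --all` into `git cat-file --batch-check` and sort by object size. The sketch below does that; `largest_blobs` is a hypothetical helper, and unlike the list above it prints full object ids and sizes in bytes, largest first.

```sh
# Sketch: print the N largest blobs reachable from any ref (default 10),
# largest first, as "<object id> <size in bytes> <path>".
largest_blobs() {
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    awk '$1 == "blob" { sub(/^blob /, ""); print }' |
    sort -k2,2nr |
    head -n "${1:-10}"
}
```

Because `--all` walks every ref, this also finds blobs from files deleted long ago, such as `Marlin/Marlin.sdf`, as long as some commit still references them.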

So we have a few old, large binaries hanging around in the repository. Rewriting git history is bad, but it is up to @thinkyhead whether it is worth it. To clean these binaries from the repository, all references to them need to be removed, and to do that we will use BFG. The arguments simply brute-force remove all blobs larger than 1 MiB, but protect all current branches from modification, so only files that have already been deleted will be removed from history.

p3p@Zeus ~/workspace/Marlin.git $ java -jar ~/Downloads/bfg-1.13.0.jar --strip-blobs-bigger-than 1M --protect-blobs-from 1.0.x,1.1.x,bugfix-1.1.x,bugfix-2.0.x .

Using repo : /home/p3p/workspace/Marlin.git/.

Scanning packfile for large blobs: 276718
Scanning packfile for large blobs completed in 1,034 ms.
Found 31 blob ids for large blobs - biggest=16011264 smallest=1062064
Total size (unpacked)=117021876
Found 2571 objects to protect
Found 3 tag-pointing refs : refs/tags/1.0.0-beta, refs/tags/RepRapPro-Huxley-July-2012, refs/tags/RepRapPro-Mendel-June-2012
Found 7755 commit-pointing refs : HEAD, refs/heads/1.0.x, refs/heads/1.1.x, ...

Protected commits
-----------------

These are your protected commits, and so their contents will NOT be altered:

 * commit 0e3c9e72 (protected by 'bugfix-2.0.x') - contains 3 dirty files : 
    - buildroot/share/atom/avrdude_5.10_linux (1.1 MB)
    - buildroot/share/fonts/NanumGothic.bdf (2.1 MB)
    - buildroot/share/fonts/marlin-6x12-3.bdf (4.4 MB)
 * commit 14193022 (protected by 'bugfix-1.1.x') - contains 1 dirty file : 
    - buildroot/share/atom/avrdude_5.10_linux (1.1 MB)
 * commit dbde4734 (protected by '1.0.x') - contains 9 dirty files : 
    - .pioenvs/megaatmega2560/lib/libU8glib_ID7.a (4.7 MB)
    - .piolibdeps/U8glib_ID7/src/clib/u8g_font_data.c (4.3 MB)
    - ...
 * commit 5d96a6d9 (protected by '1.1.x') - contains 1 dirty file : 
    - buildroot/share/atom/avrdude_5.10_linux (1.1 MB)

WARNING: The dirty content above may be removed from other commits, but as
the *protected* commits still use it, it will STILL exist in your repository.

Details of protected dirty content have been recorded here :

/home/p3p/workspace/Marlin.git/..bfg-report/2019-04-09/01-36-27/protected-dirt/

If you *really* want this content gone, make a manual commit that removes it,
and then run the BFG on a fresh copy of your repo.

Cleaning
--------

Found 35751 commits
Cleaning commits:       100% (35751/35751)
Cleaning commits completed in 5,412 ms.

Updating 7724 Refs
------------------

    Ref                          Before     After   
    ------------------------------------------------
    refs/heads/1.0.x           | dbde4734 | 65c9d2c5
    refs/heads/1.1.x           | 5d96a6d9 | 91cdb59f
    refs/heads/Marlin_RTOS     | 0bdea3f8 | 72c562e8
    refs/heads/bugfix-1.1.x    | 14193022 | bd926c4f
    refs/heads/bugfix-2.0.x    | 0e3c9e72 | d828c8ef
    refs/pull/100/head         | ce07c918 | add7a9d5
    refs/pull/100/merge        | 4ed8076d | a3dec753
    refs/pull/10014/head       | 21a43c8d | 787f981d
    refs/pull/10015/head       | db9e278e | 94cbec56
    refs/pull/10016/head       | cc279a6f | 44b97b52
    refs/pull/10017/head       | 983d22f5 | 4ad6b5cb
    refs/pull/10018/head       | d65162d3 | 41d7315b
    refs/pull/10019/head       | e72b47f1 | 8c6322e5
    refs/pull/10020/head       | 1ba5553b | 47fc0c92
    refs/pull/10021/head       | 3c6d2d77 | 9847f1d8
    ...

Updating references:    100% (7724/7724)
...Ref update completed in 1,077 ms.

Commit Tree-Dirt History
------------------------

    Earliest                                              Latest
    |                                                          |
    DDmDDDDDDDDDDDDDDDDDDDDDDDDmDDDDDDDmmmDDDDDDDDDDDDDDDDDDDDDD

    D = dirty commits (file tree fixed)
    m = modified commits (commit message or parents changed)
    . = clean commits (no changes to file tree)

                            Before     After   
    -------------------------------------------
    First modified commit | fe940a14 | f8d23a94
    Last dirty commit     | 191a4dc7 | 506f2e89

Deleted files
-------------

    Filename                            Git id                                                 
    -------------------------------------------------------------------------------------------
    0_DWIN_ASC.HZK                    | ee8c85ad (2.9 MB)                                      
    25_SPercentage.ICO                | a3b157ee (1.0 MB)                                      
    60_AutoHome.ICO                   | 0f3b25ad (1.0 MB)                                      
    68_AutoLeve.ICO                   | a60fcd4b (1.3 MB)                                      
    ATMega_Pins.png                   | 13ed8717 (1.3 MB)                                      
    Common.png                        | 45f5b6bd (1.3 MB)                                      
    IDE.png                           | c172cca9 (1.3 MB)                                      
    ISO10646-0-3.bdf                  | 287e16b2 (4.2 MB), d9f07573 (4.2 MB)                   
    Marlin Logo.cdr                   | ae515afa (1.3 MB)                                      
    Marlin Logo.pdf                   | d10c6df8 (1.3 MB)                                      
    Marlin Logo.svg                   | c9584aa8 (1.2 MB)                                      
    Marlin Logo_old.cdr               | ae515afa (1.3 MB)                                      
    Marlin-Release.zip                | 3fd39a62 (6.7 MB)                                      
    Marlin.sdf                        | 206f25a8 (15.0 MB), 12025ee6 (15.3 MB)                 
    NanumGothic.bdf                   | 9bd13123 (2.1 MB)                                      
    ...

In total, 51432 object ids were changed. Full details are logged here:

    /home/p3p/workspace/Marlin.git/..bfg-report/2019-04-09/01-36-27

BFG run is complete! When ready, run: git reflog expire --expire=now --all && git gc --prune=now --aggressive

p3p@Zeus ~/workspace/Marlin.git $ git reflog expire --expire=now --all && git gc --prune=now --aggressive
Counting objects: 276770, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (259329/259329), done.
Writing objects: 100% (276770/276770), done.
Total 276770 (delta 192241), reused 54991 (delta 0)
p3p@Zeus ~/workspace/Marlin.git $ git count-objects -vH
count: 0
size: 0 bytes
in-pack: 276770
packs: 1
size-pack: 67.46 MiB
prune-packable: 0
garbage: 0
size-garbage: 0 bytes

So that gets the repository mirror down to 67 MiB from 171 MiB, and a normal remote clone down to 43 MiB. A few more binaries could be removed, but they are protected by the old version branches (as seen in the BFG output), so this could be reduced further: without protecting the 1.0.x branch, the mirror comes to 60 MiB and a remote clone to 35 MiB.
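Before deciding whether to stop protecting an old branch, you can preview which large files BFG would have to leave alone. As its output says, the contents of protected commits are not altered, so any file over the threshold in a protected branch's tip tree stays in the repository (the "protected dirty" files above). `large_files_at` is a hypothetical helper, and it assumes BFG's `1M` means 1 MiB (1048576 bytes).

```sh
# Hypothetical helper: print "<ref> <size> <path>" for every file larger than
# LIMIT bytes in the tip tree of each given ref. BFG leaves these tip trees
# untouched, so such blobs stay in the repository while the ref is protected.
large_files_at() {
  limit=$1
  shift
  for ref in "$@"; do
    # ls-tree -l prints "<mode> <type> <object> <size>\t<path>"
    git ls-tree -r -l "$ref" |
      awk -F '\t' -v lim="$limit" -v ref="$ref" '{
        split($1, m, " ")               # m[4] is the size ("-" for submodules)
        if (m[4] + 0 > lim) print ref, m[4], $2
      }'
  done
}
```

For example, inside `Marlin.git`: `large_files_at 1048576 1.0.x 1.1.x bugfix-1.1.x bugfix-2.0.x`. Dropping a branch from `--protect-blobs-from` only helps if its large files are not also present in another protected branch.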

Roxy-3D commented 5 years ago

It is really good to know this. It gives us a number of options.

But my starting position is that we keep everything until we are forced to prune the tree. (That doesn't mean we shouldn't fully discuss and debate this. My vote can be changed.)

oysteinkrog commented 5 years ago

Rewriting history will change all commit hashes, invalidating an awful lot of links that are floating around.

p3p commented 5 years ago

> rewriting git history is bad

I could perhaps have been a little more specific with the details ^^. This was just an example of the tools available to reduce the download size, with the more extreme options mentioned at the end. The majority of the size reduction and general cleanup is achieved just by using git gc, without having to resort to removing the 'large' unused blobs from history.

thinkyhead commented 5 years ago

According to my understanding, GitHub automatically does pruning and garbage collection on the repositories on their servers. Thus you cannot reach orphan commits. They are gone.

p3p commented 5 years ago

@thinkyhead, this isn't (can't be) true: as you can see from my post above, running a normal git gc on a mirror of the Marlin repository nearly halved its size, so there is a lot of redundant data.

thinkyhead commented 5 years ago

There seems to be a way to add a GC button to the settings, but I can't see how: https://github.com/gitbucket/gitbucket/issues/1212

p3p commented 5 years ago

As I said at the start, I'm no git/GitHub expert. I'm going to assume that GitHub does not allow a mirror clone to be garbage collected and then pushed back up (even though that doesn't change history), so I'll close this and hope that at some point GitHub's automated gc triggers. Even the git documentation recommends running git gc --aggressive every few hundred commits, and Marlin goes through that many in a couple of weeks.

thinkyhead commented 5 years ago

BFG looks great. If only it could update the commit IDs in GitHub's metadata so that all the old PR commit references would still work… then we could use it on the main repo. I've been searching GitHub and so far haven't found any notes on how often they clean repos, or how deeply.

github-actions[bot] commented 4 years ago

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.