bee-browser / bee

An HTML5-compliant small browser engine for embedding
Apache License 2.0

.git is large #206

Closed: masnagam closed this issue 3 months ago

masnagam commented 3 months ago
$ du -hs .git
5.8G    .git

This project contains many files that are generated by scripts, and these tend to be large. In particular, the LALR parsing tables may be the primary cause of this size issue.

masnagam commented 3 months ago

The size of .git in other projects:

project    size of .git
linux      5.5G
llvm       2.9G
servo      1.2G
chromium   46G

masnagam commented 3 months ago

Download git_find_big.sh, modify it to show the top 100 files, and run it:

All sizes are in kB's. The pack column is the size of the object, compressed, inside the pack file.
size   pack  SHA                                       location
17003  683   c4e49ab35a8b967f57a12902f2c1b3ad623c8e2d  packages/jsparser/src/parser/lalr/action.rs
14247  671
13853  875   4ba11dbd5618a09d8e87c4c3aa71e9da027037e2  libs/jsparser/src/parser/lalr/action.rs
13853  875   91eaa66592f0393e08f4eb6134ff00e67c134817  libs/jsparser/src/parser/lalr/action.rs
13853  651   d72890ad160190c7e6c279c67d5630092a2833f6  libs/jsparser/src/parser/lalr/action.rs
10967  461
10728  959   30096f2074c2b0f214e8340e0e04dcaa1da0c3e5  libs/jsparser/src/parser/lalr/goto.rs
10728  959   62220ef291079694ee5c5bddb30c1fa162ebe51d  libs/jsparser/src/parser/lalr/goto.rs
10728  447   9d9acfe10a81a743ea47d2c1f33addc9a18d9bcc  libs/jsparser/src/parser/lalr/goto.rs
10009  425   e1fff7a9d83704ff663136896f719d83b531108d  packages/jsparser/src/parser/lalr/goto.rs
7679   130   38f78584a6b28a3a8a410b0528926750615b1154  libs/htmltokenizer/src/charref/trie.codegen.json
7679   130   38f78584a6b28a3a8a410b0528926750615b1154  libs/htmltokenizer/src/charref/trie.codegen.json
3312   158   9817bdeabb7de9d0075e360e13f24b64ba603961  libs/htmltokenizer/src/charref/trie.rs
3312   109   e273f45c0404c160b462c1b48eed0532e48e4b85  libs/htmltokenizer/src/charref/trie.rs
3312   109   e273f45c0404c160b462c1b48eed0532e48e4b85  libs/htmltokenizer/src/charref/trie.rs
...
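git_find_big.sh itself is not reproduced here. For readers without that script, a comparable listing can be sketched with git plumbing alone: `git rev-list --objects --all` enumerates every object reachable from any ref (with a path where one is known), and `git cat-file --batch-check` reports each object's size and its on-disk size. The function name `list_big_blobs` is hypothetical, and its columns only approximate the output above (sizes in kB, truncated).

```sh
#!/bin/sh
# Hypothetical helper: list the N largest blobs reachable from any ref.
# Columns: size (kB), on-disk size (kB; compressed, or delta size if
# deltified inside a pack), SHA, path.
list_big_blobs() {
  n=${1:-100}
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(objectsize:disk) %(rest)' |
    awk '$1 == "blob" {
      # Re-join the path, which may contain spaces (runs collapse to one).
      path = $5
      for (i = 6; i <= NF; i++) path = path " " $i
      printf "%d\t%d\t%s\t%s\n", $3 / 1024, $4 / 1024, $2, path
    }' |
    sort -k1,1nr |
    head -n "$n"
}
```

Run `list_big_blobs 100` from the repository root. Unlike the output above, each blob appears only once, under the first path that `rev-list` reports for it.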
masnagam commented 3 months ago

Generated files were removed in 8d07638464df486ed7b6854c0705b3674d5a7d12.

masnagam commented 3 months ago

The calculation of the .git size was completely incorrect because it included the size of the submodules.

git clone git@github.com:bee-browser/bee.git
cd bee
du -hs .git | cut -f1

The result is 16M. That's not so large, but it's still larger than mirakc/mirakc (6.2M, with about 7x as many commits).
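In a superproject, Git typically keeps each submodule's repository under `.git/modules/`, which is why `du` on `.git` counted the submodules. A clone without submodules, as above, avoids this; another option is to measure the parts separately in an existing checkout. `git count-objects -vH` reports loose and packed object sizes for the current repository's object database only. The function name `git_size_report` is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: report .git usage with and without submodules.
# Assumes it is run inside a work tree of the superproject.
git_size_report() {
  git_dir=$(git rev-parse --git-dir) || return 1

  # Everything under .git, including .git/modules/ (submodule repositories).
  echo "whole .git:      $(du -sh "$git_dir" | cut -f1)"

  # Submodule repositories are usually absorbed into .git/modules/.
  if [ -d "$git_dir/modules" ]; then
    echo "submodules only: $(du -sh "$git_dir/modules" | cut -f1)"
  fi

  # Object counts and sizes for this repository's own object database;
  # size-pack is the total size of its pack files.
  git count-objects -vH
}
```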

masnagam commented 3 months ago

Remove the generated files from .git:

git show --pretty="" --name-only --diff-filter=D 8d07638464df486ed7b6854c0705b3674d5a7d12 | \
  xargs git-remove-objects

where git-remove-objects is:

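#!/bin/sh
# Remove the given paths from every commit on every ref, then prune the
# old objects so that .git actually shrinks.
# Note: `du -B` is a GNU coreutils option.

NUM_OBJECTS=$#

BEFORE=$(du -s -B 1 .git | cut -f1)  # in bytes

remove_object() {
  # Exported so the filter, which filter-branch evaluates in its own
  # shell, can use the path without re-quoting it.
  OBJECT=$1
  export OBJECT

  # Drop the path from each commit's index. --cached because an index
  # filter never checks out a working tree.
  git filter-branch -f \
    --index-filter 'git rm -q --cached --ignore-unmatch -- "$OBJECT"' \
    --tag-name-filter 'cat' -- --all

  # Delete the backup refs (refs/original/*) so they no longer keep the
  # old objects alive.
  git for-each-ref --format="%(refname)" refs/original/ | \
    xargs -n 1 git update-ref -d

  # Expire reflogs and prune so unreachable objects are really deleted.
  git reflog expire --expire=now --all

  git gc --prune=now
}

COUNT=1
for OBJECT in "$@"
do
  echo "[$COUNT/$NUM_OBJECTS] Removing $OBJECT..."
  remove_object "$OBJECT"
  COUNT=$((COUNT + 1))
done

AFTER=$(du -s -B 1 .git | cut -f1)  # in bytes
DELTA=$((BEFORE - AFTER))  # in bytes
DELTA_PCT=$((DELTA * 100 / BEFORE))

cat <<EOF
$NUM_OBJECTS objects have been removed from .git
Size $BEFORE -> $AFTER, Reduced $DELTA (${DELTA_PCT}%)
EOF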
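git-remove-objects rewrites the whole history once per path, which gets slow as the number of paths grows. A hypothetical single-pass variant feeds every path to one `git filter-branch` run by reading them from a file inside the index filter. This is a swapped-in approach, not the script used in this issue; the function name `remove_paths_single_pass` and the one-path-per-line input file are assumptions. Git's own filter-branch documentation also recommends the separate git-filter-repo tool for this kind of rewrite.

```sh
#!/bin/sh
# Hypothetical single-pass variant: remove every path listed in a file
# (one path per line, relative to the repository root) from all refs.
remove_paths_single_pass() {
  [ -s "$1" ] || return 0  # nothing to remove

  # filter-branch evaluates filters from a temporary directory, so the
  # list must be referenced by an absolute path.
  PATHS_FILE="$(cd "$(dirname "$1")" && pwd)/$(basename "$1")"
  export PATHS_FILE

  # One history rewrite: each commit's index loses all listed paths.
  # grep drops blank lines; NUL separators keep paths with spaces intact.
  FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f \
    --index-filter 'grep . "$PATHS_FILE" | tr "\n" "\0" |
      xargs -0 git rm -q --cached --ignore-unmatch --' \
    --tag-name-filter cat -- --all || return 1

  # Same cleanup as git-remove-objects: drop backups, reflogs, then prune.
  git for-each-ref --format='%(refname)' refs/original/ |
    xargs -n 1 git update-ref -d
  git reflog expire --expire=now --all
  git gc -q --prune=now
}
```

For example, write the output of the `git show --pretty="" --name-only --diff-filter=D ...` command to a file and pass that file to `remove_paths_single_pass`.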
masnagam commented 3 months ago

Many (but not all) generated files were removed. The size of .git was reduced to 3.5M (excluding submodules).