cloud-ark / kubeplus

Kubernetes Operator for multi-instance multi-tenancy
https://cloudark.io/
Apache License 2.0

Repository clean-up #1281

Open chiukapoor opened 1 month ago

chiukapoor commented 1 month ago

Issue

$ curl -s https://api.github.com/repos/cloud-ark/kubeplus | jq '.size' | numfmt --to=iec --from-unit=1024
6.6G

devdattakulkarni commented 1 month ago

Yes, we are aware of this issue. The size has grown over the years. From time to time we have removed unwanted/unused files. Here is the output of git count-objects:

$ git count-objects -vH
count: 272
size: 22.38 MiB
in-pack: 8981
packs: 2
size-pack: 1.44 GiB
prune-packable: 0
garbage: 0
size-garbage: 0 bytes

This number is substantially smaller than 6G.

I also looked at the vendor folders in platform-operator and helm-pod. They are about 40 MB each. So they are also not contributing a whole lot to the size.

Maybe the branches are counting towards the size.

I am running 'git gc' now. Let's see if that helps.
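To see whether 'git gc' changes anything, the pack size can be read before and after the run. The helper below is a minimal sketch: the function name pack_size_mib is made up for illustration, and it relies on `git count-objects -v` (without -H) reporting size-pack in KiB.

```shell
# Read `git count-objects -v` output on stdin and print the pack size in MiB.
# Without -H, git reports size-pack in KiB.
pack_size_mib() {
  awk -F': ' '$1 == "size-pack" { printf "%.1f\n", $2 / 1024 }'
}

# Example usage inside a clone (not run here):
#   git count-objects -v | pack_size_mib   # before
#   git gc --prune=now
#   git count-objects -v | pack_size_mib   # after
```

The number reflects the local clone only, so it may differ from what the GitHub API reports for the hosted repository.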

devdattakulkarni commented 1 month ago

I think it is the branches. In .git/objects/pack, there is a .pack file which is 6.6G in size. It holds all the history of the repository over the years.

We have 130+ branches. The only active branches right now are "develop" and "master". My workflow is to work on develop and then do a PR to master.

So, Option 1: We can delete all the other branches. This can reduce the repo size.
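A rough sketch of how the candidate list for Option 1 could be produced. The function name branches_to_delete is hypothetical, and the remote is assumed to be called origin; the idea is simply to filter out the two active branches and the symbolic HEAD entry, then review the result by hand before deleting anything.

```shell
# Read branch names on stdin (one per line) and print those that are neither
# master, develop, nor the symbolic HEAD entry. Matching is exact, so a branch
# such as "master-old" is still listed.
branches_to_delete() {
  grep -vxF -e master -e develop -e HEAD
}

# Example usage inside a clone (not run here); review the list before deleting:
#   git for-each-ref --format='%(refname:strip=3)' refs/remotes/origin | branches_to_delete
#   git push origin --delete <branch>
```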

Option 2: Find all the files that are no longer in the master branch and then purge them from other branches. This involves work:

  1. Figure out files that are not present on the master https://stackoverflow.com/questions/28284890/in-git-how-can-i-list-all-files-that-exist-in-branch-a-that-do-not-exist-in-bra

  2. Purge the files https://stackoverflow.com/questions/11050265/remove-large-pack-file-created-by-git
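Step 1 of Option 2 could look roughly like this. It is only a sketch: the function name and the two file names are assumptions, and the helper just prints the paths that appear in the first list but not in the second.

```shell
# Print paths listed in file $1 (every path seen in history) that are absent
# from file $2 (paths currently on master). Duplicates are collapsed.
paths_missing_from_master() {
  grep -vxF -f "$2" "$1" | sort -u
}

# Example usage inside a clone (not run here):
#   git log --all --pretty=format: --name-only | sed '/^$/d' | sort -u > all_paths.txt
#   git ls-tree -r --name-only master > master_paths.txt
#   paths_missing_from_master all_paths.txt master_paths.txt
```

The resulting list is a starting point for step 2, not a purge list to apply blindly, since some of those paths may still matter on the remaining branches.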

Option 3: Punt this issue for later with the acknowledgment that we will have to fix this eventually. In the documentation, explicitly mention shallow cloning the repository. With shallow clone, the size of the repository is 518M.

Any other options?

Option 1 is the simplest. Ideally, we would know how much of a delta deleting a particular branch achieves. Maybe I can delete a branch, re-clone, and see how much the size of the cloned repo drops.
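The delta for one branch can probably be estimated without deleting and re-cloning, by summing the on-disk size of the objects reachable from that branch but not from master or develop. The sketch below assumes a full clone; total_mib and the branch name origin/old-feature are hypothetical. The result is an upper bound, because other branches may share some of those objects.

```shell
# Sum the integers on stdin (one per line) and print the total in MiB.
total_mib() {
  awk '{ sum += $1 } END { printf "%.1f\n", sum / (1024 * 1024) }'
}

# Example usage inside a full clone (not run here), for a hypothetical branch
# named origin/old-feature:
#   git rev-list --objects origin/old-feature --not origin/master origin/develop |
#     cut -d' ' -f1 |
#     git cat-file --batch-check='%(objectsize:disk)' |
#     total_mib
```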

Option 3 does not rock the boat right now.

Thoughts?

chiukapoor commented 1 month ago

I believe we should go with Option 1, keeping only the active branches, develop and master. We can make the master branch protected so that everyone needs to commit to develop first and later raise a PR for master.

This will also keep the master branch stable, while develop carries changes that may be breaking.

Along with this, as I suggested on Slack, we could move independent modules such as operator-analysis out of this repo so that they can have their own independent releases and development cycles.

devdattakulkarni commented 1 month ago

@chiukapoor I have cleaned up the branches. There are now only 12 branches remaining (including master and develop). The 10 remaining branches (except master and develop) cannot be deleted without further careful analysis. We can do that later.

The size of the repository reported by the above curl command is still 6.6G. The .pack file in the .git folder is the cause of the size. Can you look into ways to reduce the size of this file?

chiukapoor commented 1 month ago

RCA

Findings:

Upon researching Git packing, I found that the .pack file stores the repository's objects, including every historical version of every file, not just what is currently checked out.

To identify large files in the Git history, I utilized the following script found on Stack Overflow:

git rev-list --objects --all | \
  git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' | \
  sed -n 's/^blob //p' | \
  grep -vF --file=<(git ls-tree -r HEAD | awk '{print $3}') | \
  awk '$2 >= 20*2^20' | \
  sort --numeric-sort --key=2 | \
  cut -c 1-12,41- | \
  awk '{ sum += $2; print } END { printf "Total size: %.2f GB\n", sum/(2^30) }'

This script detects blob objects (which represent file contents) larger than 20 MiB across the entire Git history, excluding those currently in the HEAD. It then sorts and presents these files alongside their sizes, concluding with the total size of all the identified large files in the repository's history.

Here are the identified large files:

6cc959a84865   23MiB operator-discovery-helper/operator-discovery-helper
c29d3304c15d   24MiB operator-discovery-helper/operator-discovery-helper
1727a8ac456d   31MiB mutating-webhook-helper/mutating-webhook-helper
9e0c44142215   31MiB mutating-webhook-helper/mutating-webhook-helper
0ab2439526ea   31MiB mutating-webhook-helper/mutating-webhook-helper
519a93cd8f5a   32MiB platform-operator/helm-pod/helm-pod
d7171c80b080   32MiB platform-operator/helm-pod/helm-pod
d96d24be37c9   32MiB platform-operator/helm-pod/helm-pod
4dd2a62aff94   32MiB platform-operator/helm-pod/helm-pod
53286600b665   32MiB platform-operator/helm-pod/helm-pod
83df361707c2   32MiB platform-operator/helm-pod/helm-pod
e8d4c3b52a2f   32MiB platform-operator/helm-pod/helm-pod
1a6b53a50b59   32MiB platform-operator/helm-pod/helm-pod
e419a1e6c69f   32MiB platform-operator/helm-pod/helm-pod
3198ebd4d6d0   32MiB platform-operator/helm-pod/helm-pod
fca54fb09b63   32MiB platform-operator/helm-pod/helm-pod
285fb0983cdf   32MiB platform-operator/helm-pod/helm-pod
20f6b63cdc3c   32MiB platform-operator/helm-pod/helm-pod
cfe95149943e   32MiB platform-operator/helm-pod/helm-pod
27474c111651   32MiB platform-operator/helm-pod/helm-pod
d98771e16a20   32MiB platform-operator/helm-pod/helm-pod
4e5f8e52cf8e   32MiB platform-operator/helm-pod/helm-pod
1b5baf82e863   32MiB platform-operator/helm-pod/helm-pod
0fa8a4b16c1e   32MiB platform-operator/helm-pod/helm-pod
ec8cb44104c9   34MiB platform-operator/platform-operator
504a75825f40   34MiB platform-operator/artifacts/deployment/platform-operator
ba91c6b5f158   34MiB mutating-webhook-helper/mutating-webhook-helper
264a5d68ca65   34MiB platform-operator/platform-operator
2e41fcd8ffe4   34MiB platform-operator/platform-operator
b9a282fa6b46   34MiB platform-operator/platform-operator
ee0a8bbfaf2f   34MiB platform-operator/platform-operator
d02ca5030ea7   38MiB platform-operator/artifacts/deployment/platform-operator-april13
11fa89099370   38MiB deploy/kubectl
60f8ef967e94   39MiB operator-manager/artifacts/deployment/operator-manager
b1c07bd0da9d   39MiB mutating-webhook-helper/mutating-webhook-helper
e0dc2eec405d   39MiB platform-operator/artifacts/deployment/platform-operator
123d05935d23   40MiB platform-operator/helm-pod/helm
9cc647276bc1   40MiB platform-operator/helm-pod/helm
05553b898c78   40MiB platform-operator/artifacts/deployment/platform-operator
0e940ba669d7   41MiB platform-operator/helm-pod/kubectl
e8b37151032c   42MiB kubeplus-kubectl-plugins.tar.gz
9a04045298b4   42MiB kubeplus-kubectl-plugins.tar.gz
3351f0df45e5   42MiB kubeplus-kubectl-plugins.tar.gz
8d6f531437cb   42MiB kubeplus-kubectl-plugins.tar.gz
f5149c996804   42MiB kubeplus-kubectl-plugins.tar.gz
e9bb8ecae3e7   42MiB kubeplus-kubectl-plugins.tar.gz
615a761f73c5   42MiB kubeplus-kubectl-plugins.tar.gz
36ab3a933cd0   42MiB kubeplus-kubectl-plugins.tar.gz
d67dee5d1bdb   42MiB kubeplus-kubectl-plugins.tar.gz
1d91164ed3aa   42MiB kubeplus-kubectl-plugins.tar.gz
88f51fbba870   42MiB kubeplus-kubectl-plugins.tar.gz
79645595d6f7   43MiB kubeplus-kubectl-plugins.tar.gz
83001bf330a1   43MiB kubeplus-kubectl-plugins-latest.tar.gz
c02c3feaeb79   43MiB kubeplus-kubectl-plugins:latest.tar.gz
9610c5f64b2a   43MiB kubeplus-kubectl-plugins:latest.tar.gz
2a2d17bd49b8   43MiB kubeplus-kubectl-plugins-latest.tar.gz
2a2296d01486   43MiB kubeplus-kubectl-plugins.tar.gz
ca9ebf4f31d2   43MiB kubeplus-kubectl-plugins.tar.gz
ce6c2d07f604   43MiB kubeplus-kubectl-plugins-latest.tar.gz
24e829c98595   43MiB kubeplus-kubectl-plugins.tar.gz
8d1588a18332   43MiB kubeplus-kubectl-plugins.tar.gz
c365acdad501   43MiB kubeplus-kubectl-plugins.tar.gz
aa3c6a52a679   43MiB kubeplus-kubectl-plugins.tar.gz
8d0fd33348cd   43MiB kubeplus-kubectl-plugins-latest.tar.gz
7e12204ef71b   43MiB kubeplus-kubectl-plugins.tar.gz
c52b062dc6f7   43MiB kubeplus-kubectl-plugins-latest.tar.gz
8d0a1f70a0ee   43MiB kubeplus-kubectl-plugins-latest.tar.gz
094c577af392   43MiB kubeplus-kubectl-plugins-1.0.4.tar.gz
45fe1c0ebdaf   43MiB kubeplus-kubectl-plugins-1.0.3.tar.gz
458343d1d2ea   43MiB kubeplus-kubectl-plugins-latest.tar.gz
920ddf1b2f6e   43MiB kubeplus-kubectl-plugins-latest.tar.gz
3802ef6e3e63   43MiB kubeplus-kubectl-plugins-latest.tar.gz
ef8127701a0b   43MiB kubeplus-kubectl-plugins-latest.tar.gz
7eaaa2dfcaeb   43MiB kubeplus-kubectl-plugins.tar.gz
288471760cb5   43MiB kubeplus-kubectl-plugins.tar.gz
00ae97ed2bc2   43MiB kubeplus-kubectl-plugins.tar.gz
51dbc3d85ef1   43MiB kubeplus-kubectl-plugins.tar.gz
19e2c22af684   43MiB kubeplus-kubectl-plugins.tar.gz
4e408030e0e5   43MiB kubeplus-kubectl-plugins.tar.gz
51530a192a56   43MiB kubeplus-kubectl-plugins.tar.gz
c541036a7b74   43MiB kubeplus-kubectl-plugins.tar.gz
d279c88b08e0   43MiB kubeplus-kubectl-plugins.tar.gz
6e9d8f09c568   43MiB kubeplus-kubectl-plugins.tar.gz
7b8f3cf2d21b   43MiB kubeplus-kubectl-plugins.tar.gz
e8aaf4174282   43MiB kubeplus-kubectl-plugins.tar.gz
90c9430fc48f   43MiB kubeplus-kubectl-plugins.tar.gz
52529f9b1dcd   43MiB kubeplus-kubectl-plugins.tar.gz
c26ee398448a   43MiB kubeplus-kubectl-plugins.tar.gz
91bac79332c9   43MiB kubeplus-kubectl-plugins.tar.gz
fab110955c03   43MiB kubeplus-kubectl-plugins.tar.gz
9da488f6283b   43MiB kubeplus-kubectl-plugins.tar.gz
c1d46eb577f1   43MiB kubeplus-kubectl-plugins.tar.gz
724a4a4b6908   43MiB kubeplus-kubectl-plugins.tar.gz
fee22d952629   43MiB kubeplus-kubectl-plugins.tar.gz
bc93c331e907   43MiB kubeplus-kubectl-plugins.tar.gz
6b3c68e17943   43MiB kubeplus-kubectl-plugins.tar.gz
8de011f7db4e   43MiB kubeplus-kubectl-plugins.tar.gz
f86d617fd294   43MiB kubeplus-kubectl-plugins.tar.gz
f87526e06940   43MiB kubeplus-kubectl-plugins.tar.gz
9a6b019364d3   43MiB kubeplus-kubectl-plugins.tar.gz
c269be008951   43MiB kubeplus-kubectl-plugins.tar.gz
1507a72e3c33   43MiB kubeplus-kubectl-plugins.tar.gz
b7bea14a2edd   43MiB kubeplus-kubectl-plugins.tar.gz
bec01d901703   43MiB kubeplus-kubectl-plugins.tar.gz
0687db132456   43MiB kubeplus-kubectl-plugins.tar.gz
a65055f67793   43MiB kubeplus-kubectl-plugins.tar.gz
324fd4b57624   43MiB kubeplus-kubectl-plugins.tar.gz
c9b97ec83bba   43MiB kubeplus-kubectl-plugins.tar.gz
29be035a8756   43MiB kubeplus-kubectl-plugins.tar.gz
f4a342d2bb9a   43MiB kubeplus-kubectl-plugins.tar.gz
c23a7bc4f0b8   43MiB kubeplus-kubectl-plugins.tar.gz
98422aa8582d   43MiB kubeplus-kubectl-plugins.tar.gz
14d66637f425   43MiB kubeplus-kubectl-plugins.tar.gz
00d42daa755e   43MiB kubeplus-kubectl-plugins.tar.gz
b0325ff9cde0   43MiB kubeplus-kubectl-plugins.tar.gz
f23279a2b41e   43MiB kubeplus-kubectl-plugins.tar.gz
8fd838ff657e   43MiB kubeplus-kubectl-plugins.tar.gz
1a52874ed110   43MiB kubeplus-kubectl-plugins.tar.gz
d0ff22ce5bf9   43MiB kubeplus-kubectl-plugins.tar.gz
657bcd16613e   43MiB kubeplus-kubectl-plugins.tar.gz
0f5070b1803d   43MiB kubeplus-kubectl-plugins.tar.gz
2866cc9fdfd1   43MiB kubeplus-kubectl-plugins.tar.gz
b6ea2ae7ff33   43MiB kubeplus-kubectl-plugins.tar.gz
c4ca62a7caf3   43MiB kubeplus-kubectl-plugins.tar.gz
d4a9590ff3ab   44MiB platform-operator/helm-pod/helm-pod
f41354015c4d   44MiB platform-operator/helm-pod/helm-pod
ff838549a309   46MiB platform-operator/helm-pod/helm-pod
4a7a7f108001   46MiB platform-operator/helm-pod/helm-pod
5a683d172125   46MiB platform-operator/helm-pod/helm-pod
2e3cd5fa4589   46MiB platform-operator/helm-pod/helm-pod
21cffd7ac942   46MiB platform-operator/helm-pod/helm-pod
e799f05e656c   46MiB platform-operator/helm-pod/helm-pod
840ac9e13d66   46MiB plugins/kubediscovery-linux
07fffa8f7f92   46MiB plugins/kubediscovery-linux
dc06b09c883b   46MiB plugins/kubediscovery-linux
c865295defc5   46MiB plugins/kubediscovery-linux
cadf9c1a3eea   46MiB plugins/kubediscovery-linux
fcc470ff901b   48MiB platform-operator/artifacts/deployment/platform-operator
142ea25a47e4   48MiB deploy/helm
3c030a07d685   49MiB kubeplus-kubectl-plugins-1.0.0.tar.gz
4a78acbb93e4   49MiB kubeplus-kubectl-plugins-1.0.0.tar.gz
e36ea2b943cd   49MiB kubeplus-kubectl-plugins-1.0.0.tar.gz
1a8a285709bd   49MiB kubeplus-kubectl-plugins-1.0.0.tar.gz
8a5c0e5377a5   49MiB kubeplus-kubectl-plugins-1.0.0.tar.gz
818043e7858b   49MiB kubeplus-kubectl-plugins-1.0.2.tar.gz
9e5da1a47fdc   49MiB kubeplus-kubectl-plugins-1.0.0.tar.gz
1e99509de801   49MiB kubeplus-kubectl-plugins-1.0.1.tar.gz
3cd2bb94262a   49MiB kubeplus-kubectl-plugins-1.0.0.tar.gz
32f4ed9a6702   52MiB plugins/kubediscovery-macos
4aa98b45d6f2   52MiB plugins/kubediscovery-macos
92a90f3e9890   52MiB plugins/kubediscovery-macos
fc9a4637555e   52MiB plugins/kubediscovery-macos
e1c1c590b0aa   52MiB plugins/kubediscovery-macos
f42f9e45c5c1   52MiB plugins/kubediscovery-macos
2f968ce58d4c   52MiB plugins/kubediscovery-macos
a7552dc214b7   52MiB plugins/kubediscovery-macos
2edc8400ea3f   52MiB plugins/kubediscovery-macos
916aee293d23   52MiB plugins/kubediscovery-macos
bda4807b9296   52MiB plugins/kubediscovery-macos
2c091c1c3467   52MiB plugins/kubediscovery-macos
9af3216b3b8f   52MiB plugins/kubediscovery-macos
df3ba3066b9c   52MiB plugins/kubediscovery-macos
400ce62b304f   52MiB plugins/kubediscovery-macos
501bd028bf1f   52MiB plugins/kubediscovery-macos
6d064b83ee70   52MiB plugins/kubediscovery-macos
7dd447a08514   52MiB plugins/kubediscovery-macos
94e6e404f21d   52MiB plugins/kubediscovery-macos
2a0148ccded1   52MiB plugins/kubediscovery-macos
427b622953fb   52MiB plugins/kubediscovery-macos
a97d3bab4ef4   52MiB plugins/kubediscovery-macos
ce82c7ab107a   52MiB plugins/kubediscovery-macos
0dff922718dd   52MiB plugins/kubediscovery-macos
7abf91977364   52MiB plugins/kubediscovery-macos
bb751c1e454f   52MiB plugins/kubediscovery-macos
14b589ddd8c8   52MiB plugins/kubediscovery-macos
154b79f3b9e6   52MiB plugins/kubediscovery-macos
160bef440451   52MiB plugins/kubediscovery-macos
ee2c34cbf13f   52MiB plugins/kubediscovery-macos
3ab9733f4f0a   52MiB plugins/kubediscovery-macos
af01e219cfb5   52MiB plugins/kubediscovery-macos
808acb829512   52MiB plugins/kubediscovery-macos
7f6c0c68e82e   53MiB plugins/kubediscovery-linux
dceae80f154a   53MiB plugins/kubediscovery-linux
075b6a7eddfb   53MiB plugins/kubediscovery-linux
6bfcf37b8aac   53MiB plugins/kubediscovery-linux
864d55c4d895   53MiB plugins/kubediscovery-linux
56492e80f906   53MiB plugins/kubediscovery-linux
b7156bf52c10   53MiB plugins/kubediscovery-linux
4f3411a47197   53MiB plugins/kubediscovery-linux
032198a48b71   53MiB plugins/kubediscovery-linux
17cef560d252   53MiB plugins/kubediscovery-linux
55327469b936   53MiB plugins/kubediscovery-linux
c4cc3c64f6c4   53MiB plugins/kubediscovery-linux
b65a3750cd14   53MiB plugins/kubediscovery-linux
56f71541c9cf   53MiB plugins/kubediscovery-linux
15fa0e81f12d   53MiB plugins/kubediscovery-linux
ee3422023df1   53MiB plugins/kubediscovery-linux
9733c5bcb0a7   53MiB plugins/kubediscovery-linux
03302e21fdd7   53MiB plugins/kubediscovery-linux
3a7f7ec33ae5   53MiB plugins/kubediscovery-linux
6700b886ee78   53MiB plugins/kubediscovery-linux
e3add604ebf0   53MiB plugins/kubediscovery-linux
3f2955de655e   53MiB plugins/kubediscovery-linux
ba72f32c7520   53MiB plugins/kubediscovery-linux
858552299f49   53MiB plugins/kubediscovery-linux
22583aef0554   53MiB plugins/kubediscovery-linux
f1fbffdc1f1c   53MiB plugins/kubediscovery-linux
a924826f21c6   53MiB plugins/kubediscovery-linux
01e524733bf7   53MiB plugins/kubediscovery-linux
7bec1d24796f   53MiB plugins/kubediscovery-linux
044a3e5d2a05   53MiB plugins/kubediscovery-linux
35103b334abc   53MiB plugins/kubediscovery-linux
589b59ead533   53MiB plugins/kubediscovery-linux
7c46f3fe5191   53MiB plugins/kubediscovery-linux
31e89dccd03c   53MiB kubeplus-saas-manager-control-center.tar.gz
1374f5e51ea9   53MiB platform-operator/platform-operator
a75c3dc42be1   53MiB platform-operator/artifacts/deployment/platform-operator
e538a26d2135   53MiB kubeplus-kubectl-plugins.tar.gz
0580caf232a3   53MiB kubeplus-kubectl-plugins.tar.gz
a2a519ce93ce   53MiB kubeplus-kubectl-plugins.tar.gz
0b94db198af8   53MiB kubeplus-kubectl-plugins.tar.gz
9dfaecda3f0f   54MiB platform-operator/artifacts/deployment/platform-operator
c7a607202b2e   54MiB kubeplus-kubectl-plugins.tar.gz
7d0f8ca5b076   57MiB operator-deployer/artifacts/deployment/operator-deployer
9c7557c3efde   60MiB plugins/kubediscovery-macos
ec12c03b6f43   60MiB plugins/kubediscovery-macos
7790521df6e2   60MiB plugins/kubediscovery-macos
44b6f05f08d0   60MiB plugins/kubediscovery-macos
331e52927f7e   60MiB plugins/kubediscovery-macos
27dfff50acc7   60MiB plugins/kubediscovery-macos
51325de9a645   60MiB plugins/kubediscovery-macos
10a32b460c5e   60MiB plugins/kubediscovery-macos
3c1037693132   60MiB plugins/kubediscovery-macos
4112834a36d3   60MiB plugins/kubediscovery-macos
87b520927c2c   60MiB plugins/kubediscovery-macos
b7b09d906aa3   60MiB plugins/kubediscovery-macos
f848a1661011   60MiB plugins/kubediscovery-macos
12b684726ec7   60MiB plugins/kubediscovery-macos
09b4f8bef003   61MiB plugins/kubediscovery-linux
465e8875ce9a   61MiB plugins/kubediscovery-linux
9343f0ef3a5f   61MiB plugins/kubediscovery-linux
46445693320a   61MiB plugins/kubediscovery-linux
f740ec6e5417   61MiB plugins/kubediscovery-linux
5e3534f358b4   61MiB plugins/kubediscovery-linux
76e8a9df775d   61MiB plugins/kubediscovery-linux
13451705dc2c   61MiB plugins/kubediscovery-linux
1cd15dc10b22   61MiB plugins/kubediscovery-linux
f4beb877f893   61MiB plugins/kubediscovery-linux
380eac8ccd04   61MiB plugins/kubediscovery-linux
c6a08cf7d19e   61MiB plugins/kubediscovery-linux
eb4fe34d38a8   61MiB plugins/kubediscovery-linux
842bdeb6684b   61MiB plugins/kubediscovery-linux
013c419d1c1f   62MiB kubeplus-kubectl-plugins.tar.gz
3d85f5affb3a   62MiB kubeplus-kubectl-plugins.tar.gz
854a9ea40711   64MiB kubeplus-kubectl-plugins.tar.gz
06ea3a0b774d   64MiB kubeplus-kubectl-plugins.tar.gz
654eefc253c5   64MiB kubeplus-kubectl-plugins.tar.gz
6f3b47f1e4e4   65MiB kubeplus-kubectl-plugins.tar.gz
589ec8fb7189   65MiB kubeplus-kubectl-plugins.tar.gz
76732838cee0   65MiB kubeplus-kubectl-plugins.tar.gz
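The listing repeats a small number of paths many times, so grouping it by path makes the main contributors easier to see. This is a minimal sketch: summarize_by_path is a hypothetical helper, it expects the three-column format shown above (hash, size in MiB, path), and the totals are approximate because the listed sizes are already rounded.

```shell
# Read "<hash> <N>MiB <path>" lines on stdin and print, per path, the number
# of versions and their combined size in MiB, largest total first.
summarize_by_path() {
  awk '{
         size = $2
         sub(/MiB$/, "", size)      # strip the unit so the value is numeric
         count[$3]++
         total[$3] += size
       }
       END { for (p in count) printf "%d %d %s\n", count[p], total[p], p }' |
    sort -k2,2nr
}

# Example usage (not run here), with the listing saved to a file:
#   summarize_by_path < large_files.txt
```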

Solution

Clean-up:

To address this issue, the outdated and unwanted objects such as old binaries will be removed using BFG, as suggested on Stack Overflow (PS: I have tested this locally and the .pack file size is down to less than 300 MiB)
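For reference, a BFG run along these lines might look like the sketch below. The exact invocation used in the local test is not recorded here, so everything specific is an assumption: the mirror path kubeplus.git, the jar location bfg.jar, and the 20M threshold (borrowed from the 20 MiB cut-off in the script above). The wrapper prints the commands by default and only executes them when DRY_RUN=0.

```shell
# Print each command instead of running it unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# $1: path to a bare mirror clone (assumed name), $2: blob size limit, e.g. 20M.
bfg_cleanup() {
  mirror=$1
  limit=$2
  # Rewrite history, dropping blobs above the limit.
  run java -jar "${BFG_JAR:-bfg.jar}" --strip-blobs-bigger-than "$limit" "$mirror"
  # Expire old references and repack so the dropped blobs actually leave the pack.
  run git -C "$mirror" reflog expire --expire=now --all
  run git -C "$mirror" gc --prune=now --aggressive
}

# Example usage (not run here):
#   git clone --mirror <repository URL> kubeplus.git
#   bfg_cleanup kubeplus.git 20M             # prints the plan
#   DRY_RUN=0 bfg_cleanup kubeplus.git 20M   # executes it
```

Since this rewrites history, the result has to be force-pushed and existing clones need to be replaced with fresh ones afterwards.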

Fix:

To prevent this issue in the future, it's important to refrain from uploading binary files to the git repository. Instead, GitHub's release and tags feature and automated CI can be utilized. https://docs.github.com/en/repositories/releasing-projects-on-github/managing-releases-in-a-repository
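As a complement to publishing binaries through releases, a local pre-commit check could catch large files before they reach the history. This is not something the project has in place; it is a sketch, with the helper name over_limit and the 20 MiB threshold (20971520 bytes) chosen only for illustration.

```shell
# Read "<bytes> <path>" lines on stdin and print those at or above the limit
# given in $1. Exit status is 1 when anything was printed, 0 otherwise.
over_limit() {
  awk -v limit="$1" '$1 >= limit { print; bad = 1 } END { exit bad }'
}

# Possible use in .git/hooks/pre-commit (not run here):
#   git diff --cached --name-only --diff-filter=AM |
#     while read -r f; do printf '%s %s\n' "$(git cat-file -s ":$f")" "$f"; done |
#     over_limit 20971520 || {
#       echo "Large files staged; publish binaries as release assets instead." >&2
#       exit 1
#     }
```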

devdattakulkarni commented 1 month ago

@chiukapoor Great work! I will go through these steps in the next couple of days.

devdattakulkarni commented 1 month ago

Looking at the files, we do want to keep the following files in the repo:

We can remove the rest of the files. For the above two files, whenever there is an update, we can follow the practice of deleting the current version and then adding the new version. That way, there will always be a single version of these files present in the repo.

Also, it looks like there is another approach to removing files from history: https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/removing-sensitive-data-from-a-repository

I will experiment with both (BFG cleaner and git filter-repo) in the coming days.