falcosecurity / test-infra

Falco workflow & testing infrastructure
https://prow.falco.org
Apache License 2.0

Approved label not automatically added to kernel-crawler repo #1118

Open FedeDP opened 1 year ago

FedeDP commented 1 year ago

Describe the bug

I have to manually add the approved label to kernel-crawler PRs.

How to reproduce it

Approve any PR on kernel-crawler repo :)

Expected behaviour

The approved label should be automatically added.

With @maxgio92 we discovered that there is some issue with the git client:

hook-7bb6858466-gzdrx hook {"client":"git","component":"hook","count":1,"error":"running \"/usr/bin/git\" [clone --mirror github.com/falcosecurity/kernel-crawler /tmp/git65590195/falcosecurity/kernel-crawler.git] returned error exit status 128 with output \"Cloning into bare repository '/tmp/git65590195/falcosecurity/kernel-crawler.git'...\\nfatal: fetch-pack: invalid index-pack output\\n\"","file":"k8s.io/test-infra/prow/git/git.go:580","func":"k8s.io/test-infra/prow/git.retryCmd","level":"debug","msg":"Retrying, if this is not the 3rd try then this will be retried.","severity":"debug","time":"2023-02-03T15:09:11Z"}

I think it is because the kernel-crawler repo hosts the huge JSON lists of kernels for the supported architectures, and we need to somehow tweak the git client config.

poiana commented 1 year ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

FedeDP commented 1 year ago

/remove-lifecycle stale

FedeDP commented 1 year ago

So, after we removed the kernels branch, the git clone --mirror performed by the hook plugin is now down to ~75MB:

git clone --mirror https://github.com/falcosecurity/kernel-crawler
Cloning into bare repository 'kernel-crawler.git'...
remote: Enumerating objects: 2361, done.
remote: Counting objects: 100% (571/571), done.
remote: Compressing objects: 100% (275/275), done.
remote: Total 2361 (delta 294), reused 375 (delta 229), pack-reused 1790
Receiving objects: 100% (2361/2361), 75.29 MiB | 5.02 MiB/s, done.
Resolving deltas: 100% (1383/1383), done.

Unfortunately, this is not enough for it to work. In a kernel-crawler fork, I tried rewriting the entire main history to remove the big JSON files (at the start of the repo they were pushed to the main branch), and the repo size is now ~2MB:

git clone --mirror https://github.com/fededp/kernel-crawler
Cloning into bare repository 'kernel-crawler.git'...
remote: Enumerating objects: 1875, done.
remote: Counting objects: 100% (517/517), done.
remote: Compressing objects: 100% (357/357), done.
remote: Total 1875 (delta 193), reused 476 (delta 159), pack-reused 1358
Receiving objects: 100% (1875/1875), 2.10 MiB | 3.06 MiB/s, done.
Resolving deltas: 100% (1058/1058), done.

So, I think we'll need to force-push the main branch. /cc @leogr @maxgio92 @LucaGuerra

I will need the help of an admin to do so.

FedeDP commented 1 year ago

Using git, it will be done like so:

git filter-branch -f --index-filter "git rm -rf --cached --ignore-unmatch kernels/" HEAD
git push origin HEAD --force-with-lease
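
For anyone who wants to try the rewrite safely first, here is a minimal sketch on a throwaway repository. The file names (crawler.py, kernels/list.json) and the temp directory are made up for the demo, and FILTER_BRANCH_SQUELCH_WARNING is only there to skip the deprecation notice that recent git versions print before running filter-branch.

```shell
# Throwaway demo: nothing here touches a real repository.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email demo@example.com
git config user.name demo

# One commit with a (hypothetical) source file and a kernel list.
mkdir kernels
echo '{"big":"json"}' > kernels/list.json
echo 'code' > crawler.py
git add . && git commit -q -m "add crawler and kernel list"

# Same index filter as above, applied to the demo history.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f \
  --index-filter "git rm -rf --cached --ignore-unmatch kernels/" HEAD >/dev/null

# filter-branch keeps a backup under refs/original/, so inspect HEAD only.
remaining=$(git rev-list --objects HEAD | grep -c 'kernels' || true)
echo "objects under kernels/ still reachable from HEAD: $remaining"
```

Note that the old objects stay reachable through the refs/original/ backup (and through any other ref that still points at the old history), so checking with --all instead of HEAD would still list them.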

FedeDP commented 1 year ago

So, the work was done; see for example this main commit: https://github.com/falcosecurity/kernel-crawler/commit/4438279c57e9931129a16e3b124275fe62ef443f

But the end result differs from what I expected: the mirror clone is still ~45MB:

git clone --mirror https://github.com/falcosecurity/kernel-crawler
Cloning into bare repository 'kernel-crawler.git'...
remote: Enumerating objects: 2684, done.
remote: Counting objects: 100% (2684/2684), done.
remote: Compressing objects: 100% (830/830), done.
remote: Total 2684 (delta 1718), reused 2543 (delta 1671), pack-reused 0
Receiving objects: 100% (2684/2684), 45.87 MiB | 5.06 MiB/s, done.
Resolving deltas: 100% (1718/1718), done.

Investigating.

FedeDP commented 1 year ago

For reference, a git clone --mirror of libs accounts for ~35MB. There are no tags or stale branches in kernel-crawler with big JSON files; I don't really get this.
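
One way to see what is taking the space is to list the largest blobs reachable from any ref. This is a sketch, not something that was run in this thread; largest_blobs is a hypothetical helper name, and it reports uncompressed object sizes, which only approximate the size on the wire.

```shell
# Print the N largest blobs reachable from any ref as:
#   blob <sha> <size-in-bytes> <path>
# Run it inside the repository (or the --mirror clone).
largest_blobs() {
  n="${1:-10}"
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    awk '$1 == "blob"' |
    sort -k3,3 -n -r |
    awk -v n="$n" 'NR <= n'
}
```

For example, `largest_blobs 20` inside the mirror clone should show whether any list.json blobs are still among the biggest objects.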

FedeDP commented 1 year ago

So, it seems jsons are still referenced somewhere (of course):

git rev-list --objects --all | grep json
177bdb29d7cb55dd4505fcad7a0007bfe61856d0 kernels/aarch64/list.json
965ad4147108c5029395e898ccf461ba041c005b kernels/x86_64/list.json
7b28341c4ac53ab817104075cd62af1b2f871ce0 kernels/aarch64/list.json
7cc73e95ff22468866340e0506d64b228a53f696 kernels/x86_64/list.json
7ada52e6b02702dae8e1204b4e45ebde01514c93 kernels/aarch64/list.json
9de80130013a3020e23ead46f53a5df954d7eebe kernels/x86_64/list.json
9fe2739e52952181a2cbc054266038c0db658ae6 kernels/aarch64/list.json
bc7c078463e2516c741ab3e8a281e4a539972017 kernels/x86_64/list.json
c4fc6ccbdaf699db51513cb7cfbd53c2eac3395f kernels/aarch64/list.json
03e2a731e31f7cbbb61160a09003c76c09556c37 kernels/x86_64/list.json
0cc64ddcb7091c05cc921436c56a3db355b57027 kernels/aarch64/list.json
e16424d453886c6537521b23cc9505aa467115cb kernels/x86_64/list.json
14de7d6c9a1fbb633ed261e72ba7df0b0d00131a kernels/aarch64/list.json
941fa5ff86ac237aac7a4d9862e47f794bc6aaaf kernels/x86_64/list.json
36a173cd61841f05f1b1f98f8006219ffe20678a kernels/aarch64/list.json
97b58cc0aaf6e5c2a359d41fed9d6ddf96722f12 kernels/x86_64/list.json
79ba50511cae9941f6ec7292523603f4fb2ede93 kernels/aarch64/list.json
d7ef54205c135ca8608a631d532771b76034b24e kernels/x86_64/list.json
a972a51e83371e047ef3d3557ccae586659453b5 kernels/aarch64/list.json
614b68865cd93c50abb357d5be412fc10e54fb17 kernels/x86_64/list.json
388dbcaf19db2bc1b45d124d82ee5295d989979d kernels/aarch64/list.json
47c06f73737b0958972257903cee5af8585055b1 kernels/x86_64/list.json
2055de08b7c0a34d68f71d0c89dde61a5c7c339d kernels/aarch64/list.json
c1eac473010235d8648b058e1610ee791ad99919 kernels/x86_64/list.json
51ddcc04923a4cc6de5ff0b9bbed481f46969afb kernels/aarch64/list.json
fd4af955206c2d04cd62199fab8c23f026d284f6 kernels/x86_64/list.json

These are not commits, though. For example, this shows nothing:

git log -1 --decorate 2055de08b7c0a34d68f71d0c89dde61a5c7c339d

EDIT:

LC_ALL="C" git for-each-ref --contains 2055de08b7c0a34d68f71d0c89dde61a5c7c339d
error: object 2055de08b7c0a34d68f71d0c89dde61a5c7c339d is a blob, not a commit
error: no such commit 2055de08b7c0a34d68f71d0c89dde61a5c7c339d
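
As the error says, for-each-ref --contains only accepts commits. A possible workaround is to walk every ref and look for the blob in its object list. This is only a sketch; refs_reaching_blob is a hypothetical helper name, not an existing git command.

```shell
# List every ref whose history still reaches the given blob.
# Usage: refs_reaching_blob <blob-sha>   (run inside the repository)
refs_reaching_blob() {
  blob="$1"
  git for-each-ref --format='%(refname)' | while read -r ref; do
    # rev-list --objects prints "<sha> <path>" for each reachable blob.
    if git rev-list --objects "$ref" | grep "^$blob" >/dev/null; then
      echo "$ref"
    fi
  done
}
```

In a --mirror clone this walks every ref the remote advertises, not only branches and tags, so it should name whichever ref keeps a given list.json blob alive.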

FedeDP commented 1 year ago

So, it seems these JSON blobs might be related to the PRs that pushed JSON files to main: there were exactly 13 of them, each with 2 JSON files, which gives 26 entries (exactly the number listed above).

I think that we correctly cleaned up main, but we couldn't clean up GitHub's hidden PR refs (and, as far as I know, we can't force-push them either), so their blobs are still there.

Using https://rtyley.github.io/bfg-repo-cleaner/ works, i.e. git rev-list no longer shows any JSON file, but then we cannot push (not even force-push), because we get:

git push -f
Enumerating objects: 387, done.
Counting objects: 100% (188/188), done.
Writing objects: 100% (387/387), 35.72 KiB | 35.72 MiB/s, done.
Total 387 (delta 188), reused 188 (delta 188), pack-reused 199
remote: Resolving deltas: 100% (322/322), completed with 167 local objects.
To github.com:falcosecurity/kernel-crawler.git
 ! [remote rejected] refs/pull/1/head -> refs/pull/1/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/10/head -> refs/pull/10/head (deny updating a hidden ref)
...

for each opened PR.

maxgio92 commented 1 year ago

JFYI, I think we can't do anything about synthetic references, as they're read-only: https://github.com/rtyley/bfg-repo-cleaner/issues/36#issuecomment-37877829

poiana commented 10 months ago

/lifecycle stale

FedeDP commented 10 months ago

/remove-lifecycle stale

poiana commented 7 months ago

/lifecycle stale

FedeDP commented 7 months ago

/remove-lifecycle stale

poiana commented 4 months ago

/lifecycle stale

FedeDP commented 3 months ago

/remove-lifecycle stale

poiana commented 4 weeks ago

/lifecycle stale

FedeDP commented 4 weeks ago

/remove-lifecycle stale