EmbarkStudios / k8s-buildkite-plugin

Run any buildkite build step as a Kubernetes Job
https://embark.dev
Apache License 2.0
49 stars 19 forks source link

No way to disable git mirror #55

Open artem-zinnatullin opened 1 year ago

artem-zinnatullin commented 1 year ago

Hi team, long time no see!

We've been exploring Partial Git Clones recently and found that Treeless git clones are super fast for us while providing a fully working Git repo.

Now I want to check if we actually gaining anything by using shared Git Mirror dir with Treeless clone or wasting more time on it due to what BuildKite agent does while using git mirror:

# Using git-mirrors experiment 🧪
--
  | $ cd /git-mirrors
  | # Updating existing repository mirror to find commit 25009bc1f51560bb81763bd7e24e309963647476
  | $ git --git-dir /git-mirrors/git-github-myrepo-git remote set-url origin git@github.com:myrepo.git
  | $ git --git-dir /git-mirrors/git-github-myrepo-git fetch origin master

From what I see in the code after trying to remove git-mirrors-host-path from our custom params is that k8s-buildkite-plugin then defaults to /git-mirrors and continues to use Git Mirror.

Thus, I propose a change when a null value on git-mirrors-host-path will disable git mirror feature, while setting a value will enable it with passed path.

The change however will be breaking since default behavior will change and some users might have relied on it, for them we will suggest explicitly passing the old default path /git-mirrors.

I can implement the change if we come to agreement!

wdyt?

Thanks!

tgolsson commented 1 year ago

Hello @artem-zinnatullin! Thank you for the report.

As far as I can tell setting the host path to "" (the default) should end up with the actual job running with an emptyDir mirrors path, which shouldn't be shared:

https://github.com/EmbarkStudios/k8s-buildkite-plugin/blob/0e13cac380ab4a5eb0736bf1b67866f2588146c9/lib/job.jsonnet#L140-L143

I've reviewed a few of my pipelines (all running with the default "" value) and they're are doing full clones in the job. Is this related to some bad interaction by just having the plugin? I'm not conceptually against disabling git-mirrors if not configured - this would mirror the de-facto behavior as I can tell.

Finally, just to validate - is the above log from the the "job" phase or the "setup" phase on the static Buildkite node? I'm a bit suspicious of it updating an existing repository if you haven't configured git-mirrors, from the above reasoning.

artem-zinnatullin commented 1 year ago

@tgolsson

As far as I can tell setting the host path to "" (the default) should end up with the actual job running with an emptyDir mirrors path, which shouldn't be shared:

Yes, it does run with emptyDir and still does full git clone in there! Which is what I'm trying to disable because we're using treeless clones now and simply don't need git mirrors at all and they only slow us down :)

That's why I'm asking for a way to completely disable git mirror functionality, it's a feature of BuildKite Agent that is not enabled by default and I think it's reasonable for k8s integration to also not have it on by default.

Finally, just to validate - is the above log from the the "job" phase or the "setup" phase on the static Buildkite node? I'm a bit suspicious of it updating an existing repository if you haven't configured git-mirrors, from the above reasoning.

This is from the "job phase", git clones in "setup phase" is a separate issue I'm dealing with yeah! (Our repos grow bigger and bigger, cloning became an issue)

For context: we're one of the oldest users of this repo (2+ years), at some point I rewrote huge portion of this repo :)

tgolsson commented 1 year ago

Yes, it does run with emptyDir and still does full git clone in there! Which is what I'm trying to disable because we're using treeless clones now and simply don't need git mirrors at all and they only slow us down :)

Cool, makes sense. Out of curiosity, how do you configure treeless clones? Does it all happen in "userland"? Or do you configure buildkite to do this? Wondering whether we also need to make sure the job/init-container (if you use them?) has the proper config.

This is from the "job phase", git clones in "setup phase" is a separate issue I'm dealing with yeah! (Our repos grow bigger and bigger, cloning became an issue)

Hmm! I'm not quite following why it'd do a repeat/update then since the dir would be empty, but I assume it ends up cloning twice (maybe the treeless comes first?). Is there a simple repro/recipe I could use for this? I think it'd be a good case to add a self-test for too for the future.

artem-zinnatullin commented 1 year ago

Out of curiosity, how do you configure treeless clones? Does it all happen in "userland"?

We pass BuildKite Agent env var in "userland" BUILDKITE_GIT_CLONE_FLAGS=--filter=tree:0

Wondering whether we also need to make sure the job/init-container (if you use them?) has the proper config.

Git mirroring happens in init container, so yeah, that's what I want to change (I'm willing to work on PR if we agree on design!)

Hmm! I'm not quite following why it'd do a repeat/update then since the dir would be empty, but I assume it ends up cloning twice (maybe the treeless comes first?). Is there a simple repro/recipe I could use for this? I think it'd be a good case to add a self-test for too for the future.

"Second" clone happens because EmbarkStudios/k8s-buildkite-plugin always passes --experiment=git-mirrors to buildkite-agent https://github.com/EmbarkStudios/k8s-buildkite-plugin/blob/9a156cab691dd30eac269cb842bc46bd4dfa1145/lib/job.jsonnet#L265 and this is what I want to allow users to disable

tgolsson commented 1 year ago

OK; so I think this makes sense as a change and doesn't seem super-complex. I'm rewriting a bunch of the CI to do self-tests with the current commit so I'll see if I can replicate the negative case; and then we can think about a fix.