buchgr / bazel-remote

A remote cache for Bazel
https://bazel.build
Apache License 2.0
595 stars 154 forks source link

Fix kubernetes example #654

Closed jhchabran closed 1 year ago

jhchabran commented 1 year ago

I got caught an issue where the provided k8s example wasn't behaving as advertised, which was quite hard to spot, until I double checked the output of /status and noticed the max cache size was not matching what I had configured.

This lead to cache eviction breaking builds, because ours are configure to leverage the Remote Builds without Bytes which expects a given cache element to not be evicted during a build.

I'm including an example error message below for others to find this issue easily if they're facing the same problem:

Testing (...) failed: Failed to fetch blobs because they do not exist remotely. Build without the Bytes does not work if your remote cache evicts blobs during builds: Missing digest: (...)

While this PR fixes the immediate problem with the example, I think that for a longer term solution, the container images should avoid to define a default --max_size argument and we should rely instead on sane defaults defined within the code. But doing so is outside the scope of this PR and that might change the behaviour of current installations using those defaults.

All of this is happening because urfave/cli/v2 expects env vars to provide a default value if the flag is not given.


The bazel-remote binary flags falls back to env vars for default values, meaning that $BAZEL_REMOTE_MAX_SIZE will be ignored if the binary was invoked with --max_size=5.

The relased containers contains a default --max_size flag which the README addresses by suggesting to pass your own --max_size flag when running bazel-remote containerized.

But when that container as a k8s deployment, it's very usual to simply configure everything through env vars. This creates a very hard to spot scenario, where one might think that the max size is correctly configured with the env var, but in reality the flag will take precedence and run bazel-remote with the default 5 GiB size, regardless of what the user configured.

This commit fixes this by correctly providing an args: [...] to behave exactly as the user would expect it.

jhchabran commented 1 year ago

Closing in favour of #660