Use environment variables in .bazelrc via string interpolation

nathanhleung commented 4 years ago

Description of the problem / feature request:

I want to use environment variables in my .bazelrc via string interpolation

Feature requests: what underlying problem are you trying to solve with this feature?

I'm running a remote Bazel cache secured with Basic Auth. I don't want to hardcode basic auth credentials into the .bazelrc checked into the repository.

Example .bazelrc:

build --remote_cache=https://username:password@mybazelcache.example.com

I would rather do something like this:

build --remote_cache=https://${CACHE_USERNAME}:${CACHE_PASSWORD}@mybazelcache.example.com

What operating system are you running Bazel on?

macOS Catalina 10.15.3

What's the output of `bazel info release`?

INFO: Invocation ID: 067b442a-efd2-431e-b39d-9ea6e2467bf3 release 2.0.0

Have you found anything relevant by searching the web?

shimmerjs commented 4 years ago

The official docs provide a workaround for this use case via try-import in your .bazelrc: https://docs.bazel.build/versions/master/best-practices.html#bazelrc

psigen commented 4 years ago

This workaround isn't exactly covering the use-case (I have a similar need).

Using a try-import with a user.bazelrc requires that the user rcfile to contain the entirety of the option string. So in the situation where there is a small piece of user-specific information (i.e. username) that is combined with a big piece of information that should be source controlled (i.e. full url of remote cache), we are currently lacking a mechanism to combine those bits together.

(Unless I'm missing some mechanism to combine data from different lines of the rcfiles.)

nathanhleung commented 4 years ago

@psigen Here was my workaround:

I needed this feature specifically for a CircleCI setup (where I set environment variables in "Project Settings"), so what worked for me was creating a script generate_bazelrc.sh:

#!/bin/bash

# Exit on error
set -e

echo "# Generated by generate_bazelrc.sh" > .bazelrc
echo "build --remote_cache=https://$BAZEL_CACHE_USER:$BAZEL_CACHE_PASSWORD@mybazelcache.example.com" >> .bazelrc

and then running ./generate_bazelrc.sh before the start of every build.

psigen commented 4 years ago

Thanks. My workaround is also running a script. But I'm using this for local development, so it's not as easy to make sure it's sourced when the environment is set up.

jheaff1 commented 3 years ago

I too would be interested in this. Specifically, I would like to set --distdir=${SOME_VARIABLE} in my .bazelrc

burkpojken commented 3 years ago

Is there any technical reason for not allowing env variables in the bazelrc files?

chancila commented 3 years ago

+1 for this change, it would be cleaner than having to use some wrapper to basically to do the same thing via try-import

aiuto commented 3 years ago

The reason for not allowing it is that it is going to add complexity for little real gain.

anthpi commented 3 years ago

@aiuto For us this feature is rather important. Especially in our complex environment when we have several workspaces and several repositories. This feature will also simplify environment for many users as they no longer have to maintain a wrapper around bazel in order to achieve same functionality. Complexity of the proposed implementation we can discuss in the commit review. Implementation in https://github.com/bazelbuild/bazel/pull/13266 is backwards compatible so it should be no negative impact on existing users.

lberki commented 3 years ago

"Though a program be but three lines long, someday it will have to be maintained."

I agree with @aiuto here; this looks like something that's easy to achieve with a wrapper script and thus provides little gain for the additional implementation complexity.

Looking at #13266 , a number of issues come to mind:

it's unclear how tokenization works (what if the environment variable contains spaces, newlines, backspaces, hashmarks or null characters? What if an environment variable resolves to ${VAR}? Is that looked up again?)
There doesn't seem to be a mechanism to quote the ${} when that exact syntax is needed in arguments
This might constitute a leak of environment variables (what if someone writes test --test_arg=${SSH_AUTH_KEY} in a .bazelrc?)

All of these, are of course, fixable, but I see no reason to take on the job of maintaining this functionality.

psigen commented 3 years ago

Perhaps I can walk through some of the issues I have had with using a wrapper script in these cases, to demonstrate a bit about why this is not something that's particularly easy to emulate with a wrapper script.

There are a few primary use cases that I have for this functionality (others may have additional):

Many cloud-based CI systems pass settings/tokens via environment variables by default.
Users have unique per-user credentials or configuration paths that are available through concatenating variables such as $USER and $XDG_DATA_HOME.
Automated credentialing systems like zero-trust login systems (e.g. Hashicorp Vault) create temporary and frequently rotated authorizations agents whose paths are transient and exported via environment variables.
When running on dynamic infrastructure (k8s, AWS), the environment often contains host or domain name information that is not otherwise accessible, as well as login information such as X11 or SSH agent paths that change dynamically.

The "wrapper" approaches we can use to address these fall into two basic categories:

Create a wrapper to generate a .bazelrc file

This is the approach detailed in https://github.com/bazelbuild/bazel/issues/10904#issuecomment-655026832 . We can run a pre-build step to manually generate some files in the bazel workspace root.

This requires execution of a separate tool script that needs to be distributed independently and composed/inherited independently. Right now, I have 5 separate workspaces (due to IP cross-pollination restrictions at work) that share content with a "base" monorepo. In order to have shared content in a standalone wrapper script that needs to be executed ahead of bazel (to generate a .bazelrc file to put bazel in a runnable state), I need to put a script in my monorepo, then go put top-level scripts into all the workspaces that include resolve the monorepo source. I would still have to do this with .bazelrc, but I already was doing it via try-import, so now I have do to it twice and keep the two in-sync.
User documentation needs to be adapted to remember to run this script ahead of other build operations. Additional build steps are additional steps to forget. I tried to make this as obvious as possible when it happened, but with all the various calls to bazel and ibazel that various tooling can make (e.g. through skaffold or tilt), it's a lot of surface area to cover.
This doesn't help address dynamically-rotated information (1,3,4 above), because there is no way to know when to re-run the script because a cert has been rotated or a user has logged into from another SSH session. This ended up being the deal-breaker to this approach, since I have users relying on dynamically-generated SSL certs for client authentication.

Create a wrapper around the bazel executable itself

Another alternative to this is to create a wrapper that replaces the bazel CLI tool entirely, and performs additional environment setup. Basilisk offers a special functionality where putting a tools/bazel file into the workspace root works as a special override that can used for wrapper scripts like this.

Similar inheritance problems to the above. I have 6 tools/bazel files, and 5 of them contain cut-and-paste boilerplate to search around for the 6th across various potential local and remote locations (that base repo is a local_repository or git_repository included by the others) and source it.
It requires installation of basilisk, manual calling of the wrapper script, or implementation of a similar wrapper script mechanism that is installed into $PATH. Basilisk is a great tool, but it's not really ideal for CI or other places where the version of bazel needs be controlled (like when running a build matrix against version of bazel). This means that my CI/CD needs to directly call tools/bazel, which is a bit annoying when dealing with third-party tooling (e.g. using skaffold).

I've spent about 20 hours so far wrapping, documenting, and debugging the various levels of wrapper described above, and in addition to the folks above in the thread there have been a few recent posts on the bazel Slack related to working around the same issues.

I would be much happier directing that effort towards helping get this feature instead. I do agree that it needs a bit more formalization than the linked PR.

anthpi commented 3 years ago

@lberki Thank you for your feedback. There are several aspects of having a wrapper that are problematic:

From the user perspective if we modify the bazelrc file an error occurs it might be hard for the user to understand what caused it as file name printed by Bazel will point to some generated file.
With wrapper being triggered prior Bazel one can end up in a situation when several builds are being triggered (almost at the same time) and this could impact content of the generated .bazelrc file and cause some stange and hard to debug errors.
Having a wrapper doesn't solve a situation when you have "import" statements in bazelrc file pointing to other bazelrc files that also might contain environment variables that also need to be resolved. This will make a wrapper much more complex and one would need to implement all the logic in the wrapper. Then (1) will become even more problematic.

Could you also elaborate more on your point about "leak of environment variables"? Currently it's completely fine to write "bazel test --test_arg=${SSH_AUTH_KEY}" on the command line. What is the difference between this and moving this into bazelrc as you stated? I don't think that I understand what your mean.

anthpi commented 3 years ago

@lberki Ping! Any thoughts on the discussion/comments above?

ron-stripe commented 3 years ago

For me the big one is that several of our scripts call bazel several times and I want to pass --profile=/some/dir/XXX and get different files written to same dir. If I could get something to distinguish different runs without adding to all the different scripts that call bazel, it would help.

anthpi commented 3 years ago

@lberki @aiuto Could you provide some feedback on the discussion?

aiuto commented 3 years ago

I'm still not convinced that a wrapper to add to .bazelrc is an unreasonable workaround, especially in the context of a CI system. You always need a wrapper in the CI, if only to be a 3 line shell script that does bazel test //.... That is a reasonable place to write out a very precise file that you can try-import.

Whoaa512 commented 3 years ago

@aiuto That's not necessarily true especially in light of @psigen's comment above outlining this disparate workspace setup.

The crux of this issue is an argument of where the complexity of handling shell variables should live.

Your argument is that the complexity for end-users to create a wrapper around bazel is trivial and therefore makes sense that this should not live in bazel core because the complexity it adds to the logic for parsing .bazelrc files which must be maintained and maintenance is a non-zero cost.

However I think there are several cases here which point to the contrary that creating & maintaining a bazel wrapper is more complex than it looks on the surface. Solving this problem in a scoped manner within bazel core is ideal because it's a clear win for the open-closed principle & helps minimize complexity across the bazel ecosystem at the cost of a relatively small maintenance burden (which can be reduced with sufficient test coverage).

An alternative here might be to allow bazel to support some sort of preprocessor script for the .bazelrc before the actual command is run, though this maybe be even more complex than the currently proposed solution.

lberki commented 3 years ago

The reason why this feels strange is that you want to put a shell variable parser in Bazel even though there is a perfectly good system made for parsing shell variables around Bazel: the shell itself.

I think re-writing .bazelrc files is not a great solution, but I still have trouble understanding why having a wrapper script is a no-go. In particular, Installation of Bazelisk is not required, although it admittedly makes things much simpler.

@psigen seems to agree that Bazelisk is a viable alternative for interactive builds. For CI, I don't think it's necessary: you presumably have to specify a binary to run on CI (as opposed to just bazel), so it looks like running a shell script instead of Bazel isn't a big change?

psigen commented 3 years ago

Hi all, I just wanted to highlight this from my previous post:

I've spent about 20 hours so far wrapping, documenting, and debugging the various levels of wrapper described above, and in addition to the folks above in the thread there have been a few recent posts on the bazel Slack related to working around the same issues.

The crux of the issue is the tremendous amount of re-writing involved because using a shell-script means that the existing mechanisms of composition (i.e. try-import) and executable resolution (which bazel) have to effectively be re-written for each individual project to general dynamic CLI args via templating in what would otherwise be an extremely simple syntax.

I don't want to do this. :sob: I don't like each team I am working with (at different companies) have their own almost-the-same-but-not-the-same mechanism for getting environment secrets into their bazel builds. I don't like debugging the same typos and edge cases in each environment. I don't like having to write documentation for how to use the script wrapper.

I think the usage of environment variable interpolation for passing data is a relatively common use case, as I detailed in my previous post. If we look at similar systems (even relatively "hermetic" ones) we can see this particular escape hatch showing up for the same reasons:

docker-compose
skaffold
CMake
make
As a counterpoint, meson does not do this, but it has a deterministic entrypoint into its execution, so it's possible to create injection scripts within the build system.

uri-canva commented 3 years ago

I prefer for bazel to continue supporting / developing generally applicable hooks like tools/bazel rather than narrower solutions like environment substitutions in bazel rc files.

I know we could have both, but given maintenance capacity on the project is finite, I can understand prioritising features and code that have higher power / effort ratio.

For what it's worth I find myself using tools/bazel for more than just configuring bazelrc files, so it's not a high cost for me to add some environment variable handling logic in there, especially when it can handle defaults when environment variables aren't set and setting environment variable to the output of other programs and scripts, both things that I use very often and that the proposed solution in this thread does not address.

psigen commented 2 years ago

If the limiting factor is contributor bandwidth, does this mean that folks would be generally amenable to a feature PR adding the functionality?

uri-canva commented 2 years ago

I can't speak for the core bazel team, but note that in general the initial PR adding a feature is just a small part of the capacity the addition of a feature requires to be properly implemented and maintained: ongoing maintenance of it and the added complexity on the parts of the code that interact with it are a bigger part of it.

aiuto commented 2 years ago

Exactly. Anyone can write a PR to add a feature and then go away. The Bazel team is then on hook to maintain it forever.

abhandaru commented 2 years ago

Just as another data point: this feature would simplify quite few things for my team. I do understand and respect the perspective of the maintainers, however. Thanks all.

aiuto commented 2 years ago

I just want to give an update on this. One of our engineers has recently started evaluating all the issues related to .bazelrc and flag parsing to come up with a unified plan to address them. That will include:

defining important use cases to set overall requirements
precisely documenting existing behavior
mapping requirements to existing behaviors and addressing the gaps where critical cases can not be achieved.
changing existing behavior and/or providing ready-to-copy examples to achieve the requirements

Our criteria for deciding overall behavior will include:

can users accomplish the critical things
simpler to explain and document usually wins over feature richness.
no new flags to control startup flag parsing (remove existing ones when possible)
OS specific code or behavior is strongly discouraged

I expect this effort will make some people happy that their favorite issue gets resolved while disappointing others, because it appears that there are issues filed asking for conflicting things. That's why a requirements doc tied to use case is an important part of this effort. It allows us to better evaluate current and future issues to determine priority and feasibility. Most of this to be done in Q1 2022, so stay tuned for updates early next year.

loeffel-io commented 2 years ago

any update?

loeffel-io commented 2 years ago

also build --define=HELLO=${WORLD} would help so much

psigen commented 2 years ago

That's why a requirements doc tied to use case is an important part of this effort.

@aiuto was this document created or shared with the community at any point?

aiuto commented 2 years ago

Whoops.... Yes. We built an extensive use case and requirements doc. But AFAIK, we forgot to share it out.

@aranguyen If you have not done so, can you copy "Bazel Flag and rc File Possible Improvements" to a doc in bazel.build and share it with comment access to the world?

aranguyen commented 2 years ago

Hello all, sorry for the delay, I was OOO for some time. Here is the link to the document https://docs.google.com/document/d/1l1qllOBSjj295pui_krJe-LQs2AICZmhZSROuruLsFU/edit?usp=sharing (Section Requested User Use Cases). Please feel free to add additional comments there to support the use case.

psigen commented 2 years ago

Thanks @aranguyen : I read through the entire document, but I do not have permissions to make comments on the doc. Should I request access, or is there a different way to provide feedback?

aranguyen commented 2 years ago

@psigen please check again if you could add comments on the doc directly now. Thanks!

ptabor commented 1 year ago

Use-case I'm facing:

I cannot add: --sandbox_writable_path=~/.npm nor --sandbox_writable_path=$HOME/.npm to .bazelrc as this triggers:

Caused by: java.lang.IllegalArgumentException: Paths must be absolute: '~/.npm'
        at com.google.common.base.Preconditions.checkArgument(Preconditions.java:220)
        at com.google.devtools.build.lib.vfs.Path.<init>(Path.java:78)
        at com.google.devtools.build.lib.vfs.Path.create(Path.java:73)
        at com.google.devtools.build.lib.vfs.Path.create(Path.java:68)
        at com.google.devtools.build.lib.vfs.FileSystem.getPath(FileSystem.java:72)
        at com.google.devtools.build.lib.sandbox.AbstractSandboxSpawnRunner.getWritableDirs(AbstractSandboxSpawnRunner.java:358)

This forces me to let each user to deal with the flag separately.

wyhao31 commented 1 year ago

Same issue as @ptabor mentioned.

scott-osk commented 1 year ago

+1 I am facing the same issue as @wyhao31 and @ptabor

noel-yap commented 2 months ago

Use case I have:

build --disk_cache=${GIT_BRANCH_SPECIFIC_DISK_CACHE}

IOW, I don't want to have a bunch of cache misses each time I switch git branches

FWIW, I'm already using direnv to manage my env and can use that to set GIT_BRANCH_SPECIFIC_DISK_CACHE each time I switch branches. And, of course, I would also need to pre-populate the cache for new branches and take care of cleanup

ed-irl commented 2 months ago

Another use case -

Testcontainers uses a file called ~/.testcontainers.properties for service advertisement. It would be helpful to have a way to make this available as a test_env argument in the .bazelrc

bazelbuild / bazel