Closed pcj closed 5 years ago
Hi @pcj, I think it would be great if we could provide a feature to override this default behavior. Do you have any thoughts on how to implement this properly (before going to a PR)? My main concern is where this info is defined (once it's defined somewhere, passing it to `pusher.par` seems trivial). My initial thought is that the proper way to do this is:
There are some other ways to do this that are not acceptable, imo, particularly:
Doing 1 will require significant effort. If you're willing to put in some time to get this working, I can help you get started. We do have plans in the upcoming months to make these rules use toolchain rules properly (for much more than just auth for `container_push`), so another option is to delay working on this issue until we have at least created the first parts of the toolchain rule; you could then just work on hooking that up to also handle the auth bit.
If you can think of other options to implement this let me know.
I don't like (3) either. I'd have to do more research about (1).
I was imagining (2) [which you blacklisted :smile:]. Here's how we do the auth file: a `secrets.bzl` file in an external workspace that contains loadable scalar values like `DOCKER_USERNAME` and `DOCKER_PASSWORD`. This way, we can avoid committing passwords and so forth to git. For example, we have code like `load("@env//:secrets.bzl", "MAVEN_USER")`, where the file `secrets.bzl` contains the definition `MAVEN_USER = "foo"`. It avoids hardcoding values in source files, though the values are still plaintext on a developer's machine. I consider this more-or-less equivalent to the security provided by "kubernetes secrets", for example. (It would be pretty easy to base64-encode these and then have a function to base64-decode them when needed, but we haven't done that yet.) We use the same approach to generate the `.docker/config.json` file. In this scenario, it would not be much more work to add an option to `pusher.par` that accepts the location of the auth config file.
So it seems that the fundamental problem is about "secret values" and how to prevent the user from committing them in the repo. We have our own partial solution as detailed above. In the ideal case, bazel itself would have some built-in support for secrets. If this is provided by the toolchain architecture that's good with me, but as I write this I'm not clear on how using toolchains solves this.
Thanks for the detailed clarifications. If there is a path to generate the secret values via a repository rule, which enables this to work without needing to commit this info to your repo, then I think we could do (2). (My note about using toolchain rules is that they are the "recommended" way to do all these things now, but a repository rule that produces a file with the info is, in this case, the next best thing.) As you say, I do think this should be a much simpler change than using toolchains. The one challenge I see is how to document this feature so that users know the expectation is for these files to be generated and not checked in, and to provide pointers so that users can do this right. Could you point me to the rules you are using to create the `config.json` file (i.e., is it open source)? I think use of this feature might just require providing in this repo some solution for generating the config file (not sure if that means upstreaming the rule you currently have or providing a 'simpler' version of it here; thoughts?)
Not open source... till now! Well, it's not rocket science, but here it is. It could be adapted into a `docker_config` repository rule that specifically writes a docker `config.json` file.
```python
BUILD_BZL_CONTENTS = '''
filegroup(
    name = "secrets",
    srcs = ["secrets.bzl"],
    visibility = ["//visibility:public"],
)
'''

UNSET_VALUE = "______UNSET______"
REQUIRED_VALUE = "<REQUIRED>"

def _environment_secrets_impl(repository_ctx):
    entries = repository_ctx.attr.entries
    env = repository_ctx.os.environ
    lines = ["# Generated - do not modify"]
    missing = []
    for key, defaultValue in entries.items():
        value = env.get(key, UNSET_VALUE)
        if value == UNSET_VALUE and defaultValue == REQUIRED_VALUE:
            missing.append(key)
        elif value == UNSET_VALUE and defaultValue:
            value = defaultValue
        # Escape quotes and backslashes
        value = value.replace("\\", "\\\\")
        value = value.replace("\"", "\\\"")
        line = "{0}=\"{1}\"".format(key, value)
        lines.append(line)
    if len(missing) > 0:
        fail("Required Secret environment variables were empty: " + ",".join(missing))
    secrets_file = "\n".join(lines)
    repository_ctx.file("secrets.bzl", secrets_file)
    repository_ctx.file("BUILD.bazel", BUILD_BZL_CONTENTS)

"""
Explicitly import secrets from the environment into the workspace. The 'entries'
is a string -> string key/value mapping such that the key is the name of the
environment variable to import. If the value is the special token '<REQUIRED>'
the build will fail if the variable is unset or empty. Otherwise the value will
be used as the default.

    environment_secrets(
        name = "env",
        entries = {
            "MAVEN_REPO_USER": "<REQUIRED>",
            "MAVEN_REPO_PASSWORD": "<REQUIRED>",
            "DOCKER_PASSWORD": "<REQUIRED>",
            "DOCKER_URL": "index.docker.io",
        },
    )

In the example above, DOCKER_URL will use the value 'index.docker.io' if the
"DOCKER_URL" environment variable is not set.

Then in build scripts you can reference these by importing a custom bzl file:

    # Import a secret into the local BUILD.bazel environment
    load("@env//:secrets.bzl", "MAVEN_REPO_USER")

    # Use the value
    sample_rule(arg = MAVEN_REPO_USER)
"""

def environment_secrets(name, entries):
    the_new_rule = repository_rule(
        implementation = _environment_secrets_impl,
        attrs = {
            "entries": attr.string_dict(
                default = entries,
            ),
        },
        environ = entries.keys(),
    )
    the_new_rule(name = name)
```
Whatever the implementation, it might be a good idea to define the `container_push` / `container_pull` attr as `"docker_config": attr.label(mandatory = False, providers = [SecretFileInfo])`, such that the intent is communicated not to use just any single-file label (and not to encourage people to check these files in).
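To make the `providers` constraint concrete: `SecretFileInfo` is not an existing rules_docker or Bazel provider, so the name, its field, and the wrapper rule below are all assumptions sketching what it might look like.

```python
# Hypothetical provider; the name SecretFileInfo and its "file" field
# are assumptions, not an existing rules_docker API.
SecretFileInfo = provider(
    doc = "Marks a file as a generated, never-checked-in secret.",
    fields = {"file": "File containing the secret config"},
)

def _secret_file_impl(ctx):
    return [
        DefaultInfo(files = depset([ctx.file.src])),
        SecretFileInfo(file = ctx.file.src),
    ]

# Wrapper rule so that a docker_config attr declared with
# providers = [SecretFileInfo] rejects arbitrary file labels.
secret_file = rule(
    implementation = _secret_file_impl,
    attrs = {"src": attr.label(allow_single_file = True)},
)
```

Because plain `filegroup` or source-file labels carry no `SecretFileInfo`, users would have to go through `secret_file` (or a generating rule returning the provider), which signals the checked-in-file antipattern.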
Thanks for sharing the code, looks quite useful! I'd be very happy to review a PR that adds your rule (either as something generic as you have now, or specific to the docker use case) in the skylib folder, and to add the new attr specifying that the label must provide a `SecretFileInfo` (never seen that provider before, but it looks like exactly what I wanted to make sure users don't misuse this). I'm slightly concerned that testing this will be non-trivial. Specifically:
OK, will do. May require a peer PR in google/containerregistry, not sure yet.
@nlopezgi Would you mind looking over the prototype implementation? There is a PR here and at google/containerregistry as linked above.
I think this is an improved solution that uses the `--action_env` feature. The basic idea goes like this:
```python
load("@io_bazel_rules_docker//container:container.bzl", "registry_credential")

registry_credential(
    name = "private",
    user_env = "PRIVATE_REGISTRY_USERNAME",
    pass_env = "PRIVATE_REGISTRY_PASSWORD",
)

container_push(
    name = "push",
    registry = "private.container-registry.io",
    repository = "path/to/my/repo",
    tag = "latest",
    credential = ":private",
)
```
Finally, add the following to your `.bazelrc` file:

```
build --action_env=PRIVATE_REGISTRY_USERNAME
build --action_env=PRIVATE_REGISTRY_PASSWORD
```
In this scenario, a `private/.docker/config.json` file will be written and passed to the pusher tool at runtime as `pusher.par --client-config private/.docker/config.json ...`.
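For context on what such a generated file contains: the Docker client config keeps credentials in an `auths` map keyed by registry, with `auth` holding base64 of `user:password`. A small sketch of producing that structure from the env vars above (the registry name and env var names are just the example values from this thread, not anything `registry_credential` prescribes):

```python
import base64
import json
import os

def make_docker_config(registry, user_env, pass_env):
    """Build a Docker client config dict from credential env vars."""
    user = os.environ[user_env]
    password = os.environ[pass_env]
    # Docker stores credentials as base64("user:password") under auths.<registry>.auth
    token = base64.b64encode("{}:{}".format(user, password).encode()).decode()
    return {"auths": {registry: {"auth": token}}}

if __name__ == "__main__":
    # Stand-ins for values that --action_env would forward into the action.
    os.environ["PRIVATE_REGISTRY_USERNAME"] = "alice"
    os.environ["PRIVATE_REGISTRY_PASSWORD"] = "s3cret"
    cfg = make_docker_config(
        "private.container-registry.io",
        "PRIVATE_REGISTRY_USERNAME",
        "PRIVATE_REGISTRY_PASSWORD",
    )
    print(json.dumps(cfg, indent=2))
```

Writing this dict to `private/.docker/config.json` yields a file the pusher can consume via `--client-config`, without the credentials ever being checked in.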
WDYT? Looking for feedback before working on testing strategy and adding the change to puller.
Adding @solarhess who open-sourced https://github.com/solarhess/rules_build_secrets
You should also consider using the `HTTP_PROXY` environment variable; Python's requests library respects it.
Hi @pcj. I've been giving this some more thought. I think the setting of the credential can be done in a sensible way, both for pushing and for pulling, via proper use of toolchain rules. If we had a toolchain rule for docker:
For your use case the rule that generates the secrets could then pass the file to toolchain rule. All pull / push rules that are executed would use the same credential. To change the credential the user can either change ENV vars for location of their config or change the value of the file being passed.
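No such docker toolchain existed in rules_docker at the time of this thread, so the rule name and its `client_config` attribute below are assumptions; this is only a sketch of the shape a credential-carrying toolchain might take.

```python
# Hypothetical toolchain rule; docker_toolchain and client_config are
# assumptions, not an existing rules_docker API.
def _docker_toolchain_impl(ctx):
    return [platform_common.ToolchainInfo(
        # File (e.g., a generated .docker/config.json) that all push/pull
        # rules resolving this toolchain would read credentials from.
        client_config = ctx.file.client_config,
    )]

docker_toolchain = rule(
    implementation = _docker_toolchain_impl,
    attrs = {
        "client_config": attr.label(allow_single_file = True),
    },
)
```

A secrets-generating repository rule could then supply the `client_config` file, giving every push/pull rule in the build the same credential without per-rule wiring.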
Would a solution like this work for you?
this will be fixed with https://github.com/bazelbuild/rules_docker/pull/594
SGTM, I'll close this one.
The `container_push` rule works if the `pusher.par` can find an adequate `$HOME/.docker/config.json` for authentication (`DOCKER_CONFIG` works too). It would be nice to be able to provide the docker `config.json` file as an input to the `container_push` rule. Any plans for this? I can work on a PR if people are amenable to that.