adoptium / containers

Repo containing the dockerfiles and scripts to produce the official eclipse-temurin containers.
https://hub.docker.com/_/eclipse-temurin/
Apache License 2.0
206 stars 91 forks source link

Add support for custom CA certificates #392

Closed rassie closed 11 months ago

rassie commented 1 year ago

This adds the capability to add custom CA certificates for Java truststore.

This capability used to exist in the old openjdk images, but has been removed from eclipse-temurin images. This PR adds an entrypoint to eclipse-temurin images, which adds CA certificates from /certificates path inside the image to the system certificate store and replaces JRE's truststore with it. This only happens if USE_SYSTEM_CA_CERTS environment variable is set, so by default, no action is taken.

Important caveat (needs further discussion!)

This PR deviates from the discussion in #293 in one important aspect. When discussing this feature, I've made an assumption about the inner workings of openjdk images which turned out incorrect. I thought that the system certificate store and JRE's truststore used to get merged by the openjdk image, which was option c) in our discussion. However, it turns out that the system store used to get converted to truststore format and then replaced JRE's store completely. This is the option b) in the #293 discussion, which we dismissed as non-sensical.

While implementing this PR, I've took previously existing process in openjdk images as my blueprint, which means, I've actually implemented b) instead of c) and would like to argue in favor of keeping this implementation. I'm fully aware that this decision will need to be discussed further.

My arguments in favor are:

Differences from openjdk images

Even though this PR is intended to re-introduce functionality previously available in openjdk images, there are important differences in the implementation and usage:

Basically, with openjdk images it has been necessary to patch the entrypoint to include update-ca-certificates, eclipse-temurin images would only require an opt-in environment variable to be set.

Documentation

The documentation is missing completely, because I don't know (yet) whether to put it.

Short itemized quick-start guide:

Testing

This patch has been tested semi-manually with the following Bash script executed from the repository root

#!/bin/bash

# shellcheck source=common_functions.sh
source ./common_functions.sh

mkdir -p certs

openssl req -nodes -new -x509 -subj "/DC=Termurin/CN=DockerBuilder" -keyout certs/server.key -out certs/server.crt >&/dev/null

check() {
    if [ "$1" = "$2" ]; then
        # output green unicode checkmark
        echo -n -e "\xE2\x9C\x94"
    else
        # output red unicode cross
        echo -n -e "\xE2\x9C\x98"
    fi
}

for ver in ${supported_versions}; do

    find ${ver} -iname 'Dockerfile.*' -printf "$ver/%P\n" | grep -v "windows" | while read -r dockerfile; do
        NAME=$(dirname $dockerfile)
        docker build -t "temurin/$NAME" -f $dockerfile "$NAME" >&/dev/null

        CMD1=date
        if [ "$ver" = "8" ]; then
            CACERTS=/opt/java/openjdk/lib/security/cacerts
            CACERTS2=/opt/java/openjdk/jre/lib/security/cacerts

            CMD2=(sh -c "keytool -list -keystore $CACERTS -storepass changeit -alias dockerbuilder || keytool -list -keystore $CACERTS2 -storepass changeit -alias dockerbuilder")
        else
            CMD2=(keytool -list -cacerts -storepass changeit -alias dockerbuilder)
        fi

        echo -n "$NAME: "

        docker run "temurin/$NAME" $CMD1 >/dev/null
        check $? 0
        docker run "temurin/$NAME" "${CMD2[@]}" >&/dev/null
        check $? 1

        docker run -v $(pwd)/certs:/certificates/ "temurin/$NAME" $CMD1 >/dev/null
        check $? 0
        docker run -v $(pwd)/certs:/certificates/ "temurin/$NAME" "${CMD2[@]}" >&/dev/null
        check $? 1

        docker run -v $(pwd)/certs:/certificates/ -e USE_SYSTEM_CA_CERTS=1 "temurin/$NAME" $CMD1 >&/dev/null
        check $? 0
        docker run -v $(pwd)/certs:/certificates/ -e USE_SYSTEM_CA_CERTS=1 "temurin/$NAME" "${CMD2[@]}" >&/dev/null
        check $? 0
        echo
    done
done

This scripts generates a certificate, builds all images and tests that the entrypoint does not fail:

The script checks that both normal execution of any command (in this case date) is possible and that the certificate is in the JRE's truststore when it's expected to be there (when the certificate is mounted and opt-in is set).

The following output is produced by the test script, reporting a full success:

8/jre/ubi/ubi9-minimal: ✔✔✔✔✔✔
8/jre/centos: ✔✔✔✔✔✔
8/jre/alpine: ✔✔✔✔✔✔
8/jre/ubuntu/focal: ✔✔✔✔✔✔
8/jre/ubuntu/jammy: ✔✔✔✔✔✔
8/jdk/ubi/ubi9-minimal: ✔✔✔✔✔✔
8/jdk/centos: ✔✔✔✔✔✔
8/jdk/alpine: ✔✔✔✔✔✔
8/jdk/ubuntu/focal: ✔✔✔✔✔✔
8/jdk/ubuntu/jammy: ✔✔✔✔✔✔
11/jre/ubi/ubi9-minimal: ✔✔✔✔✔✔
11/jre/centos: ✔✔✔✔✔✔
11/jre/alpine: ✔✔✔✔✔✔
11/jre/ubuntu/focal: ✔✔✔✔✔✔
11/jre/ubuntu/jammy: ✔✔✔✔✔✔
11/jdk/ubi/ubi9-minimal: ✔✔✔✔✔✔
11/jdk/centos: ✔✔✔✔✔✔
11/jdk/alpine: ✔✔✔✔✔✔
11/jdk/ubuntu/focal: ✔✔✔✔✔✔
11/jdk/ubuntu/jammy: ✔✔✔✔✔✔
17/jre/ubi/ubi9-minimal: ✔✔✔✔✔✔
17/jre/centos: ✔✔✔✔✔✔
17/jre/alpine: ✔✔✔✔✔✔
17/jre/ubuntu/focal: ✔✔✔✔✔✔
17/jre/ubuntu/jammy: ✔✔✔✔✔✔
17/jdk/ubi/ubi9-minimal: ✔✔✔✔✔✔
17/jdk/centos: ✔✔✔✔✔✔
17/jdk/alpine: ✔✔✔✔✔✔
17/jdk/ubuntu/focal: ✔✔✔✔✔✔
17/jdk/ubuntu/jammy: ✔✔✔✔✔✔
20/jre/ubi/ubi9-minimal: ✔✔✔✔✔✔
20/jre/alpine: ✔✔✔✔✔✔
20/jre/ubuntu/jammy: ✔✔✔✔✔✔
20/jdk/ubi/ubi9-minimal: ✔✔✔✔✔✔
20/jdk/alpine: ✔✔✔✔✔✔
20/jdk/ubuntu/jammy: ✔✔✔✔✔✔

OS support

This PR explicitely excludes Windows support, mostly because I lack expertise and an actual Windows installation to develop and test it. Additionally, it's unclear whether openjdk included CA certificate support for Windows.

TODO / Help needed

I need some guidance of the following items:

Fixes: #293

jerboaa commented 1 year ago

Please be sure to sign the ECA.

rassie commented 1 year ago

Already on it

gaeljw commented 1 year ago

As original requestor of the feature, I'm fine with the choice that has been made to replace certs rather than merge because merging can be done beforehand if needed and it's probably better to let the user choose the "merging strategy" (I'm not that familiar with these stuff but I heard that there's several ways to do it).

karianna commented 1 year ago

@rassie Can we set this to ready for review now?

rassie commented 1 year ago

@rassie Can we set this to ready for review now?

From my point of view, of course. I've expected a bit more discussion in terms of TODOs, but I suppose it can be done as part of the review.

karianna commented 1 year ago

@jerboaa Over to you for the first pass I think :-)

jerboaa commented 1 year ago

It's on my list of things to look at but likely not before end of next week. Sorry.

rassie commented 1 year ago

Please integrate the test script into the repo and use it from the PR tester workflow as an additional test.

I'll try to figure out how that works, but if I have some questions, is there someone specific I could ask, either here or on Adoptium Slack?

jerboaa commented 1 year ago

Paging @gdams for input on the test strategy. Thanks!

rassie commented 1 year ago

@jerboaa so it seems the test is being found and executed, so we've got the infrastructure part right. I'll look over the script error later today or tomorrow.

rassie commented 1 year ago

It seems the tests need to be in the tests subdirectory. Let's try this again...

rassie commented 1 year ago

Another fix for JRE/JDK 8 detection. The fix for Windows exclusion is still pending.

rassie commented 1 year ago

I think this should be it. :crossed_fingers:

karianna commented 1 year ago

The code changes LGTM me. I'll ask our (MSFT) infra and security folks to take a look at this as well. Sorry for the hold up but this is pretty critical stuff, more eyes are good :-)

rassie commented 1 year ago

@jerboaa on the topic of documentation: should I be updating https://github.com/docker-library/docs/blob/master/eclipse-temurin/content.md in some way?

jerboaa commented 1 year ago

@rassie Yes that seems suitable. But only once this PR is merged and builds have been produced. Feel free to produce a draft PR there, though. We'd also welcome a blog post here: https://github.com/adoptium/adoptium.net/tree/main/content

Thanks!

gdams commented 1 year ago

I'll do a thorough review today, also tagging @tianon and @yosifkit as they may have some comments on the proposed implementation

gdams commented 1 year ago

@rassie is it possible for you to add a GitHub Actions workflow to run your new tests against new pull requests?

jerboaa commented 1 year ago

@rassie is it possible for you to add a GitHub Actions workflow to run your new tests against new pull requests?

Isn't this already implemented? See the checks in this PR.

gdams commented 1 year ago

@rassie is it possible for you to add a GitHub Actions workflow to run your new tests against new pull requests?

Isn't this already implemented? See the checks in this PR.

ahh yes, I didn't realise it had been written in a way that it would integrate with the docker hub tests that are automatically run (very nice!) We may wish to contribute the tests upstream at some point rather than them sitting in our repo

rassie commented 1 year ago

I've opened https://github.com/docker-library/docs/pull/2338 for the container documentation. Very rough and small, but should do the trick.

gdams commented 1 year ago

Before this goes in I'd like to better understand what the impact is (if any) on existing users that are already using their own ENTRYPOINT? E.g what happens to the following user?

FROM eclipse-temurin
ENTRYPOINT ["/custom-script.sh"]
rassie commented 1 year ago

Before this goes in I'd like to better understand what the impact is (if any) on existing users that are already using their own ENTRYPOINT? E.g what happens to the following user?

FROM eclipse-temurin
ENTRYPOINT ["/custom-script.sh"]

They would still overwrite the entrypoint and only run their own code on startup. They wouldn't be able to opt-in via the environment variable, but they can choose to run our entrypoint additionally from their entrypoint, in which case they gain CA certificate merging functionality.

If you want, I can add a couple of tests mimicking exactly that.

gdams commented 1 year ago

If you want, I can add a couple of tests mimicking exactly that.

I think some tests for this use-case would be useful to ensure we don't break things going forwards

rassie commented 1 year ago

@gdams I've added a test ensuring that opt-in does nothing if the entrypoint is overridden. Hope it passes in CI (it does locally).

rassie commented 1 year ago

Added another use-case while at it (environment variable set without mounted certificates)

rassie commented 1 year ago

I've fixed the build/test error and also tightened cleanup and pre-checks (now also checking whether the certificates directory exists and is not empty).

rassie commented 1 year ago

Oh, need to update the generated files still. Coming right up...

gdams commented 1 year ago

adding the pmc-agenda label to get one last sanity check before merging

rassie commented 11 months ago

@gdams Thanks, fingers crossed!

tellison commented 11 months ago

@rassie the PMC wishes to pass on thanks for your work on this new feature. We realise that it has been open for a long time, and recognise that it has been well thought through and provides thorough local testing. This is a great new capability for the official Temurin images and as gdams noted, once this has been validated on one Java version we will roll it out to all.

Thanks again 👍

QuintenQVD0 commented 11 months ago

This pr broke all images that depends on this. as we have our own entrypoint and are using a low privileged user. thx

rassie commented 11 months ago

This pr broke all images that depends on this. as we have our own entrypoint and are using a low privileged user. thx :(

Care to elaborate? What exactly is breaking when you override the entrypoint? I think I've tested that use-case when developing and it worked fine, IIRC, would fix ASAP if I understand the problem correctly.

QuintenQVD0 commented 11 months ago

This pr broke all images that depends on this. as we have our own entrypoint and are using a low privileged user. thx :(

Care to elaborate? What exactly is breaking when you override the entrypoint? I think I've tested that use-case when developing and it worked fine, IIRC, would fix ASAP if I understand the problem correctly.

We have fixt it but your entryoint.sh is root owned and we can not overwrite it directly as our images are run by a low priv user called container. example:

https://github.com/parkervcp/yolks/tree/master/java/11

because the base image is yours and you have some logic in the entryoint of you as we do not overwrite it as we use CMD and coppy in our own entrypoint. when we start are containers it tries to run your entrypoint but it is root owned so it will fail. we got arround it but we just wanted to let you know as not every use for this as a base image will fit everyone.

rassie commented 11 months ago

We have fixt it but your entryoint.sh is root owned and we can not overwrite it directly as our images are run by a low priv user called container. example:

Oh, so we have a clash with filenames, i.e. we've introduced /entrypoint.sh which you have been using until now? It's rather unfortunate, but I can understand how it came to be and it is certainly unexpected (both for user and developer), I agree.

because the base image is yours and you have some logic in the entryoint of you as we do not overwrite it as we use CMD and coppy in our own entrypoint. when we start are containers it tries to run your entrypoint but it is root owned so it will fail. we got arround it but we just wanted to let you know as not every use for this as a base image will fit everyone.

I've tried running eclipse-temurin with an unprivileged user and had no problems calling the entrypoint script. Can you show me how it fails?

In general, thank you for the feedback, is there anything you'd have us do differently with this feature?

gdams commented 11 months ago

We have fixt it but your entryoint.sh is root owned and we can not overwrite it directly as our images are run by a low priv user called container. example:

Oh, so we have a clash with filenames, i.e. we've introduced /entrypoint.sh which you have been using until now? It's rather unfortunate, but I can understand how it came to be and it is certainly unexpected (both for user and developer), I agree.

because the base image is yours and you have some logic in the entryoint of you as we do not overwrite it as we use CMD and coppy in our own entrypoint. when we start are containers it tries to run your entrypoint but it is root owned so it will fail. we got arround it but we just wanted to let you know as not every use for this as a base image will fit everyone.

I've tried running eclipse-temurin with an unprivileged user and had no problems calling the entrypoint script. Can you show me how it fails?

In general, thank you for the feedback, is there anything you'd have us do differently with this feature?

@rassie I would suggest renaming our custom entrypoint.sh script as it's the most likely name to cause conflict? What do you think?

dashford commented 11 months ago

Hi @rassie - have you any thoughts on this issue posted here? https://github.com/adoptium/containers/issues/415

We're running into issues on some of our applications where env variables with a - in them have stopped returning data. We're thinking it's because of the change in shell now that the entrypoint script is defined in the Dockerfile and is setting it as /usr/bin/sh rathaer than the default shell for ubuntu:22.04.

rassie commented 11 months ago

@rassie I would suggest renaming our custom entrypoint.sh script as it's the most likely name to cause conflict? What do you think?

Would be one the easier solutions, however it's a breaking change, even though it's only been a couple of weeks. Either way, someone from the project will need to decide how to go on.

gdams commented 11 months ago

@rassie I would suggest renaming our custom entrypoint.sh script as it's the most likely name to cause conflict? What do you think?

Would be one the easier solutions, however it's a breaking change, even though it's only been a couple of weeks. Either way, someone from the project will need to decide how to go on.

I think that where we've already created a breaking change I'm going to back this out from the upstream images (for now) so we can work on a fix here.

rassie commented 11 months ago

Hi @rassie - have you any thoughts on this issue posted here? #415

We're running into issues on some of our applications where env variables with a - in them have stopped returning data. We're thinking it's because of the change in shell now that the entrypoint script is defined in the Dockerfile and is setting it as /usr/bin/sh rathaer than the default shell for ubuntu:22.04.

From that issue:

The new entrypoint from https://github.com/adoptium/containers/pull/392 is sh on Ubuntu and Alpine images and so loses variables. Please change all the entrypoint scripts to use bash

I have no problem with changing to bash, however, sh has been the common ground for all Linux derivates because of Alpine, which doesn't have bash installed by default. Should we just install bash to Alpine and be done with it?

rassie commented 11 months ago

@gdams Thanks, gives us a bit of time. I'll prepare a PR with moved entrypoint script (somewhere with little chance of name clash) and with bash running the entrypoint, correct?

gdams commented 11 months ago

@gdams Thanks, gives us a bit of time. I'll prepare a PR with moved entrypoint script (somewhere with little chance of name clash) and with bash running the entrypoint, correct?

yes please

QuintenQVD0 commented 11 months ago

We have fixt it but your entryoint.sh is root owned and we can not overwrite it directly as our images are run by a low priv user called container. example:

Oh, so we have a clash with filenames, i.e. we've introduced /entrypoint.sh which you have been using until now? It's rather unfortunate, but I can understand how it came to be and it is certainly unexpected (both for user and developer), I agree.

because the base image is yours and you have some logic in the entryoint of you as we do not overwrite it as we use CMD and coppy in our own entrypoint. when we start are containers it tries to run your entrypoint but it is root owned so it will fail. we got arround it but we just wanted to let you know as not every use for this as a base image will fit everyone.

I've tried running eclipse-temurin with an unprivileged user and had no problems calling the entrypoint script. Can you show me how it fails?

In general, thank you for the feedback, is there anything you'd have us do differently with this feature?

thank you for your fast and grate reesponse, this is what the error is environment/docker: failed to start container: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "/entrypoint.sh": permission denied: unknown

as the container is forced to start as a low prive user called "container" and that file is indeed a naming conflict so it tryes to execute you entrypoint what is root owned or at least conflict with ours as our CMD entry specifyes the same file