concourse / concourse-bosh-deployment

A toolchain for deploying Concourse with BOSH.
Apache License 2.0

mTLS unintentionally(?) enabled when deploying v7.0.0+ #233

Closed (davewalter closed 3 years ago)

davewalter commented 3 years ago

While testing the latest patch releases of bosh-bootloader for compatibility with Concourse, I discovered that the concourse-smoke-tests originally written by @joshzarrabi began failing once the pipeline started consuming v7.0.0. The error occurred in the ci task while setting up the test environment:

+ wget --no-check-certificate -O fly 'https://35.226.174.115/api/v1/cli?arch=amd64&platform=linux'
--2021-02-19 17:39:45--  https://35.226.174.115/api/v1/cli?arch=amd64&platform=linux
Connecting to 35.226.174.115:443... connected.
WARNING: Could not save SSL session data for socket 4
WARNING: The certificate of '35.226.174.115' is not trusted.
WARNING: The certificate of '35.226.174.115' doesn't have a known issuer.
Failed writing HTTP request: The specified session has been invalidated for some reason..
Retrying.

When I watched the web process' logs while making the same request manually, I saw this in the web.stderr.log:

2021/04/30 22:05:00 http: TLS handshake error from 1.2.3.4:50417: tls: client didn't provide a certificate

Reading through the release notes, the only reference I could find to a client TLS certificate was in the entry introducing support for mTLS:

* Support for mTLS (#6355) @nickhyoti
  * Added support for mTLS between Concourse and a reverse proxy that may be in front of Concourse

Working backwards through the changes made in that PR (https://github.com/concourse/concourse/pull/6355), I found that including a CA certificate in the ATC's TLS configuration enables mTLS, which requires every request to present a client certificate. This happens automatically when the tls.cert.ca property is included in the job spec for the web instance in the BOSH manifest, which in turn happens automatically because the atc_tls variable defined in tls-vars.yml is used in its entirety in the tls.yml ops-file.

I have confirmed that excluding the CA from the web instance group's tls.cert property and redeploying fixes the problem and allows me to download the fly CLI from the server. I would be happy to PR this change if it would be acceptable. I could also include a new mtls.yml ops-file that includes the tls.cert.ca property for users that want to enable mTLS on their web VMs.

taylorsilva commented 3 years ago

A PR for this would be very welcome! Thanks for the investigative work you put in to figure this out 👏

davewalter commented 3 years ago

Thanks for the quick response @taylorsilva. We created #234 with the changes we think are required.

taylorsilva commented 3 years ago

Fixed by #234