concourse / concourse-bosh-release

Concourse BOSH release
Apache License 2.0

concourse_tls_ca_cert issue while upgrading from v6.7.5 to v7.1.0 #144

Closed: divyamudundi closed this issue 3 years ago

divyamudundi commented 3 years ago

Hi there!

Bug Report

We upgraded Concourse from v6.7.5 to v7.1.0. After the upgrade, accessing concourse-web returned the following error in the browser:

didn't accept your login certificate, or one may not have been provided. Try contacting the system domain
ERR_BAD_SSL_CLIENT_AUTH_CERT

After looking into concourse-bosh-release commit https://github.com/concourse/concourse-bosh-release/commit/2bce6838f74d56abafba56912e16152f4c5ab944, we removed CONCOURSE_TLS_CA_CERT from '/var/vcap/jobs/web/config/bpm.yml' and restarted the web process with 'monit restart web'. After the restart, we were able to access Concourse.

These are the errors seen in the web node's stderr.log:

2021/03/26 10:34:55 http: TLS handshake error from 172.26.126.205:52072: tls: client didn't provide a certificate
2021/03/26 10:34:56 http: TLS handshake error from 172.26.126.205:52073: tls: client didn't provide a certificate
2021/03/26 10:34:58 http: TLS handshake error from 172.26.126.205:52074: tls: client didn't provide a certificate
2021/03/26 10:34:59 http: TLS handshake error from 172.26.126.205:52075: tls: client didn't provide a certificate
2021/03/26 10:35:00 http: TLS handshake error from 172.26.126.205:52076: tls: client didn't provide a certificate
2021/03/26 10:35:01 http: TLS handshake error from 172.26.126.205:52077: tls: client didn't provide a certificate
2021/03/26 10:35:02 http: TLS handshake error from 172.26.126.205:52079: tls: client didn't provide a certificate

NOTE: We did not configure client certificate authentication anywhere in our config.

Other details that may be useful:

* Concourse version: v7.1.0
* Deployment type (BOSH/Docker/binary): BOSH
* Infrastructure/IaaS: vSphere
* Browser (if applicable): Chrome
* Did this used to work?: Yes
bg-govau commented 3 years ago

We experienced this as well when we went to v7.1.0.

I think what's happening is this: if you use the ops files from concourse-bosh-deployment to add TLS to your web instance_group:

https://github.com/concourse/concourse-bosh-deployment/blob/5a9d729e9cdfd6a67fbfe91d1a9da322487d6ec0/cluster/operations/tls-vars.yml#L10-L18

https://github.com/concourse/concourse-bosh-deployment/blob/5a9d729e9cdfd6a67fbfe91d1a9da322487d6ec0/cluster/operations/tls.yml#L1-L3

then the atc_tls certificate in CredHub has a ca part. That means CONCOURSE_TLS_CA_CERT is set to that CA, and Concourse then expects every client to present a certificate signed by it.
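
To make the mechanism concrete: with cert: ((atc_tls)), BOSH interpolates the whole CredHub certificate credential, which carries ca, certificate and private_key fields, so the ca value reaches the web job too. A rough sketch of the resulting property, with PEM contents elided (the tls.cert.ca path is inferred from the certificate/private_key keys used later in this thread, not checked against the release's job spec):

```yaml
# Approximate shape of the web job's tls property after ((atc_tls))
# is interpolated from a CredHub certificate credential.
tls:
  bind_port: 443
  cert:
    ca: |            # included because the credential carries a CA;
      ...            # per the comment above, this ends up in CONCOURSE_TLS_CA_CERT
    certificate: |
      ...            # server certificate (PEM elided)
    private_key: |
      ...            # server private key (PEM elided)
```
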

In our case, we use a reverse proxy in front of Concourse, so for now we've configured it to do what Concourse now wants: send a valid client certificate to the Concourse web backend.

If you're not using a reverse proxy and access Concourse web directly, I'd assume the normal practice is to use a real cert for atc_tls. In that case people might not set the ca part when uploading it to CredHub, and everything might just work.

It feels to me like it would be simpler for operators if requiring a client certificate were enabled explicitly through a separate property, instead of being toggled implicitly by the presence of the ca part.

divyamudundi commented 3 years ago

Thank you @bg-govau. We don't have a reverse proxy in front of our concourse-web, so for the 7.1.x deployment we are using a custom ops file to remove the atc_tls ca from the web nodes' manifest.
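
The custom ops file itself isn't shown in the thread. A minimal sketch, assuming the instance group and job are both named web and the credential is still called atc_tls, replaces the whole cert value so only the certificate and private key are passed through. This mirrors the manual manifest edit described later in the thread:

```yaml
# Hypothetical ops file (e.g. drop-atc-tls-ca.yml): pass only the
# certificate and key of atc_tls to the web job, omitting its ca part.
# Assumes instance group "web" and job "web"; adjust to your manifest.
- type: replace
  path: /instance_groups/name=web/jobs/name=web/properties/tls/cert
  value:
    certificate: ((atc_tls.certificate))
    private_key: ((atc_tls.private_key))
```

It would be passed alongside the other ops files, e.g. bosh -d concourse deploy concourse.yml -o drop-atc-tls-ca.yml (file name hypothetical). A type: remove on .../tls/cert/ca would not work here: ops files are applied before variables are interpolated, so at that point cert is still the unresolved ((atc_tls)) placeholder rather than a map.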

pivotal-madeline-preston commented 2 years ago

Experienced this as well. Our fix:

bosh -d concourse manifest > concourse.yml

Changed the web job's tls property in the manifest from

tls:
  bind_port: 443
  cert: ((atc_tls))

To

tls:
  bind_port: 443
  cert:
    certificate: ((atc_tls.certificate))
    private_key: ((atc_tls.private_key))

Then redeployed:

bosh -d concourse deploy concourse.yml