One of the members was started as:
/var/lib/etcd/etcd -discovery-srv example.com -advertise-client-urls https://node02.example.com:2379 -initial-cluster-state new -name node02 -data-dir /var/lib/etcd/cluster/datadir -listen-peer-urls https://node02.example.com:2380 -initial-advertise-peer-urls https://node02.example.com:2380 -listen-client-urls https://node02.example.com:2379 -trusted-ca-file /var/home/rm/cfssl/ca.pem -cert-file /var/home/rm/cfssl/member2.pem -key-file /var/home/rm/cfssl/member2-key.pem -peer-trusted-ca-file /var/home/rm/cfssl/ca.pem -peer-cert-file /var/home/rm/cfssl/member2.pem -peer-key-file /var/home/rm/cfssl/member2-key.pem -heartbeat-interval 200 -election-timeout 2000 -initial-cluster-token etcd-cluster-dev >>/var/log/etcd/etcd.out 2>&1 &
The proxy is compatible, but it fails if SSL is enabled.
Do you mean etcd fails, or the v2 proxy fails?
x509: certificate signed by unknown authority
Are these certs valid? Have you tried setting up a simple single node to see if it works?
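For reference, a rough sketch of such a single-node check, reusing the cert paths from the member command above; the name certtest, the /tmp data dir, and the 12379/12380 ports are arbitrary placeholders, and it assumes the certs' SANs cover the host it runs on:
# start a throwaway single-node etcd serving TLS on the client port (placeholder name/ports/data dir)
etcd -name certtest -data-dir /tmp/etcd-certtest \
  -listen-client-urls https://node02.example.com:12379 \
  -advertise-client-urls https://node02.example.com:12379 \
  -listen-peer-urls http://localhost:12380 \
  -initial-advertise-peer-urls http://localhost:12380 \
  -initial-cluster certtest=http://localhost:12380 \
  -cert-file /var/home/rm/cfssl/member2.pem \
  -key-file /var/home/rm/cfssl/member2-key.pem \
  -trusted-ca-file /var/home/rm/cfssl/ca.pem &
# if the certs are valid, this should report {"health": "true"} with no x509 error
curl --cacert /var/home/rm/cfssl/ca.pem https://node02.example.com:12379/health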
There was some TLS tightening in 3.0 (e.g., all the SRV fixes) and a TLS version upgrade to 1.2. It's possible that 2.3.x was too permissive about certs, so now 3.0.17 is rejecting them as it should have in the first place.
++ @xiang90 @philips @gyuho I mean the v2 proxy fails to connect to v3 etcd. The certs are valid and all 3 members started fine. I mentioned the flags used to start v3 etcd, and I also mentioned the old etcd proxy flags that fail and the new ones with certs that work for the proxy.
@heyitsanthony I totally understand that, but I think this needs to be loosened, at least for clients coming via a v2 proxy on a v3 cluster, as it is breaking existing clients. We use this in production and our setup is described at https://github.com/coreos/etcd/blob/master/Documentation/production-users.md#vonage
We have n proxies running alongside each app on the same machine. It is a huge impact; we can't realistically track down and upgrade all the proxies (or restart them with new flags).
This is a blocker for us in upgrading etcd from v2 to v3. We really want to use v3 with Vault.
I would say a flag could be introduced to enable this behavior, turned off by default. I agree security is important, but here there is no route to upgrade smoothly; tons of clients would break. We shouldn't break existing clients.
Even https://github.com/coreos/etcd/blob/master/Documentation/v2/proxy.md doesn't mention setting a trusted CA.
Thoughts?
@raoofm OK. Just to confirm:
etcdserver: could not get cluster response from https://node01.example.com:2380: Get https://node01.example.com:2380/members: x509: certificate signed by unknown authority
when trying to connect to 3.0.17?
@heyitsanthony yes
@raoofm, @gyuho and I were able to reproduce this on our side. A fix that appears to work out of the box is to append the etcd ca cert to the system certs file (usually /etc/ssl/certs/ca-certificates.crt). If this isn't an option, we can investigate a patch for 3.0.17.
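A minimal sketch of that workaround, assuming a Debian-style bundle at /etc/ssl/certs/ca-certificates.crt and the CA path from the member flags above (the bundle location varies by distro):
# append the etcd CA to the system trust store on each proxy host
sudo sh -c 'cat /var/home/rm/cfssl/ca.pem >> /etc/ssl/certs/ca-certificates.crt'
# (on Debian/Ubuntu, the more durable route is to drop the CA under /usr/local/share/ca-certificates/ and run update-ca-certificates)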
@heyitsanthony @gyuho Appending to the system certs worked, but it is the same thing as adding the flag and restarting etcdProxy, because even after appending the cert to the CA bundle the proxy still needed a restart.
This means all the proxies need to be reached and reconfigured/restarted before the upgrade, which shouldn't have been the case.
The remaining hope is that the existing proxy machines already have this CA as part of their ca-bundle. We have roughly 150 proxies deployed, so verifying that is an action item for me. Can you guys think of other options?
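As for verifying whether the CA is already trusted, one rough check from a proxy host, assuming openssl is installed and the hashed system certs live under /etc/ssl/certs:
# a verify return code of 0 (ok) means the system trust store already covers the etcd serving cert
echo | openssl s_client -connect node02.example.com:2379 -CApath /etc/ssl/certs 2>/dev/null | grep 'Verify return code'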
OK, it wasn't clear the proxies couldn't be restarted at all.
This means all the proxies need to be reached and reconfigured/restarted before the upgrade, which shouldn't have been the case.
Except the TLS configuration was wrong from the start. The proxy shouldn't have been able to connect to etcd without checking the authenticity of the certs. At this point it's about trying to figure out how to work around this broken configuration which should have never worked in the first place.
Can you guys think about other options?
We can investigate a server-side patch so the authentication check won't happen.
Except the TLS configuration was wrong from the start. The proxy shouldn't have been able to connect to etcd without checking the authenticity of the certs. At this point it's about trying to figure out how to work around this broken configuration which should have never worked in the first place.
agree
@raoofm It appears that what looked like a repro either wasn't one, or it worked and was then lost; a 2.3.8 proxy against 2.3.8 etcd always gives a cert error when the proxy does not have access to the etcd CA. Is there a reliable way to reproduce this from scratch?
OK, I'll try and write up the steps.
@heyitsanthony @gyuho Luckily for us, on qa, preprod, and prod the CA signing the certs is already part of the default ca-bundle, so the upgrade was a success (it isn't on dev, which is why we hit this issue while testing the upgrade, sorry).
We are currently upgrading to 3.1.3; we'll test how it plays with Vault and keep you guys posted.
Thanks much for your support guys. This issue can be closed.
As discussed in #7344, I agree that the proxy is compatible, but it fails if SSL is enabled.
Our scenario: the v2.3.8 cluster is SSL-enabled, and after upgrading it to v3.0.17 the v2.3.8 proxy fails with a certificate error.
2017-03-07 21:17:45.736054 W | etcdserver: could not get cluster response from https://node01.example.com:2380: Get https://node01.example.com:2380/members: x509: certificate signed by unknown authority
Existing config: /var/lib/etcdProxy/etcd -data-dir /var/lib/etcdProxy/proxy/datadir -listen-client-urls http://localhost:2377 -discovery-srv example.com -proxy on
Modification that works: /var/lib/etcdProxy/etcd -data-dir /var/lib/etcdProxy/proxy/datadir -listen-client-urls http://localhost:2377 -peer-trusted-ca-file /var/home/rm/cfssl/ca.pem -discovery-srv example.com -proxy on