Closed dimityrmirchev closed 9 months ago
@dimityrmirchev Thanks a lot for that change. The changes look good to me, though I'm not really a VPN expert. I've just one question to the changes.
Why would we want to support AES-128 when we always used AES-256 in the past? This part does not look like a security improvement. Please explain.
Sure. As of now the default values in openvpn for the data-ciphers
are as follow:
AES-256-GCM:AES-128-GCM:CHACHA20-POLY1305
if there is support for CHACHA20-POLY1305
AES-256-GCM:AES-128-GCM
otherwiseAs stated in the docs the server will negotiate the first cipher that is also supported by the client. In our case this is AES-256-GCM
. My guess is that hardware that supports AES-128-GCM
should support AES-256-GCM
as well, however I did not find any concrete examples of that. This is why I explicitly set what openvpn
uses as default. As per the debate 256 vs 128 here is a nice explanation why the go team chooses 128 before 258 https://go.dev/blog/tls-cipher-suites (search for "AES-128 is preferred over AES-256 for encryption."). But maybe I misunderstood that statement and that is only valid in the context of TLS 🤔 I am perfectly fine with removing the AES-128-GCM
option since it seems that this is the consensus.
If we make that change we need to allow both, aes-256-gcm and aes-256-cbc on the server (1)
I agree.
with the next release change the clients to -gcm (2), and then with the next release remove aes-256-cbc from the server(3).
Can't we directly change the clients to -gcm? The server will rollout before the clients are updated.
@ScheererJ @marwinski PTAL, concerns should be addressed in the latest commits and comments.
I tried an upgrade with the latest changes and it seems that the auth SHA256
config will cause the client failing to connect until its configuration is also updated. I will try to think of a way to apply this without breaking the connection.
/hold
I tried an upgrade with the latest changes and it seems that the auth SHA256 config will cause the client failing to connect until its configuration is also updated. I will try to think of a way to apply this without breaking the connection.
I did some more testing and it seems that a graceful update is not possible in the current setup if we keep the auth SHA256
config. As a general question I would ask if such non compatible server/client changes were applied before and is there any strategy for such changes? I would expect that sooner or later such changes will happen even if we remove the auth SHA256
lines from the current PR.
I fear there is no strategy or simple solution for such non compatible changes. I just stumbled over https://github.com/gardener/gardener/issues/7471 @hendrikKahl What is your opinion? What will be the impact of a VPN outage during update?
🤔 the only situation, where the things "really" got stuck was during the update of a shoot to an HA vpn. There, the deployment
is deleted and replaced by a statefulset
. Now, when the cluster has CRD conversion webhook, that is invoked by KCM garbage collection, it breaks the cluster in a specific way:
deployment
and replicaset
are stuck with a foregroundDeletion
finalizer that cannot get removed because KCM GC is stuckmanagedresource
is stuck, because it waits for the old resources to be gone before creating new onesThe only solution there is to manually purge the finalizers.
Now, I'm not sure if the same issue would occur, if a cluster is already on a HA VPN.
The best solution I can think of, would be to ensure that GRM does not wait for the deletion. Either the deletion mode becomes configurable and for the VPN it is changed to background or the rollout behavior is adapted so it does not wait.
Can you please reevaluate this PR? The mentioned concerns were addressed/explained. In this form this PR introduces a backwards incompatible change to the server https://github.com/gardener/vpn2/pull/53/files#diff-1157b3771de832487f583c709a87a57eb0755d3102a8cf93c7422a6daae7ac87R131 which will cause a short downtime during upgrade.
I tested the rollout to the new version with the HA setup and the non HA setup together with a failing conversion webhook as described in https://github.com/gardener/gardener/issues/7471. Even if the KCM garbage collection is stuck, the VPN rollout can be completed without manual intervention.
@dimityrmirchev, The VPN downtime should be mentioned in the release notes.
@axel7born Thanks for reviewing. I mentioned the short downtime that is expected during the upgrade in the release note.
/lgtm
/unhold
What this PR does / why we need it: This PR introduces some security improvements to the openvpn configuration.
--cipher
is no longer set. This option should not be used any longer in TLS mode as stated in theopenvpn
docs.--data-ciphers
now reflects theopenvpn
defaults and uses Galois/Counter Mode--dh
Diffie-Hellman key exchange is disabled in favour of elliptic curve Diffie-Hellman. ECDH parameters are handled automatically by openssl (ref)--auth
is set to SHA256. Previously it was defaulted to SHA1.See more information on the official docs page https://openvpn.net/community-resources/reference-manual-for-openvpn-2-6/ and the following README document https://github.com/OpenVPN/openvpn/blob/master/README.ec
Logs from
vpn-shoot-client
before the changes (log verbosity increased by applying the configurationverb 4
).Logs from
vpn-shoot-client
after the changes (log verbosity increased by applying the configurationverb 4
).Mind that if we enforce TLS 1.2 by setting
tls-version-max 1.2
in the server config we can see that ECDH key exchange is enforced.Which issue(s) this PR fixes: Fixes #
Special notes for your reviewer:
Release note: