edgelesssys / constellation

Constellation is the first Confidential Kubernetes. Constellation shields entire Kubernetes clusters from the (cloud) infrastructure using confidential computing.
GNU Affero General Public License v3.0

helm: Restore the ability to start a cluster in conformance mode by disabling the cilium ipmasq agent when in conformance mode #3062

Closed: 3u13r closed this pull request 3 months ago

3u13r commented 3 months ago

Context

Proposed change(s)

I tested this on AWS.

Checklist

netlify[bot] commented 3 months ago

Deploy Preview for constellation-docs canceled.

Latest commit: 9f52d3cc8716a5f0a485b5c0eeaaa65ab116e940
Latest deploy log: https://app.netlify.com/sites/constellation-docs/deploys/663b9b064ae2200008f993df

miampf commented 3 months ago

I currently get the following error when trying to set up a cluster on GCP:

Error: applying Helm charts: applying constellation-operators: helm install: release constellation-operators failed, and has been uninstalled due to atomic being set: client rate limiter Wait returned an error: context deadline exceeded

burgerdev commented 3 months ago

It looks like you're not only disabling the ip-masq-agent but also the entire eBPF service resolution mechanism (kubeProxyReplacement). Could you please document why this is necessary?

3u13r commented 3 months ago

> could you please document why this is necessary?

What kind of documentation do you want? User-facing documentation of the --conformance flag? We need to disable kube-proxy replacement because we must use the portmap feature of Cilium to gain conformance regarding the differentiation of UDP and TCP traffic on the same port. The reasoning is that nobody requires (or perhaps even uses) that feature, since upstream Cilium in its default configuration is not compliant either. The Cilium maintainers have wanted to fix this for about two years. There have been recent efforts, but they were abandoned even more recently.
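The configuration change described above can be sketched as a Helm-values override. This is a minimal illustration, not the code from this PR: the helper name applyConformanceOverrides is hypothetical, and the value keys (ipMasqAgent.enabled, kubeProxyReplacement, cni.chainingMode) are assumed from the upstream Cilium chart and should be checked against the chart version actually deployed.

```go
package main

import "fmt"

// nested returns the map stored under key, creating it if it is missing,
// so that existing sibling keys in that sub-map are preserved.
func nested(values map[string]any, key string) map[string]any {
	m, ok := values[key].(map[string]any)
	if !ok {
		m = map[string]any{}
		values[key] = m
	}
	return m
}

// applyConformanceOverrides is a hypothetical helper (not Constellation's
// actual implementation). It mutates and returns a Cilium Helm values map.
// The keys are assumed from the upstream Cilium chart.
func applyConformanceOverrides(values map[string]any, conformance bool) map[string]any {
	if !conformance {
		return values
	}
	// Disable the ip-masq-agent in conformance mode.
	nested(values, "ipMasqAgent")["enabled"] = false
	// Do not replace kube-proxy; "false" is the hybrid-mode setting in
	// which individual features can still be enabled separately.
	values["kubeProxyReplacement"] = "false"
	// Assumption: chain the portmap CNI plugin so hostPort mappings for
	// TCP and UDP on the same port number are handled separately.
	nested(values, "cni")["chainingMode"] = "portmap"
	return values
}

func main() {
	// Assumed baseline values, for illustration only.
	values := map[string]any{
		"ipMasqAgent":          map[string]any{"enabled": true},
		"kubeProxyReplacement": "true",
	}
	values = applyConformanceOverrides(values, true)
	fmt.Println(values["kubeProxyReplacement"]) // false
}
```

Merging into existing sub-maps rather than overwriting them keeps any other cni or ipMasqAgent settings from the base values intact.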

3u13r commented 3 months ago

> I currently get the following error when trying to set up a cluster on GCP:

Hm, I don't get any error when initializing a Constellation on GCP.

burgerdev commented 3 months ago

> We need to disable kube-proxy replacement since we must use the portmap feature of Cilium to gain conformance regarding the differentiation of UDP and TCP traffic on the same port.

This kind of documentation, as a comment right here where we set it. Future readers may be interested in the reason for the different configuration in conformance mode, or in whether and when it's OK to change it back.

I still don't understand why we need to go from partial replacement to no replacement, unless the previous configuration (minus ip-masq-agent) never worked to begin with.

3u13r commented 3 months ago

> I still don't understand why we need to go from partial replacement to no replacement,

Setting "partial" is deprecated, and "false" now essentially behaves like the old "partial" option. See: https://docs.cilium.io/en/stable/network/kubernetes/kubeproxy-free/#kube-proxy-hybrid-modes
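The deprecation mentioned above can be illustrated with a small translation function. This is a hedged sketch: normalizeKubeProxyReplacement is a hypothetical name, and the mapping (strict to "true", partial and disabled to "false") is assumed from the Cilium upgrade notes for the deprecated values rather than taken from this PR. As the comment notes, "false" only matches old partial behavior if the individual features are enabled explicitly.

```go
package main

import "fmt"

// normalizeKubeProxyReplacement is a hypothetical helper that translates
// legacy kubeProxyReplacement values into the boolean-style values.
// Assumption: for "partial", the returned "false" is only equivalent if the
// individual features that partial mode relied on (e.g. nodePort, hostPort,
// externalIPs, socketLB) are enabled separately in hybrid mode.
func normalizeKubeProxyReplacement(legacy string) (string, error) {
	switch legacy {
	case "true", "false":
		// Already in the current format.
		return legacy, nil
	case "strict":
		return "true", nil
	case "partial", "disabled":
		return "false", nil
	default:
		return "", fmt.Errorf("unsupported kubeProxyReplacement value %q", legacy)
	}
}

func main() {
	for _, v := range []string{"strict", "partial", "false"} {
		n, err := normalizeKubeProxyReplacement(v)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s -> %s\n", v, n)
	}
}
```

Returning an error for unrecognized values, instead of guessing, keeps an unexpected setting from silently changing how services are handled.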

> comment right here where we set it.

Will do, hopefully later today.

burgerdev commented 3 months ago

I deployed this branch on GCP without problems.

burgerdev commented 3 months ago

However, the conformance test did not pass:

    - name: '[It] [sig-network] Services should serve endpoints on same port and different protocols [Conformance]'
      status: failed

3u13r commented 3 months ago

Got a successful run with K8s 1.28 on Azure.

github-actions[bot] commented 3 months ago

Coverage report

| Package | Old | New | Trend |
| --- | --- | --- | --- |
| internal/constellation/helm | 33.60% | 33.80% | ↗ |