giantswarm / roadmap

Giant Swarm Product Roadmap
https://github.com/orgs/giantswarm/projects/273
Apache License 2.0

CNP for HTTP proxy access #3238

Status: Open. vxav opened this issue 4 months ago.

vxav commented 4 months ago

We need a CiliumNetworkPolicy (CNP) for proxy access, not a CiliumClusterwideNetworkPolicy (CCNP). For example:

  egress:
    - toCIDRSet:
        - cidr: 10.88.96.254/32
      toPorts:
        - ports:
            - port: "3128"

For example, Trivy vulnerability report scans: https://gigantic.slack.com/archives/CE92C4BST/p1707897722420569?thread_ts=1707895806.213459&cid=CE92C4BST
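A complete manifest around the egress rule above might look like the following minimal sketch, written as a Python helper that returns the policy as a dict. The policy name `allow-egress-to-proxy`, the example namespace, and the explicit `protocol: TCP` are assumptions, not something agreed in this issue. Because a CNP is namespaced, the empty `endpointSelector` applies the rule to every pod in that one namespace only.

```python
import json


def proxy_egress_cnp(namespace: str, proxy_ip: str, proxy_port: int) -> dict:
    """Build a namespaced CiliumNetworkPolicy allowing egress to one proxy IP:port."""
    return {
        "apiVersion": "cilium.io/v2",
        "kind": "CiliumNetworkPolicy",
        "metadata": {
            # Hypothetical policy name, chosen for illustration only.
            "name": "allow-egress-to-proxy",
            "namespace": namespace,
        },
        "spec": {
            # An empty selector matches every pod in the policy's namespace.
            "endpointSelector": {},
            "egress": [
                {
                    "toCIDRSet": [{"cidr": f"{proxy_ip}/32"}],
                    "toPorts": [
                        {"ports": [{"port": str(proxy_port), "protocol": "TCP"}]}
                    ],
                }
            ],
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests as well as YAML, so this output can be applied directly.
    print(json.dumps(proxy_egress_cnp("monitoring", "10.88.96.254", 3128), indent=2))
```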

gawertm commented 4 months ago

@weatherhog, this came up in a recent support case with Panamax, whose environment is behind a proxy.

vxav commented 4 months ago

I don't know what the best implementation would be, but since every customer will have a different proxy address, my first thought would be either a Kyverno policy that replicates a CNP allowing egress to, e.g., 10.88.96.254:3128 into our namespaces, or a CNP allowing egress to world:3128 in our charts (more permissive, of course).
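The two options differ mainly in the egress destination. A minimal sketch, assuming Cilium's `toCIDRSet` and `toEntities` rule fields: the pinned rule is what a Kyverno policy would have to replicate with each customer's proxy IP, while the `world` entity rule could ship unchanged in our charts, at the cost of allowing any host outside the cluster on that port. The helper name is hypothetical, and the Kyverno side is not shown.

```python
from typing import Optional


def proxy_egress_rule(port: int, proxy_ip: Optional[str] = None) -> dict:
    """Return one CiliumNetworkPolicy egress rule for TCP traffic to a proxy port.

    With proxy_ip, the destination is pinned to that single IPv4 address
    (a per-customer value). Without it, the rule targets Cilium's "world"
    entity (everything outside the cluster), which needs no per-customer
    input but lets the pod reach any external host on that port.
    """
    to_ports = [{"ports": [{"port": str(port), "protocol": "TCP"}]}]
    if proxy_ip is not None:
        return {"toCIDRSet": [{"cidr": f"{proxy_ip}/32"}], "toPorts": to_ports}
    return {"toEntities": ["world"], "toPorts": to_ports}


# Pinned rule (per-customer IP) vs. the chart-wide "world:3128" rule.
pinned = proxy_egress_rule(3128, "10.88.96.254")
open_rule = proxy_egress_rule(3128)
```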

weatherhog commented 4 months ago

@gawertm, can you explain this a little more? The only proxy we own in Cabbage is the Oauth2-Proxy. If this is about the Squid proxy, it should go to Phoenix; if it is about Trivy, it should go to Shield.

vxav commented 4 months ago

It's about a CiliumNetworkPolicy for reaching the HTTP proxy (Squid is one such proxy). Why would it go to Phoenix, though?

weatherhog commented 4 months ago

From my point of view there are two possible solutions: either this is done by the owner of the proxy (in the Squid case, Team Phoenix), or it is a customer-specific policy, in which case it should be handled by the corresponding AE/SA for Panamax.

The point is: yes, we in Cabbage own Cilium, but we do not own the specific Cilium network policies for apps that we do not own. It's the same as Atlas owning monitoring while each team is responsible for monitoring its own apps.

vxav commented 4 months ago

I suppose by Squid proxy you mean squid-proxy-app. That app is only used by Phoenix to simulate a customer environment, and it is not in scope for this particular issue.

Here, in private environments, the customer requires all traffic to go through an HTTP proxy that they manage; we only get an IP and a TCP port, which we add as environment variables in our apps. That part is handled by the team that owns each component, or by Kyverno policies. Customers handle their own workloads themselves.

Reaching the HTTP proxy, however, is "regular" network traffic to a destination outside the cluster ("x.x.x.x:3128"). In customer namespaces this shouldn't be a problem, since there's no CNP by default. But in "Giant Swarm" namespaces, where CNPs are applied to the workloads, traffic to the proxy will be blocked. That's why it seemed like a Cabbage thing to me, but maybe I'm wrong. I'll let the POs sort it out among themselves :)
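The reasoning above rests on Cilium's default-deny behaviour: a pod with no egress policy selecting it can reach anything, but as soon as a policy with egress rules selects it, only explicitly allowed destinations are reachable. The toy model below illustrates that under simplifying assumptions (selectors are not evaluated, and only `toCIDRSet`/`toPorts` rules are understood). It is not how Cilium evaluates policies internally, and the example policies are hypothetical.

```python
from ipaddress import ip_address, ip_network


def egress_allowed(policies: list, dst_ip: str, dst_port: int) -> bool:
    """Toy model of Cilium's default-deny behaviour for egress.

    Assumes every policy in `policies` already selects the pod. Rules that
    use selectors other than toCIDRSet are treated as matching nothing.
    """
    egress_rules = [rule for policy in policies for rule in policy.get("egress", [])]
    # No egress rule selects the pod: egress is not enforced at all.
    if not egress_rules:
        return True
    # Otherwise only destinations matched by at least one rule are allowed.
    for rule in egress_rules:
        cidrs = [entry["cidr"] for entry in rule.get("toCIDRSet", [])]
        if not cidrs:
            continue
        ports = [int(p["port"]) for tp in rule.get("toPorts", []) for p in tp["ports"]]
        ip_ok = any(ip_address(dst_ip) in ip_network(cidr) for cidr in cidrs)
        port_ok = not ports or dst_port in ports  # no toPorts means any port
        if ip_ok and port_ok:
            return True
    return False


# Hypothetical policy already applied to a Giant Swarm app: port 443 within 10.0.0.0/8 only.
app_policy = {"egress": [{"toCIDRSet": [{"cidr": "10.0.0.0/8"}],
                          "toPorts": [{"ports": [{"port": "443"}]}]}]}
proxy_policy = {"egress": [{"toCIDRSet": [{"cidr": "10.88.96.254/32"}],
                            "toPorts": [{"ports": [{"port": "3128"}]}]}]}

print(egress_allowed([], "10.88.96.254", 3128))                        # True: no CNP
print(egress_allowed([app_policy], "10.88.96.254", 3128))              # False: blocked
print(egress_allowed([app_policy, proxy_policy], "10.88.96.254", 3128))  # True
```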

gawertm commented 4 months ago

Hey, sorry, I missed this one at the end of last week. @weatherhog, customers routing all machines and outgoing traffic through an HTTP proxy is something we see right now at Panamax (I'm not 100% sure about other customers), but I know we also had it at Talanx on AWS. So the proxy is always owned by the customer, and it's not really provider-specific. It's true that on-prem customers are more likely to have such setups, but not exclusively.

We discussed this again in Rocket: the implementation itself should be fairly quick and easy, and we wouldn't mind doing it. However, the biggest task here is deciding where and how to deploy those CNPs for proxy access. Since this is a design decision, we still think someone from Cabbage should at least be involved, as you will eventually be responsible for most of the CNPs.

What do you think of a ~30-minute meeting with the Cabbage engineers who will work on CNPs? That way you can also learn about the proxy use case in general, which may help with other topics as well.

weatherhog commented 4 months ago

@kopiczko, are you up for a meeting with Rocket to figure out the best way to deploy the CNPs?

kopiczko commented 4 months ago

I'm happy to attend. @gawertm, please put something in the calendar. In the meantime, maybe putting the required policies behind a conditional, plus some settings, in network-policies-app is a solution? I see it's deployed by cluster-vsphere, so maybe it's only a matter of extending network-policies-app and adding some extra settings to cluster-vsphere?

vxav commented 4 months ago

The current idea is to loop through a list of namespaces to create the CNPs, feeding in the proxy settings from the secret.
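A minimal sketch of that loop, assuming the proxy setting arrives as a URL such as `http://10.88.96.254:3128` and that the list of namespaces is provided by the chart. The function names, the policy name, and the namespaces in the usage lines are hypothetical. A proxy given as a DNS name is rejected, because `toCIDRSet` needs an IP address; such a proxy would need a different selector, such as Cilium's `toFQDNs`.

```python
from ipaddress import ip_address
from urllib.parse import urlsplit

# Ports assumed when the proxy URL omits one (standard scheme defaults).
DEFAULT_PORTS = {"http": 80, "https": 443}


def parse_proxy(proxy_url: str) -> tuple:
    """Split a proxy URL such as http://10.88.96.254:3128 into (cidr, port)."""
    parts = urlsplit(proxy_url)
    if not parts.hostname:
        raise ValueError(f"no host in proxy URL: {proxy_url!r}")
    # toCIDRSet needs an IP; a DNS name raises ValueError here.
    ip = ip_address(parts.hostname)
    prefix = 32 if ip.version == 4 else 128
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    if port is None:
        raise ValueError(f"no port and unknown scheme in proxy URL: {proxy_url!r}")
    return f"{ip}/{prefix}", port


def proxy_cnps(namespaces: list, proxy_url: str) -> list:
    """Render one proxy-egress CiliumNetworkPolicy per namespace."""
    cidr, port = parse_proxy(proxy_url)
    return [
        {
            "apiVersion": "cilium.io/v2",
            "kind": "CiliumNetworkPolicy",
            "metadata": {"name": "allow-egress-to-proxy", "namespace": ns},
            "spec": {
                "endpointSelector": {},  # all pods in this namespace
                "egress": [
                    {
                        "toCIDRSet": [{"cidr": cidr}],
                        "toPorts": [{"ports": [{"port": str(port), "protocol": "TCP"}]}],
                    }
                ],
            },
        }
        for ns in namespaces
    ]


# Hypothetical namespaces; the proxy URL would come from the cluster's proxy secret.
for cnp in proxy_cnps(["kube-system", "monitoring"], "http://10.88.96.254:3128"):
    print(cnp["metadata"]["namespace"], cnp["spec"]["egress"][0]["toCIDRSet"][0]["cidr"])
```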

glitchcrab commented 3 months ago

See https://github.com/giantswarm/network-policies-app/pull/7/

mcharriere commented 2 months ago

In CAPA it's been implemented as shown here: https://github.com/giantswarm/cluster-aws/blob/main/helm/cluster-aws/templates/_network-policies_helmrelease_config.yaml#L7

kopiczko commented 1 month ago

We talked with @vxav today and here is the summary.

How network-policies-app is installed and configured:

The app is installed by cluster-cloud-director (https://github.com/giantswarm/cluster-cloud-director/blob/v0.52.1/helm/cluster-cloud-director/templates/netpol-helmrelease.yaml#L36) and configured there.

Where is proxy configuration coming from?

It comes from cluster-apps-operator, which is configured here: https://github.com/giantswarm/cluster-apps-operator/blob/v2.22.0/helm/cluster-apps-operator/values.yaml#L19-L20

It creates two secrets:

The problem we have:

We need to somehow pass http(s) proxy values to network-policies-app:

https://github.com/giantswarm/network-policies-app/blob/b05fa60cf13fb53cac9313ac42f15d81e440d519/helm/network-policies-app/values.yaml#L25-L26

allowEgressToProxy:
  enabled: true
  httpProxy: "..."
  httpsProxy: "..."
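One way these values could be turned into egress rules is sketched below. The `allowEgressToProxy` keys come from the values excerpt above; the assumption that `httpProxy`/`httpsProxy` hold URLs with IP hosts, the scheme default ports, and the de-duplication (both settings commonly point at the same proxy) are illustrative choices, not network-policies-app's actual template logic. The `enabled` check plays the role of a Helm `if` guard.

```python
from ipaddress import ip_address
from urllib.parse import urlsplit


def proxy_egress_rules(values: dict) -> list:
    """Turn an allowEgressToProxy values block into CNP egress rules.

    Disabled or missing config renders nothing. Identical httpProxy and
    httpsProxy endpoints produce a single rule.
    """
    cfg = values.get("allowEgressToProxy", {})
    if not cfg.get("enabled"):
        return []
    endpoints = []
    for key in ("httpProxy", "httpsProxy"):
        url = cfg.get(key)
        if not url:
            continue
        parts = urlsplit(url)
        ip = ip_address(parts.hostname)  # assumes an IP host, not a DNS name
        port = parts.port or {"http": 80, "https": 443}[parts.scheme]
        endpoint = (f"{ip}/{32 if ip.version == 4 else 128}", port)
        if endpoint not in endpoints:  # de-duplicate, keeping order
            endpoints.append(endpoint)
    return [
        {"toCIDRSet": [{"cidr": cidr}],
         "toPorts": [{"ports": [{"port": str(port), "protocol": "TCP"}]}]}
        for cidr, port in endpoints
    ]


values = {"allowEgressToProxy": {"enabled": True,
                                 "httpProxy": "http://10.88.96.254:3128",
                                 "httpsProxy": "http://10.88.96.254:3128"}}
rules = proxy_egress_rules(values)
print(len(rules))  # 1: both settings point at the same proxy
```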

We can't use any of the secrets produced by cluster-apps-operator because:

Potential solutions:

  1. Move the proxy configuration to cluster-cloud-director's values.yaml (my preferred option). This would match how cluster-aws does it.
  2. Create yet another permutation of the http(s) proxy values in <cluster>-cluster-values, or have cluster-apps-operator create yet another secret.

I believe Rocket should take this over at this point. I'll move this ticket to the Rocket board. Please discuss it in the team and let us know if you think otherwise.

vxav commented 2 days ago

OK, I'm finally going to look into this.

The next thing I want to try is rewriting the keys in the HelmRelease to match the app's schema. Example: https://github.com/giantswarm/cluster-cloud-director/blob/main/helm/cluster-cloud-director/templates/cloud-provider-cloud-director-helmrelease.yaml#L46-L47

EDIT: That won't work, because those secrets contain a YAML document and that feature ☝ works on a single key.