2i2c-org / infrastructure

Infrastructure for configuring and deploying our community JupyterHubs.
https://infrastructure.2i2c.org
BSD 3-Clause "New" or "Revised" License

Configure a single general purpose and more permissive network policy #741

Closed · consideRatio closed this issue 2 years ago

consideRatio commented 2 years ago

Issue update

We are currently looking to block all outbound egress to local IP ranges and the cloud metadata server, while allowing everything else.


@sgibson91 and I debugged a failure to make an outbound connection from a user pod to an FTP server exposed on port 21.

We pinpointed the issue to be that we override the z2jh Helm chart's default value for singleuser.networkPolicy.egress. The Z2JH default is very permissive: it allows all outbound traffic except for traffic to a specific IP (169.254.169.254) associated with a so-called cloud metadata server.

By overriding singleuser.networkPolicy.egress, we are reducing the egress permissions to exclude outbound FTP access.
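
For reference, the z2jh default for singleuser.networkPolicy.egress is roughly the following (paraphrased from the chart's values.yaml; the exact wording varies between chart versions):

# Paraphrased sketch of the z2jh default: allow all outbound traffic
# except to the cloud metadata server IP.
singleuser:
  networkPolicy:
    enabled: true
    egress:
      - to:
          - ipBlock:
              cidr: 0.0.0.0/0
              except:
                - 169.254.169.254/32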

Discussion points

  1. Do we want to retain this opt-in system to allow ports?
    1. If we want to retain this opt-in system to allow more egress ports, how do we make users aware of this, and how do we manage such opt-in requests?

Reference

Tasks

Discussion captured from video conversation

yuvipanda commented 2 years ago

https://github.com/berkeley-dsep-infra/datahub/issues/2825 has more details on the FTP situation, and in https://github.com/berkeley-dsep-infra/datahub/issues/2825 there's discussion about just removing the outbound port restrictions. I think my inclination is that as long as we don't allow arbitrary access inside the cluster (where the user server can just hit the prometheus server, for example!) port restrictions are pointless. So if we can allow unrestricted access to the internet but not local cluster network we're all good.

choldgraf commented 2 years ago

So if we can allow unrestricted access to the internet but not local cluster network we're all good.

This sounds like a good approach to me - most users assume that "everything is possible", so if we restrict things it will usually be confusing to folks. So I'd be +1 on making most outbound traffic possible by default, with the action being a community rep asking for something to be restricted for some reason.

sgibson91 commented 2 years ago

So if we can allow unrestricted access to the internet but not local cluster network we're all good.

If I understand correctly, this is z2jh's default setting. So the implementation of this would look like a PR that removes all the networkPolicy keys from under singleuser in basehub and daskhub, right?

yuvipanda commented 2 years ago

@sgibson91 I think so, but I want to see how it interacts with non-JupyterHub things in the cluster. Primarily, I want to verify that we can't hit the Prometheus or grafana servers from user pods, as that can leak information. There is no password auth or anything for Prometheus, so we have to rely on network policy here.

sgibson91 commented 2 years ago

One way to test would be to spin up a test cluster with the support chart and a hub with all ports open and see if we can hack it

consideRatio commented 2 years ago

I think the right call may be to have tight allowances on what is allowed to access prometheus, rather than specific exceptions for what the user pods can't send traffic to, unless those exceptions are general ones, like blocking local IP ranges.

So one idea is to set a netpol saying allow 0.0.0.0/0 (all) except: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16 (the private IP ranges) and the cloud metadata server also listed in the z2jh defaults (169.254.169.254). That's a simple rule that may work reliably without exceptions, or not.

The z2jh default is to allow all egress except to that cloud metadata server IP. Note that I don't know much about it and am afraid "cloud metadata server" may be an overloaded term used for different things, but I am referring to the IP blocked by z2jh by default: 169.254.169.254.

yuvipanda commented 2 years ago

@sgibson91 I think an easier way is to just test this on the pilot hubs cluster by locally doing a deploy just to the staging hub, starting a server there, and trying to access the prometheus server. Even better would be to run nmap from inside the user server, and scan everywhere to see what it can hit.

sgibson91 commented 2 years ago

The config I locally deployed to 2i2c-staging to test this is represented in this draft PR: https://github.com/2i2c-org/pilot-hubs/pull/746

I then spun up my user server, opened a terminal and tried to ssh to the prometheus server IP address (that I got from kubectl -n support get svc). The result was that the connection timed out.

Screenshot 2021-10-08 at 09 57 13

I also tried to run nmap, but it wasn't available and I had trouble installing it.

Screenshot 2021-10-08 at 10 06 23

damianavila commented 2 years ago

I also tried to run nmap, but it wasn't available and I had trouble installing it.

I guess you may need to use an image with nmap already installed...

damianavila commented 2 years ago

So one idea is to set a netpol saying allow 0.0.0.0/0 (all) except: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16 (the private IP ranges) and the cloud metadata server also listed in the z2jh defaults (169.254.169.254). That's a simple rule that may work reliably without exceptions, or not.

This is an interesting idea...

yuvipanda commented 2 years ago

Thanks for testing this, @sgibson91!

I looked at which ports the prometheus service exposes with k -n support get svc:

support-prometheus-server                    ClusterIP      10.0.21.196   <none>         80/TCP                       2y40d

so that's port 80, while ssh only tests port 22. So you can test it with curl, which I think should be installed - curl http://<service-ip>:80.

We should also check if we can access the pod directly! By looking at k -n support get pod -l component=server,app=prometheus -o yaml, I can get its IP (.status.podIP) as well as the container port where the server is exposed (in this case, 9090). So you can test access to it with curl http://<pod-ip>:9090 as well.

Sorry if my use of the word 'access' caused confusion here - I just meant network access, not ssh!

yuvipanda commented 2 years ago

NetworkPolicies often operate at the level of pods not services, hence the need to test both.

choldgraf commented 2 years ago

Is this issue also something that we could test as a part of CI/CD? (e.g., via https://github.com/2i2c-org/pilot-hubs/issues/722 + some test suite that was run on a new hub?) It would be really nice if we could tell our user communities that things like "security policies for the hubs" are also tested as part of our deployments, so we could have tests that did things like "spin up a server, try to do something hub admins generally don't want people to do, confirm that it isn't possible in the staging hub".

sgibson91 commented 2 years ago

I think we should look into this tool for security testing https://github.com/armosec/kubescape

damianavila commented 2 years ago

I think we should look into this tool for security testing https://github.com/armosec/kubescape

💯 , super interesting!!

sgibson91 commented 2 years ago

I looked at which ports the prometheus service exposes with k -n support get svc:

support-prometheus-server                    ClusterIP      10.0.21.196   <none>         80/TCP                       2y40d

so that's port 80, while ssh only tests port 22. So you can test it with curl, which I think should be installed - curl http://<service-ip>:80.

We should also check if we can access the pod directly! By looking at k -n support get pod -l component=server,app=prometheus -o yaml, I can get its IP (.status.podIP) as well as the container port where the server is exposed (in this case, 9090). So you can test access to it with curl http://<pod-ip>:9090 as well.

Screenshot 2021-10-11 at 17 40 57

@yuvipanda yup, both of those are reachable

damianavila commented 2 years ago

mmm... so I guess we would need to go with @consideRatio's idea?

sgibson91 commented 2 years ago

mmm... so I guess we would need to go with @consideRatio's idea?

Yes, I will try this now.

FYI, I came across this cool resource while thinking about how to write the proposed netpol: https://github.com/ahmetb/kubernetes-network-policy-recipes

sgibson91 commented 2 years ago

We use the Calico network policy plugin, right? So we can use something like action: Deny and it will be valid for us? It would make the config easier if we could list the IPs we are blocking, instead of listing everything we're allowing except those we don't want.

(I'm quite new to writing network policies for Kubernetes so I'm not sure if I'm approaching this correctly?)

consideRatio commented 2 years ago

We use the Calico network policy plugin, right? So we can use something like action: Deny and it will be valid for us? It would make the config easier if we could list the IPs we are blocking, instead of listing everything we're allowing except those we don't want.

NetworkPolicy, as an official k8s resource (enforced by calico), does not support denying traffic, only allowing it. If a pod's egress is targeted by a netpol, only the traffic explicitly allowed by some netpol is permitted.

If a pod's egress isn't targeted by any netpol, it is unrestricted and everything is allowed.

So the act of targeting a pod restricts it to what is explicitly allowed; you can't subtract/deny permissions, only add them.
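
As an illustration of that allow-only behaviour (a hypothetical standalone example, not config from this repo): a policy that selects a pod for egress but lists no egress rules blocks all of its outbound traffic.

# Hypothetical example: selecting a pod's egress while allowing nothing
# results in all outbound traffic from the matched pods being blocked.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all-egress-example
spec:
  podSelector:
    matchLabels:
      component: singleuser-server  # assumed label for user pods
  policyTypes:
    - Egress
  egress: []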

sgibson91 commented 2 years ago

allow 0.0.0.0/0 (all) except: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16 (the private IP ranges) and the cloud metadata server also listed in the z2jh defaults (169.254.169.254)

@consideRatio Yes, so I am struggling to visualise how to write this as a netpol given that it is an "allow all except..." rule

consideRatio commented 2 years ago

Ah, as a single rule, you do it like the default in z2jh. The except block is an array, so you can specify multiple ranges.

https://github.com/jupyterhub/zero-to-jupyterhub-k8s/blob/main/jupyterhub/values.yaml#L333-L344
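
As a sketch, that single rule could look something like this in a hub's singleuser config (the CIDR list is the one proposed above; this is illustrative rather than the exact config that was merged):

# Illustrative single egress rule: allow all outbound traffic except to
# the private (RFC 1918) ranges and the cloud metadata server IP.
singleuser:
  networkPolicy:
    egress:
      - to:
          - ipBlock:
              cidr: 0.0.0.0/0
              except:
                - 10.0.0.0/8
                - 172.16.0.0/12
                - 192.168.0.0/16
                - 169.254.169.254/32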

sgibson91 commented 2 years ago

Awesome thanks! Do we want this to be an egress rule or an ingress rule? We don't want traffic originating from user pods to reach the prometheus server pod, and we are applying this netpol to the prometheus deployment, which leads me to the conclusion that this will be an ingress rule. (Traffic arriving at prometheus server pod, originating from user server pods)

consideRatio commented 2 years ago

Egress. For ingress, which relates to new incoming network connections, things are already very restricted: pretty much only new connections coming from the hub and proxy pods are allowed. Egress relates to anything the user actually does in the pod: new connections can be created to just about any other place, and that's a big category.

With the policy...

allow 0.0.0.0/0 (all) except: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16 (the private IP ranges) and the cloud metadata server also listed in the z2jh defaults (169.254.169.254)

We effectively say: don't establish connections to anything on the local network, but you can speak to anything besides that.

sgibson91 commented 2 years ago

Ok, I trust you! 😅 I will add that rule in here and retry the tests @yuvipanda recommended

sgibson91 commented 2 years ago

Hmmm, so I added 9b705b1 to draft PR https://github.com/2i2c-org/pilot-hubs/pull/746, redeployed the support chart to the pilot-hubs cluster (and also redeployed the staging hub just to make sure), and then tried the same curl commands Yuvi suggested above. I got the same result, which implies to me that we have not achieved our goal 😕

Screenshot 2021-10-13 at 14 50 38

consideRatio commented 2 years ago

@sgibson91 these are some debugging steps I have in mind.

  1. Is there something enforcing it installed in the k8s cluster - is for example calico running in the kube-system namespace?
  2. Is the NetworkPolicy you intended to create created as you intended in the k8s namespace where that user pod was running? (kubectl get netpol -o yaml ...)
  3. What is the 10.0.39.211 IP actually? Is it a k8s Service IP, or is it a k8s pod IP? Can you try communicating directly to the Pod IP - is there a difference?
  4. What is the range of IPs used for pods? Are they part of the 10.x.x.x range?
  5. Was the IP from prometheus? If so, are there any network policies configured for prometheus? (note they are namespaced, check in the relevant namespace!)

Separate but related idea that I think we generally should enforce:

  1. We try to make the helm charts we deploy define network policies that declare what they should accept traffic from via ingress rules. When it comes to prometheus / grafana, prometheus should only want incoming traffic from grafana, right? There can be a netpol to make it so. Perhaps there is already such a netpol, one that may require us to label the grafana pod with a specific label.

sgibson91 commented 2 years ago

  2. Is the NetworkPolicy you intended to create created as you intended in the k8s namespace where that user pod was running? (kubectl get netpol -o yaml ...)

@consideRatio I think this is where we are misunderstanding one another.

From your original comment:

I think the right call may be to have tight allowances on what is allowed to access prometheus

I understood this to mean "Let's create a netpol for prometheus that limits what traffic can access it (and capture traffic from the user pods within that)". And so if you inspect #746, you can see that I added the rule we discussed to prometheus's netpol in support/values.yaml. (This is also why I asked whether it should be an egress or ingress rule, because from prometheus's perspective it is incoming traffic, no?)

sgibson91 commented 2 years ago
  1. Is there something enforcing it installed in the k8s cluster - is for example calico running in the kube-system namespace?

Yes, we have calico installed

kubectl get pods -n kube-system
NAME                                                        READY   STATUS    RESTARTS   AGE
calico-node-5mf8x                                           1/1     Running   0          43d
calico-node-gcdx7                                           1/1     Running   0          43d
calico-node-lj9xn                                           1/1     Running   0          43d
calico-node-vertical-autoscaler-56bbb8c4cf-zs9w2            1/1     Running   0          43d
calico-typha-696b89447f-989qp                               1/1     Running   0          43d
calico-typha-696b89447f-wh62v                               1/1     Running   0          43d
calico-typha-horizontal-autoscaler-87dd66849-78s54          1/1     Running   0          43d
calico-typha-vertical-autoscaler-7c9b94496d-mx5hp           1/1     Running   0          43d
[snip]

  3. What is the 10.0.39.211 IP actually? Is it a k8s Service IP, or is it a k8s pod IP? Can you try communicating directly to the Pod IP - is there a difference?

From Yuvi's original comment above:

We should also check if we can access the pod directly! By looking at k -n support get pod -l component=server,app=prometheus -o yaml, I can get its IP (.status.podIP) as well as the container port where the server is exposed (in this case, 9090). So you can test access to it with curl http://<pod-ip>:9090 as well.

So the first IP I use in my screenshots is the service IP, the second one is the pod IP directly.

  4. What is the range of IPs used for pods? Are they part of the 10.x.x.x range?

From the GCP console:

Pod address range    10.8.0.0/14

  5. Was the IP from prometheus? If so, are there any network policies configured for prometheus? (note they are namespaced, check in the relevant namespace!)

I was updating the netpol for prometheus!

Separate but related idea that I think we generally should enforce:

  1. We try to make the helm charts we deploy define network policies that declare what they should accept traffic from via ingress rules. When it comes to prometheus / grafana, prometheus should only want incoming traffic from grafana, right? There can be a netpol to make it so. Perhaps there is already such a netpol, one that may require us to label the grafana pod with a specific label.

I agree with this

sgibson91 commented 2 years ago

I do not understand why the ingress rule I have defined in the following link is not being applied when I run python deployer deploy-support 2i2c: https://github.com/2i2c-org/pilot-hubs/pull/746/files#diff-10b17b1c3606388eeb9ba73c85146ca92acb9acaf62bd4b163abbda33a95dd58R10-R19

When I manually update the support-prometheus-server netpol (via Lens) to include the above rule, we get what we want: the prometheus server is not reachable from the user server pods.

Screenshot 2021-10-15 at 10 57 52

However, as soon as I add that config to support/values.yaml and apply it with python deployer deploy-support 2i2c it is not applied as expected.

$ k describe netpol support-prometheus-server
Name:         support-prometheus-server
Namespace:    support
Created on:   2021-06-08 14:04:03 +0100 BST
Labels:       app=prometheus
              app.kubernetes.io/managed-by=Helm
              chart=prometheus-13.8.0
              component=server
              heritage=Helm
              release=support
Annotations:  meta.helm.sh/release-name: support
              meta.helm.sh/release-namespace: support
Spec:
  PodSelector:     app=prometheus,component=server,release=support
  Allowing ingress traffic:
    To Port: 9090/TCP
    From: <any> (traffic not restricted by source)
  Not affecting egress traffic
  Policy Types: Ingress

$ python deployer deploy-support 2i2c
...

$ k describe netpol support-prometheus-server
Name:         support-prometheus-server
Namespace:    support
Created on:   2021-06-08 14:04:03 +0100 BST
Labels:       app=prometheus
              app.kubernetes.io/managed-by=Helm
              chart=prometheus-13.8.0
              component=server
              heritage=Helm
              release=support
Annotations:  meta.helm.sh/release-name: support
              meta.helm.sh/release-namespace: support
Spec:
  PodSelector:     app=prometheus,component=server,release=support
  Allowing ingress traffic:
    To Port: 9090/TCP
    From: <any> (traffic not restricted by source)
  Not affecting egress traffic
  Policy Types: Ingress

I do not understand why this is not being applied from the values file.

sgibson91 commented 2 years ago

I got around this by moving my rule into a new netpol file and disabling prometheus's default netpol.
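
For illustration, such a standalone netpol could look roughly like the sketch below, using the prometheus server pod labels visible in the kubectl describe netpol output above; the grafana label and the policy name are assumptions, and the actual file in #746 may differ.

# Hypothetical standalone NetworkPolicy: only pods labelled app=grafana in
# the support namespace may open connections to the prometheus server pod.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: prometheus-server-ingress
  namespace: support
spec:
  podSelector:
    matchLabels:
      app: prometheus
      component: server
      release: support
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: grafana  # assumed label; check the grafana pod's actual labels
      ports:
        - protocol: TCP
          port: 9090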

sgibson91 commented 2 years ago

https://github.com/2i2c-org/pilot-hubs/pull/746 is ready for review

consideRatio commented 2 years ago

@sgibson91 ah, regarding the ingress/egress question and my answer of egress: I was thinking the focus was now on singleuser egress and not on changing ingress rules, but if you modify prometheus's rules, I agree the key rule would be ingress.

consideRatio commented 2 years ago

I'm quite confused about the work regarding prometheus: it is a Helm chart, and I don't know how you configure the Helm chart's netpol rules, so I haven't understood what configuration of it did or didn't work.

Was the strategy to limit the user pods' egress to everything except the local IP ranges (10.x.x.x, 172.16-31.x.x, 192.168.x.x) because it was observed that traffic was still allowed to the prometheus server pod when applying it?

sgibson91 commented 2 years ago

This was my original proposal:

So if we can allow unrestricted access to the internet but not local cluster network we're all good.

If I understand correctly, this is z2jh's default setting. So the implementation of this would look like a PR that removes all the networkPolicy keys from under singleuser in basehub and daskhub, right?

And Yuvi wanted to make sure we couldn't hit the prometheus server from a user pod:

@sgibson91 I think so, but I want to see how it interacts with non Jupyterhub things in cluster. Primarily, I want to verify that we can't hit the Prometheus or grafana servers from user pods, as that can leak information. There is no password auth or anything for Prometheus, so we have to rely on network policy here.

(Just noticing now the grafana server).

PR #746 is my implementation based on how I understood this.

Was the strategy to limit the user pods' egress to everything except the local IP ranges (10.x.x.x, 172.16-31.x.x, 192.168.x.x) because it was observed that traffic was still allowed to the prometheus server pod when applying it?

My strategy was to restrict what could access prometheus in prometheus's netpol, rather than create a big list of stuff in the singleuser netpol, which we had already decided was undesirable as it makes it difficult to maintain charts that depend on basehub/daskhub (like jmte, for example).

sgibson91 commented 2 years ago

Very happy for someone to suggest another PR if they have a better implementation!

sgibson91 commented 2 years ago

Let's put a pin in this until early next week and give our brains time to process :) I'll mark my PR as draft again

sgibson91 commented 2 years ago

I took a second stab at this here: https://github.com/2i2c-org/infrastructure/pull/774

I would like to get this resolved before we go live with Pangeo prod :)

consideRatio commented 2 years ago

I think the change made by @sgibson91 in #774, to block egress only to the IP address ranges reserved for local networks, is a great resolution of this issue. It will be more permissive than what we had before, but will certainly break fewer expectations from end users. Like this, I don't think there is much need for documentation for end users, as everything should work unless they try to inspect something within the k8s cluster, which they very likely shouldn't do.

sgibson91 commented 2 years ago

On the point of documentation, I think the only thing we should add somewhere is a note letting Hub Admins know that we allow permissive access to the internet on our hubs, and that if they'd like a more restrictive netpol, they should get in touch with the engineering team.

damianavila commented 2 years ago

On the point of documentation

Wondering if we need a little section in our eng docs to talk about these details...

choldgraf commented 2 years ago

I agree with this assessment! It seems like the major next steps on this one are to document the changes implemented in #774 for hub admins and engineers. I've updated the top comment with those remaining tasks. @sgibson91 are you OK to make those docs additions as well?

sgibson91 commented 2 years ago

Reopening until I've tackled docs

yuvipanda commented 2 years ago

@sgibson91 also, in https://staging.us-central1-b.gcp.pangeo.io, the python code can't actually access https://staging.us-central1-b.gcp.pangeo.io itself. So curl https://staging.us-central1-b.gcp.pangeo.io fails by hanging forever.

sgibson91 commented 2 years ago

Thanks @yuvipanda - this is the block I need to add, right? https://github.com/yuvipanda/datahub/commit/e8544b0931fa33a5ecc282290fea1281dccce6f1#diff-c7ff155658f0273e3fd2df49dee1551363b934660d0cb6e169938288442eba05R97-R105

consideRatio commented 2 years ago

Hmmm? Python code from within the k8s cluster can't curl https://staging.us-central1-b.gcp.pangeo.io - why? Isn't that a request to the public IP it translates to?

sgibson91 commented 2 years ago

I opened https://github.com/2i2c-org/infrastructure/pull/782

sgibson91 commented 2 years ago

Documentation PRs:

I wasn't really sure what to write for the engineering docs 🤷🏻‍♀️

yuvipanda commented 2 years ago

@sgibson91 yep, that block adds support for talking to the hub public IP when using ingress. We'd probably also need another block for talking to the public IP when it's using autohttps rather than nginx-ingress, though.

@consideRatio I think the packets somehow get routed internally and don't go out into the internet and back? Not sure :)

consideRatio commented 2 years ago

Perhaps the logic is along the lines that...

  1. the domain is translated to a public IP
  2. there is some local knowledge that traffic to the public IP from within the cluster can just as well be routed directly to the local IP instead
  3. that traffic then gets blocked because it is accessing a local IP
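
If that is roughly what is happening, one possible workaround (a hedged sketch only; the namespace and pod labels below are assumptions, and the actual change in the commit linked above may look different) is to add an extra item to the singleuser egress list that explicitly allows traffic to the ingress controller pods, alongside the broad allow rule:

# Hypothetical extra egress item for singleuser.networkPolicy.egress,
# kept alongside the broad "allow all except local ranges" rule, so that
# in-cluster requests to the hub's public hostname (which resolve to the
# ingress controller) are not caught by the private-range block.
singleuser:
  networkPolicy:
    egress:
      - to:
          - namespaceSelector:
              matchLabels:
                name: support                          # assumed namespace label
            podSelector:
              matchLabels:
                app.kubernetes.io/name: ingress-nginx  # assumed controller label
        ports:
          - protocol: TCP
            port: 443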