dask / helm-chart

Helm charts for Dask
https://helm.dask.org/

Make service type ClusterIP and port forwarding the default #53

Closed. JulianWgs closed this issue 4 years ago.

JulianWgs commented 4 years ago

Hello all,

Why is the default service type not ClusterIP? Unlike LoadBalancer, it does not expose the Dask cluster to the open internet (protected only by a default password).

Then you can connect to Jupyter Lab and the dashboard by port forwarding:

    kubectl port-forward --namespace default svc/dask-jupyter 8011:80
    kubectl port-forward --namespace default svc/dask-scheduler 8012:80
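A minimal sketch of what the proposed default looks like in practice, written as a bash function. `dask_install_private` is a hypothetical name, not part of the chart. The sketch assumes the chart repository was added with `helm repo add dask https://helm.dask.org/`, and that the chart's `scheduler.serviceType` and `jupyter.serviceType` values (the keys used with `--set` later in this thread) are passed through to each Service's `type`.

```bash
# Hypothetical helper (not part of the chart): install the Dask chart with
# both services set to ClusterIP, so neither gets an external IP address.
# Assumes: helm repo add dask https://helm.dask.org/
dask_install_private() {
  local release="${1:-dask}" ns="${2:-default}"
  local cmd=(helm install "$release" dask/dask --namespace "$ns"
             --set scheduler.serviceType=ClusterIP
             --set jupyter.serviceType=ClusterIP)
  # DRY_RUN=1 prints the command instead of running it.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, `DRY_RUN=1 dask_install_private` only prints the helm command. Running `dask_install_private` without it installs the release `dask` into the `default` namespace, after which the port-forward commands above reach Jupyter Lab and the dashboard.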

If this is a good idea, I would create a pull request :)

Greetings

kazimuth commented 4 years ago

I was also going to open an issue about this. Having a Dask cluster exposed to the open internet seems like a good way to unknowingly mine a lot of bitcoin.

mrocklin commented 4 years ago

How do you think we fund Dask workshops? ;)

jacobtomlinson commented 4 years ago

I'm in two minds about this.

Defaulting to a load balancer feels easier for users who are brand new to Kubernetes. They install a thing and it becomes available at a URL.

However, as you say, if someone found out the IP address of the cluster, they could also connect to it for malicious purposes. A ClusterIP would be more secure, but, as you say, it requires some extra port forwards which the user would need to run whenever they want to access the cluster.

I'm not sure which is the right approach; I'm open to being convinced.

JulianWgs commented 4 years ago

I think new Kubernetes users especially should not unknowingly expose a cluster to the internet. Using it is just a matter of copy-pasting the commands from NOTES.txt, which Helm prints when you install the chart.

@mrocklin Is this how you fund the dask company, too? Don't want to stand in your way tbh :P

So if there are no other objections I would create a PR :)

jacobtomlinson commented 4 years ago

Happy to review a PR! However I have the following comments:

> I think especially new users to kubernetes should not unknowingly expose a cluster to the internet.

There are things we can do to make sure they do it knowingly, such as putting an obvious warning in NOTES.txt.

> Usage is just a matter of copy pasting from the NOTES.txt, when you install via helm.

The notes are only displayed when you install or update the chart. The user will need to run the port forward commands every time they want to access their deployment and will likely forget how to do so. We need to think of a way to make this easy for them.
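One way to address the concern about remembering the commands is a small helper that users keep in their shell profile. A minimal sketch as a bash function; `dask_forward` is a hypothetical name, and the service names and ports are assumed to match the port-forward commands earlier in this thread (release `dask`).

```bash
# Hypothetical helper (not part of the chart): open both port forwards at
# once so users need not remember the commands from NOTES.txt.
# Assumes release "dask", i.e. services dask-jupyter and dask-scheduler.
dask_forward() (
  # The body is a subshell, so the traps below stay out of the caller's shell.
  ns="${1:-default}"
  forwards=("svc/dask-jupyter 8011:80" "svc/dask-scheduler 8012:80")

  # DRY_RUN=1 prints the commands instead of running them.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    for f in "${forwards[@]}"; do
      echo "kubectl port-forward --namespace $ns $f"
    done
    exit 0
  fi

  # Background jobs started without job control ignore SIGINT,
  # so kill them explicitly when this subshell exits.
  trap 'kill $(jobs -p) 2>/dev/null' EXIT
  trap 'exit 130' INT TERM
  for f in "${forwards[@]}"; do
    # $f is left unquoted on purpose: it splits into service and port mapping.
    kubectl port-forward --namespace "$ns" $f &
  done
  echo "Jupyter Lab: http://localhost:8011  Dashboard: http://localhost:8012"
  wait
)
```

Run `dask_forward` (or `dask_forward my-namespace`) and press Ctrl-C to close both forwards. Using `( ... )` instead of `{ ... }` for the body keeps the traps and variables out of the interactive shell.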

kazimuth commented 4 years ago

> However as you say if someone was able to find out the IP of the cluster they could connect to it also for malicious reasons.

Note that this happens very quickly. Tools such as ZMap and Shodan can be used to port-scan the entire IPv4 internet in a day. Hackers (and companies, and state entities...) around the world are constantly searching for new servers and devices to exploit.

This means that anyone running a helm-dask cluster is likely to be discovered, and potentially exploited, within days.

Note that some of the ripest targets for hackers are insecure-by-default open source software configurations. Most users don't modify default security settings, so if you know how to get into an unmodified deployment of a piece of software, you can get into most deployments of that software.

For example, in 2017, more than 28,000 installs of MongoDB were hacked -- because the default configuration left ports open to the internet. These ports were easy to find, and the installs were trivial to exploit. Hackers encrypted the data and demanded massive ransoms from companies for the decryption keys.

Helm-dask installs represent similarly ripe targets, since they can be easily monetized via Bitcoin/Monero mining.

> The notes are only displayed when you install or update the chart. The user will need to run the port forward commands every time they want to access their deployment and will likely forget how to do so. We need to think of a way to make this easy for them.

This is a fair concern. However, I believe compute cluster administrators will find running a single console command less inconvenient than discovering that their Dask cluster has been hacked to mine cryptocurrency.

jacobtomlinson commented 4 years ago

Consider me convinced.

I am still concerned that this will make things harder for new users, but the security implications outweigh the increase in complexity.

kyprifog commented 4 years ago

I want to second this. With LoadBalancer as the default, anyone who unwittingly runs the Helm chart exposes services on their Kubernetes cluster to the internet, and anyone could then schedule jobs on their Dask scheduler.

As an alternative, could you link to documentation on how to properly secure LoadBalancer services on Kubernetes for various cloud providers, and put a big warning that this Helm chart spins up insecure assets? I wonder if there is any way to enforce it.
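Short of enforcing it, a check like the following could flag exposed services after an install. A minimal sketch in bash; `dask_check_exposure` and `_lb_services` are hypothetical names, and the check only looks for Services of type LoadBalancer.

```bash
# Reads "name type" lines on stdin and prints the names of LoadBalancer services.
_lb_services() {
  awk '$2 == "LoadBalancer" { print $1 }'
}

# Hypothetical check (not part of the chart): return nonzero if any Service
# in the namespace is of type LoadBalancer.
dask_check_exposure() {
  local ns="${1:-default}" listing exposed
  # One "name type" line per Service, via kubectl's jsonpath output.
  if ! listing="$(kubectl get svc --namespace "$ns" \
      -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.type}{"\n"}{end}')"; then
    echo "could not list services in namespace $ns" >&2
    return 2
  fi
  exposed="$(printf '%s\n' "$listing" | _lb_services)"
  if [ -n "$exposed" ]; then
    echo "WARNING: LoadBalancer services in namespace $ns:" >&2
    echo "$exposed" >&2
    return 1
  fi
  echo "No LoadBalancer services in namespace $ns"
}
```

Run `dask_check_exposure` after `helm install`; exit status 1 means a Service has been given an external load balancer, and 2 means the services could not be listed. NodePort services can also be reachable from outside the cluster, depending on how the nodes are networked, which this sketch does not check.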

jsanjay63 commented 4 years ago

I tried deploying on AWS with the Helm chart using this command:

    helm install dask dask/dask --set scheduler.serviceType=LoadBalancer --set jupyter.serviceType=LoadBalancer

I get the LoadBalancer endpoints using kubectl get svc, but when I try to access the Jupyter endpoint, I am unable to connect. I'm not sure if it's related to this issue. Can someone please help?

jacobtomlinson commented 4 years ago

> Not sure if it's related to this issue. Can someone please help?

Please raise a new issue, this issue is not related.