kimoonkim opened this issue 7 years ago
@foxish @mccheah @ash211 So opening the firewall for the NodePort range on GKE VMs, like what I did, means anyone on the Internet can access those ports.
$ gcloud compute firewall-rules create k8snodeport --allow tcp:30000-32767
This feels insecure. For instance, attackers could upload rogue jars to Spark jobs. It could make people uncomfortable using Spark-on-K8s on GKE. I understand the REST server in the driver pod supports SSL, but I feel like that's extra work that most new users won't know how to do.
Should we think about a more secure default method? e.g. Using kubectl is secure by default since it requires the cluster credentials. Maybe we can piggyback on the same thing? Does submission v2 somehow address this already?
Probably gcloud has a way to expose the submission port over an ingress instead of using a NodePort?
It is possible to make that firewall rule more restrictive by specifying the source-range for traffic, rather than opening up the port range to the entire internet. The firewall rules are GCP project scoped and can be used to restrict traffic to ingresses as well as nodeports.
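For example, something roughly like this (the CIDR here is just a placeholder for the client network you actually want to allow) keeps the NodePort range closed to everyone else:

$ # Only allow the NodePort range from a known client network (placeholder CIDR)
$ gcloud compute firewall-rules create k8snodeport-restricted \
    --allow tcp:30000-32767 \
    --source-ranges 203.0.113.0/24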
An alternative is using the APIServer proxy, which has the advantage of simplicity. It is probably something we can consider in the submission code to allow submitting via the proxy. While it may not be the most appropriate for submitting large files and such, or production use, it would offer a simpler kick-the-tires experience for experimentation without the need to create firewall rules.
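As a rough sketch of what that could look like with the existing kubectl tooling (the service and port names below are made up for illustration), the client would tunnel through the API server using its cluster credentials rather than hitting a NodePort directly:

$ # Run a local proxy to the API server, authenticated via kubeconfig
$ kubectl proxy --port=8001 &
$ # Reach the driver's submission service through the API server's service proxy path
$ curl http://localhost:8001/api/v1/namespaces/default/services/spark-driver-svc:submission-port/proxy/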
It is possible to make that firewall rule more restrictive by specifying the source-range for traffic, rather than opening up the port range to the entire internet.
Yes, I was aware of that. But the source range for me was my MacBook inside our corporate network, which I believe is not exposed to the Internet in terms of public IPs. I wasn't able to figure out what that means for the "source range". Maybe it's just me not knowing the right thing to do. @ssuchter might know more about our environment.
An alternative is using the APIServer proxy, which has the advantage of simplicity. It is probably something we can consider in the submission code to allow submitting via the proxy.
I like this idea. What about kubectl cp that we discussed before in https://github.com/apache-spark-on-k8s/spark/issues/167#issuecomment-294001633? If we use it to upload jars to the v2 staging server, would it be more secure?
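Something along these lines is what I have in mind (the pod name and path are hypothetical), since kubectl cp also goes through the API server with the cluster credentials:

$ # Copy a local jar into the resource staging server pod (hypothetical pod name and path)
$ kubectl cp ./my-app.jar default/resource-staging-server:/tmp/my-app.jar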
Probably gcloud has a way to expose the submission port over an ingress instead of using a NodePort?
I couldn't find ingress. But GKE supports egress or LoadBalancer services, which adjust the firewall automatically. We could support that, but it would add to the cloud bill and so might not be preferred by people.
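For what it's worth, a quick sketch of the LoadBalancer route with kubectl (the driver pod name and port are placeholders), where GKE provisions the load balancer and adjusts the firewall on its own:

$ # Expose the driver's submission port via a cloud LoadBalancer instead of a NodePort (placeholder names/port)
$ kubectl expose pod spark-driver-pod --name=spark-driver-lb --type=LoadBalancer --port=7077
$ # EXTERNAL-IP shows up once the load balancer is provisioned
$ kubectl get svc spark-driver-lb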
It is possible to make that firewall rule more restrictive by specifying the source-range for traffic, rather than opening up the port range to the entire internet.
Yes, I was aware of that. But the source range for me was my MacBook inside our corporate network, which I believe is not exposed to the Internet in terms of public IPs. I wasn't able to figure out what that means for the "source range". Maybe it's just me not knowing the right thing to do. @ssuchter might know more about our environment.
I encountered the same issue with EC2. But the EC2 web UI automatically detected the "source range" IP matching my MacBook. Maybe Google Cloud has a similar UI as well. So I think we can mention this in the documentation.
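If the docs go that route, we could even suggest something like the following so people don't have to look up their public IP manually (assuming an external IP echo service such as checkip.amazonaws.com):

$ # Restrict the NodePort rule to the caller's current public IP (looked up via an IP echo service)
$ MY_IP=$(curl -s https://checkip.amazonaws.com)
$ gcloud compute firewall-rules create k8snodeport --allow tcp:30000-32767 --source-ranges "${MY_IP}/32"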
@foxish
I was running our Spark-on-K8s code against a Google Container Engine (GKE) cluster. In our code, the driver pod opens up a NodePort so that the client can submit requests to the port. In GKE, the firewall is set up to block any access to cluster nodes, so the client won't be able to access the driver NodePort. From https://kubernetes.io/docs/tasks/access-application-cluster/configure-cloud-provider-firewall/, I ended up manually opening the whole NodePort range. (Since the NodePort is randomly assigned per job, it is not possible/practical to just open up a single port):

$ gcloud compute firewall-rules create k8snodeport --allow tcp:30000-32767

This allowed me to submit Spark jobs from my MacBook.
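For context, each job gets its own randomly assigned port, which one can check with something like this (the service name is hypothetical), so pre-opening a single port isn't practical:

$ # Print the randomly assigned NodePort for the driver's submission service (hypothetical name)
$ kubectl get svc spark-driver-svc -o jsonpath='{.spec.ports[0].nodePort}'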
However, Google has Cloud Shell, which one could use for job submission. I wasn't able to get the Cloud Shell to work even after opening up the NodePort range. Maybe there is a separate firewall on the Cloud Shell side as well. Maybe we should at least document these caveats so that people would know.