rsignell-usgs closed this issue 6 years ago
Yesterday I took a one-day AWS class on Kubernetes (and they suggested kops over EKS for now). They used this guide: https://github.com/aws-samples/aws-workshop-for-kubernetes. Following the initial steps resulted in creating an IAM role which seems to meet the requirements of the first step of the Zero to JupyterHub guide.
Following the rest of the z2jh guide's step zero, I assigned the above role to the small (t2.small) instance we are using for the CI host, which we can access via:

```shell
ssh -i "kops.pem" ec2-user@ec2-18-208-141-112.compute-1.amazonaws.com
```
I created a cluster with these specs based on some advice from Jacob:
```shell
kops create cluster kopscluster.k8s.local \
  --zones us-east-1a,us-east-1b,us-east-1c,us-east-1d,us-east-1e,us-east-1f \
  --authorization RBAC \
  --master-size t2.small \
  --master-volume-size 10 \
  --node-size m5.2xlarge \
  --master-count 3 \
  --node-count 2 \
  --node-volume-size 120 \
  --yes
```
I then followed the z2jh guide up through "Setting up Helm"
Wow, I finally overcame the struggles by destroying and recreating the cluster using m4 instances instead of m5. What a nightmare!
So this is what finally worked:
```shell
kops create cluster kopscluster.k8s.local \
  --zones us-east-1a,us-east-1b,us-east-1c,us-east-1d,us-east-1e,us-east-1f \
  --authorization RBAC \
  --master-size t2.small \
  --master-volume-size 10 \
  --node-size m4.2xlarge \
  --master-count 3 \
  --node-count 2 \
  --node-volume-size 120 \
  --yes
```
According to @jacobtomlinson here: https://github.com/jupyterhub/zero-to-jupyterhub-k8s/issues/870#issuecomment-416883355, it appears we could run m5 instances by just adding:

```shell
--image kope.io/k8s-1.8-debian-stretch-amd64-hvm-ebs-2018-02-08
```

to the above `kops create cluster` command.
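Putting the two pieces together, the m5-capable variant of the create command would presumably look like this (an untested sketch — same sizes and zones as above, with only the `--image` flag added):

```shell
kops create cluster kopscluster.k8s.local \
  --zones us-east-1a,us-east-1b,us-east-1c,us-east-1d,us-east-1e,us-east-1f \
  --authorization RBAC \
  --image kope.io/k8s-1.8-debian-stretch-amd64-hvm-ebs-2018-02-08 \
  --master-size t2.small \
  --master-volume-size 10 \
  --node-size m5.2xlarge \
  --master-count 3 \
  --node-count 2 \
  --node-volume-size 120 \
  --yes
```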
The pangeo helm chart layers on top of the JupyterHub helm chart, so the instructions for Pangeo and JupyterHub are the same up to the `helm install` step. So the recipe is:
Follow the Zero to JupyterHub guide (https://zero-to-jupyterhub-with-kubernetes.readthedocs.io/en/latest/) for deployment on AWS, until you get to the "Setting up JupyterHub" page (https://zero-to-jupyterhub-with-kubernetes.readthedocs.io/en/latest/setup-jupyterhub.html).
We used kops to create the cluster:
```shell
kops create cluster kopscluster.k8s.local \
  --zones us-east-1a,us-east-1b,us-east-1c,us-east-1d,us-east-1e,us-east-1f \
  --authorization RBAC \
  --master-size t2.small \
  --master-volume-size 10 \
  --node-size m4.2xlarge \
  --master-count 3 \
  --node-count 2 \
  --node-volume-size 120 \
  --yes
```
and enabled autoscaling following these instructions: https://akomljen.com/kubernetes-cluster-autoscaling-on-aws/ with these settings:
```shell
helm install --name autoscaler \
  --namespace kube-system \
  --set image.tag=v1.2.1 \
  --set autoDiscovery.clusterName=kopscluster.k8s.local \
  --set extraArgs.balance-similar-node-groups=false \
  --set extraArgs.expander=random \
  --set rbac.create=true \
  --set rbac.pspEnabled=true \
  --set awsRegion=us-east-1 \
  --set nodeSelector."node-role\.kubernetes\.io/master"="" \
  --set tolerations[0].effect=NoSchedule \
  --set tolerations[0].key=node-role.kubernetes.io/master \
  --set cloudProvider=aws \
  stable/cluster-autoscaler
```
I set up autoscaling groups in each zone (us-east-1a, us-east-1b, ... us-east-1f) by running commands like:

```shell
kops edit ig nodes-us-east-1a-m4-2xlarge.kopscluster.k8s.local
```
and dropping in info like this:
```yaml
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-09-10T13:27:48Z
  labels:
    kops.k8s.io/cluster: kopscluster.k8s.local
  name: nodes-us-east-1a-m4-2xlarge.kopscluster.k8s.local
spec:
  cloudLabels:
    k8s.io/cluster-autoscaler/enabled: ""
    k8s.io/cluster-autoscaler/node-template/label: ""
    kubernetes.io/cluster/kopscluster.k8s.local: owned
  image: kope.io/k8s-1.9-debian-jessie-amd64-hvm-ebs-2018-03-11
  machineType: m4.2xlarge
  maxSize: 50
  minSize: 0
  nodeLabels:
    kops.k8s.io/instancegroup: nodes-us-east-1a-m4-2xlarge.kopscluster.k8s.local
  role: Node
  rootVolumeSize: 120
  subnets:
  - us-east-1a
```
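Editing six instance groups by hand is tedious; the per-zone manifests can also be generated with a small script and then registered with `kops create -f`. A sketch, reusing the same cluster name, sizes, and labels as above (the loop only writes the files; the actual kops call is left commented out):

```shell
#!/bin/sh
# Generate one kops InstanceGroup manifest per availability zone.
CLUSTER=kopscluster.k8s.local
for ZONE in us-east-1a us-east-1b us-east-1c us-east-1d us-east-1e us-east-1f; do
  IG="nodes-${ZONE}-m4-2xlarge.${CLUSTER}"
  cat > "${IG}.yaml" <<EOF
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  labels:
    kops.k8s.io/cluster: ${CLUSTER}
  name: ${IG}
spec:
  cloudLabels:
    k8s.io/cluster-autoscaler/enabled: ""
    k8s.io/cluster-autoscaler/node-template/label: ""
    kubernetes.io/cluster/${CLUSTER}: owned
  image: kope.io/k8s-1.9-debian-jessie-amd64-hvm-ebs-2018-03-11
  machineType: m4.2xlarge
  maxSize: 50
  minSize: 0
  nodeLabels:
    kops.k8s.io/instancegroup: ${IG}
  role: Node
  rootVolumeSize: 120
  subnets:
  - ${ZONE}
EOF
  # kops create -f "${IG}.yaml"   # uncomment to actually register the group
done
```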
After this it's just configuring and installing the Pangeo Helm chart, starting with step 4 of the Pangeo instructions here: https://github.com/pangeo-data/pangeo/blob/master/docs/setup_guides/cloud.rst
I also installed the helm chart for the awesome s3-fuse-flex-volume pysssix and goofys flexVolumes from the Informatics Lab, giving the ability to read any s3 bucket as /s3/&lt;bucket&gt;, and write to an s3 bucket on /scratch.
My worker-template.yaml looks like this:
```yaml
metadata:
spec:
  restartPolicy: Never
  volumes:
    - flexVolume:
        driver: informaticslab/pysssix-flex-volume
        options:
          readonly: "true"
      name: s3
    - flexVolume:
        driver: informaticslab/goofys-flex-volume
        options:
          bucket: "esipfed-scratch"
          dirMode: "0777"
          fileMode: "0777"
      name: scratch
  containers:
    - args:
        - dask-worker
        - --nthreads
        - '2'
        - --no-bokeh
        - --memory-limit
        - 6GB
        - --death-timeout
        - '60'
      image: esip/pangeo-notebook:2018-09-21
      name: dask-worker
      securityContext:
        capabilities:
          add: [SYS_ADMIN]
        privileged: true
      volumeMounts:
        - mountPath: /s3
          name: s3
        - mountPath: /scratch
          name: scratch
      resources:
        limits:
          cpu: "1.75"
          memory: 6G
        requests:
          cpu: "1.75"
          memory: 6G
```
@rsignell-usgs

> and dropping in info like this:

Can you be a bit more precise on this? How do I incorporate the information? I am pretty new to the whole Kubernetes topic, so it is not (yet) obvious to me. Also, I guess this step has to be completed before setting up the groups with `kops edit ig ...`, correct? Otherwise I get the following error:

```
error reading InstanceGroup "nodes-eu-central-1a-m4-2xlarge.hive.k8s.local": InstanceGroup.kops "nodes-eu-central-1a-m4-2xlarge.hive.k8s.local" not found
```
Here is what I tried: create a new file cluster_settings.yaml with:
```yaml
...
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-09-10T13:27:48Z
  labels:
    kops.k8s.io/cluster: hive.k8s.local
  name: nodes-eu-central-1c-m4-2xlarge.hive.k8s.local
spec:
  cloudLabels:
    k8s.io/cluster-autoscaler/enabled: ""
    k8s.io/cluster-autoscaler/node-template/label: ""
    kubernetes.io/cluster/hive.k8s.local: owned
  image: kope.io/k8s-1.9-debian-jessie-amd64-hvm-ebs-2018-03-11
  machineType: m4.2xlarge
  maxSize: 50
  minSize: 0
  nodeLabels:
    kops.k8s.io/instancegroup: nodes-eu-central-1c-m4-2xlarge.hive.k8s.local
  role: Node
  rootVolumeSize: 120
  subnets:
  - eu-central-1c
```
and then running:

```shell
helm upgrade autoscaler \
  --namespace kube-system \
  --set image.tag=v1.2.1 \
  --set autoDiscovery.clusterName=hive.k8s.local \
  --set extraArgs.balance-similar-node-groups=false \
  --set extraArgs.expander=random \
  --set rbac.create=true \
  --set rbac.pspEnabled=true \
  --set awsRegion=eu-central-1 \
  --set nodeSelector."node-role\.kubernetes\.io/master"="" \
  --set tolerations[0].effect=NoSchedule \
  --set tolerations[0].key=node-role.kubernetes.io/master \
  --set cloudProvider=aws \
  stable/cluster-autoscaler \
  -f cluster_settings.yaml
```
Thank you!
@h4gen, I meant that I typed `kops edit ig` and then pasted that info in.
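If the per-zone group does not exist yet (hence the "not found" error from `kops edit ig`), it can also be registered from a file instead of edited interactively — a sketch, assuming the manifest above is saved as cluster_settings.yaml:

```shell
# Register a new instance group from a manifest rather than editing one:
# `kops create -f` adds a group that does not exist yet,
# `kops replace -f` updates a group that already does.
kops create -f cluster_settings.yaml
kops update cluster hive.k8s.local --yes   # push the change out to AWS
```

Note that `-f` on `helm upgrade` expects a chart values file, so passing a kops InstanceGroup manifest to the autoscaler chart will not have the intended effect.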
One more piece: setting up "pangeo.esipfed.org" to be our endpoint.

- We obtained a Client Id and Client Secret:
- Added those to our secret-config.yaml:
- On networksolutions.com (like godaddy), we set pangeo.esipfed.org to point to the Amazon URL:
Thanks @rsignell-usgs! Was there anything special about setting up the load balancer on AWS and assigning it to the kops cluster?
@aolt, you installed the autoscaler helm chart, right? We don't have any extra load balancer that I know of.
@rsignell-usgs yes I did; everything installed nicely and is in a running state, except for the external IP. In jupyter-config.yaml one has to specify the external IP with `loadBalancerIP:`. My kops cluster is installed in a private VPC, so I assume I have to allocate some external IP with AWS, which I can then use in the jupyter-config.yaml file.
```
$ kubectl describe service proxy-public --namespace pangeo
Name:                     proxy-public
Namespace:                pangeo
Labels:                   app=jupyterhub
                          chart=jupyterhub-0.7.0
                          component=proxy-public
                          heritage=Tiller
                          release=jupyter
Annotations:              <none>
Selector:                 component=proxy,release=jupyter
Type:                     LoadBalancer
IP:                       100.67.54.89
IP:                       35.175.192.236
Port:                     http  80/TCP
TargetPort:               8000/TCP
NodePort:                 http  32646/TCP
Endpoints:                100.96.2.5:8000
Session Affinity:         None
External Traffic Policy:  Cluster
Events:
  Type     Reason                      Age                   From                Message
  ----     ------                      ----                  ----                -------
  Normal   EnsuringLoadBalancer        2m20s (x11 over 27m)  service-controller  Ensuring load balancer
  Warning  CreatingLoadBalancerFailed  2m20s (x11 over 27m)  service-controller  Error creating load balancer (will retry): failed to ensure load balancer for service pangeo/proxy-public: LoadBalancerIP cannot be specified for AWS ELB
```
> - On networksolutions.com (like godaddy), we set pangeo.esipfed.org to point to the Amazon URL:

Let me rephrase my question: where do you get this long *.elb.amazonaws.com address? Thanks!
@aolt, I assume you figured this out, but when you do a helm install or upgrade, it prints out a statement like this:

```
You can find the public IP of the JupyterHub by doing:

 kubectl --namespace=esip-pangeo get svc proxy-public

It might take a few minutes for it to appear!
```
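If you just want the ELB hostname by itself, a jsonpath query works too (assuming the release lives in the esip-pangeo namespace, as above):

```shell
# Print only the ELB hostname assigned to the proxy-public service.
# Empty output means the load balancer has not finished provisioning yet.
kubectl --namespace=esip-pangeo get svc proxy-public \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'
```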
Thanks @rsignell-usgs, the issue was that I was missing the ingress helm chart: https://github.com/pangeo-data/pangeo/issues/71#issuecomment-435834926
We would like to have a cluster built using kops so that we have a better chance of using autoscaling, and because the AWS Landsat data is in US-EAST.