ESIPFed / esiphub-dev

Development JupyterHub on AWS targeting pangeo environment for National Water Model exploration
MIT License

Implement autoscaling on AWS #11

Closed: rsignell-usgs closed this 5 years ago

rsignell-usgs commented 6 years ago

Right now our Kubernetes nodes are always running, regardless of whether anyone is using them.

We really need to implement the Kubernetes Cluster Autoscaler on AWS.

I think we should try to implement this ASAP.

@jreadey, does this make sense to you?
Or do we need to try to get help?
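
For context on what the Cluster Autoscaler would do about idle nodes, here is a deliberately simplified sketch of its scale-down idea. A node whose requested CPU stays below a utilization threshold for long enough becomes a removal candidate. The 0.5 threshold and 10-minute window mirror what I understand to be the defaults of the autoscaler's `--scale-down-utilization-threshold` and `--scale-down-unneeded-time` flags. The real autoscaler also looks at memory, checks whether the node's pods can be rescheduled elsewhere, and much more. The `Node` type and `scale_down_candidates` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the Cluster Autoscaler's
# --scale-down-utilization-threshold and --scale-down-unneeded-time defaults.
UTILIZATION_THRESHOLD = 0.5
UNNEEDED_MINUTES = 10


@dataclass
class Node:
    """Hypothetical, simplified view of a worker node."""
    name: str
    cpu_requested: float         # cores requested by pods on the node
    cpu_allocatable: float       # cores available to pods on the node
    minutes_underutilized: int   # how long the node has looked unneeded


def scale_down_candidates(nodes, min_nodes=0):
    """Return names of nodes this toy autoscaler would remove.

    A node qualifies if its CPU request ratio is below the threshold and it
    has been underutilized for at least UNNEEDED_MINUTES. Never returns so
    many names that fewer than min_nodes nodes would remain.
    """
    candidates = [
        n.name
        for n in nodes
        if n.cpu_requested / n.cpu_allocatable < UTILIZATION_THRESHOLD
        and n.minutes_underutilized >= UNNEEDED_MINUTES
    ]
    removable = max(len(nodes) - min_nodes, 0)
    return candidates[:removable]


if __name__ == "__main__":
    nodes = [
        Node("busy", cpu_requested=3.5, cpu_allocatable=4.0, minutes_underutilized=0),
        Node("idle", cpu_requested=0.1, cpu_allocatable=4.0, minutes_underutilized=45),
        Node("new-idle", cpu_requested=0.1, cpu_allocatable=4.0, minutes_underutilized=2),
    ]
    print(scale_down_candidates(nodes))  # ['idle']
```

The point of the time window is that a node which is briefly idle (like `new-idle`) is not torn down immediately, which avoids thrashing when users come and go.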

rsignell-usgs commented 6 years ago

@jreadey, there is a Helm chart for autoscaling k8s on AWS: https://github.com/kubernetes/charts/tree/master/stable/cluster-autoscaler

jreadey commented 6 years ago

How has the scheduled scaling been working for you?

rsignell-usgs commented 6 years ago

It seems quite wasteful. Are there reasons not to implement cluster autoscaling?

rsignell-usgs commented 6 years ago

@jreadey, have you tried following the instructions here: https://akomljen.com/kubernetes-cluster-autoscaling-on-aws/

@jacobtomlinson, do those look reasonable?

I tried, and didn't even get out of the starting gate:

(IOOS3) rsignell@gamone:~> kops edit ig nodes
Using cluster from kubectl context: kubernetes

State Store: Required value: Please set the --state flag or export KOPS_STATE_STORE.
A valid value follows the format s3://<bucket>.
A s3 bucket is required to store cluster state information.
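
The error message spells out the expected format, so a small pre-flight check can catch a missing or malformed value before running kops. This is a sketch that only handles the `s3://<bucket>` form quoted in the message; other forms kops may accept are not covered. `state_store_bucket` is a hypothetical helper, and the bucket-name check uses the basic S3 naming rules (3 to 63 lowercase letters, digits, dots and hyphens, starting and ending with a letter or digit).

```python
import os
import re
from urllib.parse import urlparse

# Basic S3 bucket naming rules: 3-63 chars of lowercase letters, digits,
# dots and hyphens, beginning and ending with a letter or digit.
BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def state_store_bucket(value):
    """Return the bucket name from an s3://<bucket> state-store value.

    Raises ValueError if the value is missing or not in that form.
    """
    if not value:
        raise ValueError("set --state or export KOPS_STATE_STORE")
    parsed = urlparse(value)
    if parsed.scheme != "s3":
        raise ValueError(f"expected s3://<bucket>, got {value!r}")
    bucket = parsed.netloc  # netloc keeps its original case
    if not BUCKET_RE.match(bucket):
        raise ValueError(f"not a valid S3 bucket name: {bucket!r}")
    return bucket


if __name__ == "__main__":
    try:
        print(state_store_bucket(os.environ.get("KOPS_STATE_STORE")))
    except ValueError as err:
        print(f"state store problem: {err}")
```

A check like this only confirms the value is well formed. It cannot tell whether the bucket exists, is reachable with your credentials, or actually holds kops state.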

@jreadey, do you know the S3 location of the kops state store?

jacobtomlinson commented 6 years ago

Yup, looks totally reasonable to me!

rsignell-usgs commented 5 years ago

@jreadey, never mind. I found it: https://s3-us-west-2.amazonaws.com/hdflab-kubernetes-k8sstack-sflx-clusterinfobucket-11famro67p10g/cluster-info.yaml
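
That link is a path-style HTTPS URL to a single object, while `KOPS_STATE_STORE` wants just `s3://<bucket>`. Here is a hedged sketch of extracting the bucket (and object key) from such a URL. It assumes the `s3[-<region>].amazonaws.com/<bucket>/<key>` layout seen here and deliberately rejects virtual-hosted URLs (`<bucket>.s3.amazonaws.com`). `path_style_to_s3` is a hypothetical name.

```python
from urllib.parse import urlparse


def path_style_to_s3(url):
    """Split a path-style S3 HTTPS URL into ("s3://<bucket>", "<key>").

    Only handles hosts like s3.amazonaws.com or s3-<region>.amazonaws.com,
    where the bucket is the first path segment.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    labels = host.split(".")
    # Path-style endpoints start with "s3" or "s3-<region>"; in virtual-hosted
    # URLs the bucket name comes first and an "s3..." label appears later.
    path_style = (labels[0] == "s3" or labels[0].startswith("s3-")) and not any(
        label.startswith("s3") for label in labels[1:]
    )
    if not path_style or not host.endswith(".amazonaws.com"):
        raise ValueError(f"not a path-style S3 URL: {url!r}")
    bucket, _, key = parsed.path.lstrip("/").partition("/")
    if not bucket:
        raise ValueError(f"no bucket in {url!r}")
    return f"s3://{bucket}", key
```

This only reshapes the URL. Whether that bucket is a kops state store (rather than, say, cluster metadata written by some other tooling) is a separate question.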

rsignell-usgs commented 5 years ago

@jreadey, hmm, maybe not:

rsignell@gamone:~> export KOPS_STATE_STORE=s3://hdflab-kubernetes-k8sstack-sflx-clusterinfobucket-11famro67p10g

rsignell@gamone:~> kops get clusters

error reading state store: Could not retrieve location for AWS bucket hdflab-kubernetes-k8sstack-sflx-clusterinfobucket-11famro67p10g

rsignell-usgs commented 5 years ago

Now that the kops cluster is working, I followed these instructions: https://akomljen.com/kubernetes-cluster-autoscaling-on-aws/ with these settings:

helm install --name autoscaler \
    --namespace kube-system \
    --set image.tag=v1.2.1 \
    --set autoDiscovery.clusterName=kopscluster.k8s.local \
    --set extraArgs.balance-similar-node-groups=false \
    --set extraArgs.expander=random \
    --set rbac.create=true \
    --set rbac.pspEnabled=true \
    --set awsRegion=us-east-1 \
    --set nodeSelector."node-role\.kubernetes\.io/master"="" \
    --set tolerations[0].effect=NoSchedule \
    --set tolerations[0].key=node-role.kubernetes.io/master \
    --set cloudProvider=aws \
    stable/cluster-autoscaler
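
`autoDiscovery.clusterName` tells the autoscaler to find the AWS Auto Scaling groups it may resize by their tags, instead of listing them explicitly. The exact tag keys depend on the chart and cluster-autoscaler version you deploy. The keys used below (`k8s.io/cluster-autoscaler/enabled` plus `kubernetes.io/cluster/<cluster name>`) are an assumption, so check them against your chart. The sketch filters records shaped like the `AutoScalingGroups` list returned by boto3's `describe_auto_scaling_groups()` and keeps the groups that carry every required tag key. It makes no AWS call. `required_tags` and `discoverable_groups` are hypothetical helpers.

```python
def required_tags(cluster_name):
    """Tag keys assumed to mark an Auto Scaling group for auto-discovery."""
    # Assumption: verify these keys against the chart version you deploy.
    return {
        "k8s.io/cluster-autoscaler/enabled",
        f"kubernetes.io/cluster/{cluster_name}",
    }


def discoverable_groups(groups, cluster_name):
    """Return names of Auto Scaling groups carrying every required tag key.

    `groups` is a list shaped like the "AutoScalingGroups" entry of boto3's
    autoscaling describe_auto_scaling_groups() response. Only tag keys are
    compared; tag values are ignored.
    """
    wanted = required_tags(cluster_name)
    names = []
    for group in groups:
        keys = {tag["Key"] for tag in group.get("Tags", [])}
        if wanted <= keys:  # every required key is present
            names.append(group["AutoScalingGroupName"])
    return names
```

Against a live account you would feed it `boto3.client("autoscaling").describe_auto_scaling_groups()["AutoScalingGroups"]`, paginating if there are many groups. If the node group does not show up, the autoscaler has nothing to scale.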

It seems to be running, at least:

$ kubectl get pods -l "app=aws-cluster-autoscaler" -n kube-system
NAME                                                 READY     STATUS    RESTARTS   AGE
autoscaler-aws-cluster-autoscaler-7c494d8cbb-nvp58   1/1       Running   0          3m
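
To script the same check (for example, at the end of a deploy job), one option is to parse the plain `kubectl get pods` table shown above. `pods_ready` is a hypothetical helper that assumes the default columns (NAME, READY, STATUS, RESTARTS, AGE). For anything long-lived, `kubectl get pods -o json` is a sturdier input than the human-readable table.

```python
def pods_ready(table):
    """Return True if every pod row in `kubectl get pods` output is fully ready.

    Assumes the default table layout with READY ("1/1") and STATUS columns.
    An empty table (no matching pods) counts as not ready.
    """
    if not table.strip():
        return False
    lines = [line.split() for line in table.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    if not rows:
        return False
    ready_col = header.index("READY")
    status_col = header.index("STATUS")
    for row in rows:
        ready, total = row[ready_col].split("/")
        if row[status_col] != "Running" or ready != total:
            return False
    return True
```

The table itself could come from `subprocess.run(["kubectl", "get", "pods", "-l", "app=aws-cluster-autoscaler", "-n", "kube-system"], capture_output=True, text=True).stdout`.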

rsignell-usgs commented 5 years ago

Autoscaling is working!!!