zonca closed this issue 4 years ago.
testing with XL instances, I'll test first with 2GB/user and check I can accommodate 10 or 20 users/node. Then I'll switch to the planned 10GB/user and verify I can accommodate 5 users/node
I used hubtraf to simulate users; see https://zonca.github.io/2019/10/loadtest-jupyterhub.html
With m1.xlarge instances (60 GB RAM), requesting 2 GB/user, I verified I can accommodate 60 users (it should be ~80, as we are also scheduling on the master node) with no storage (we have a low storage quota; I asked to increase it). The singleuser configuration is:

singleuser:
memory:
guarantee: 10G
limit: 10G
cpu:
guarantee: 4
limit: 10
storage:
type: none
The VM has 24 CPUs but Kubernetes makes only 20 available to the containers, so we guarantee only 4 CPUs/user, with a limit of 10; we could even push the limit to 20, so that if a user happens to be on an empty node, they can use all of it.
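The per-node capacity above can be sanity-checked with a quick sketch. The numbers (60 GB RAM, 20 of 24 CPUs allocatable to pods, 2 GB vs. 10 GB guarantees) come from this thread; the function itself is just illustrative, since Kubernetes schedules on guarantees (requests), not limits:

```python
def users_per_node(node_ram_gb, node_cpus, ram_guarantee_gb, cpu_guarantee):
    """Users that fit on one node, given per-user resource guarantees.

    The binding constraint is whichever resource runs out first.
    """
    by_ram = node_ram_gb // ram_guarantee_gb
    by_cpu = node_cpus // cpu_guarantee
    return int(min(by_ram, by_cpu))

# 2 GB/user with a nominal 1-CPU guarantee: CPU-bound at 20 users/node
print(users_per_node(60, 20, 2, 1))   # 20
# 10 GB/user with a 4-CPU guarantee: CPU-bound at 5 users/node
print(users_per_node(60, 20, 10, 4))  # 5
```

Note that with the 4-CPU guarantee it is the CPU, not the 10 GB of RAM, that caps the node at 5 users.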
So it seems like to accommodate 60 users with 10 GB of RAM, we'd need six XL nodes (1 master + 5 nodes). Do you have a sense of whether this will eat through my allocation unusually quickly? If so I could imagine starting by limiting the number of users to 20 or 30. Limiting RAM usage would probably be more difficult.
Another question is - can a Jupyter instance make use of multiple cores? Or would those potentially be useful for something like Dask?
I made a Notebook to calculate usage, the result for this case is:
--------------- SU usage for the minimum scenario
1 m1.xlarge master - 0 m1.xlarge workers
24 SU/hour
576 SU/day
17,280 SU/month
--------------- SU usage for the average scenario
1 m1.xlarge master - 2 m1.xlarge workers
72 SU/hour
1,728 SU/day
51,840 SU/month
--------------- SU usage for the maximum scenario
1 m1.xlarge master - 5 m1.xlarge workers
144 SU/hour
3,456 SU/day
103,680 SU/month
See https://github.com/zonca/jupyterhub-deploy-kubernetes-jetstream/blob/master/Jetstream_cost_calculator.ipynb to play with it.
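The arithmetic in the notebook boils down to a per-flavor hourly SU rate times hours. This sketch assumes 24 SU/hour for m1.xlarge and 6 SU/hour for m1.medium, and a 30-day month, which is consistent with the figures quoted above:

```python
# Hourly SU rates assumed from the scenarios in this thread.
SU_PER_HOUR = {"m1.xlarge": 24, "m1.medium": 6}

def su_usage(master, worker, n_workers, days_per_month=30):
    """Return (SU/hour, SU/day, SU/month) for a master + n_workers cluster."""
    hourly = SU_PER_HOUR[master] + n_workers * SU_PER_HOUR[worker]
    return hourly, hourly * 24, hourly * 24 * days_per_month

# Maximum scenario: 1 m1.xlarge master + 5 m1.xlarge workers
print(su_usage("m1.xlarge", "m1.xlarge", 5))   # (144, 3456, 103680)
# Average scenario with medium master: 1 m1.medium + 3 m1.xlarge workers
print(su_usage("m1.medium", "m1.xlarge", 3))   # (78, 1872, 56160)
```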
If usage is very spotty, we might deploy the master on a medium instance, so that when unused we waste few resources while it can still run 1 user for testing; the autoscaler then creates XL instances when there is load.
For the case of an m1.medium master (here we need 1 more worker to support the same number of users):
--------------- SU usage for the minimum scenario
1 m1.medium master - 0 m1.xlarge workers
6 SU/hour
144 SU/day
4,320 SU/month
--------------- SU usage for the average scenario
1 m1.medium master - 3 m1.xlarge workers
78 SU/hour
1,872 SU/day
56,160 SU/month
--------------- SU usage for the maximum scenario
1 m1.medium master - 6 m1.xlarge workers
150 SU/hour
3,600 SU/day
108,000 SU/month
About multicore: sure. In fact numpy automatically uses multiple cores for many operations now; otherwise people can use numba.jit(parallel=True) or dask.
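A small illustration of the point above: numpy's heavy linear-algebra operations dispatch to a multithreaded BLAS, so a single notebook kernel can already use several of the guaranteed CPUs without any user effort. The matrix size here is arbitrary:

```python
import numpy as np

# A large matrix product typically runs on all available cores via BLAS.
a = np.random.rand(1000, 1000)
b = np.random.rand(1000, 1000)
c = a @ b

# For explicit parallel loops, numba is an option (untested sketch):
#   from numba import njit, prange
#   @njit(parallel=True)
#   def psum(x):
#       total = 0.0
#       for i in prange(x.shape[0]):
#           total += x[i]
#       return total
```

dask goes one step further and can spread work across multiple pods, which is where the extra CPU headroom per node would pay off.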
The m1.medium master seems perfect to me. I'm expecting usage to be spotty, punctuated by days with significantly more users due to workshops and the occasional down time of our primary server. I can set a student on monitoring usage so we know if the patterns are different.
Thanks for the information about multiple cores - having this will probably be very helpful for us, we use numpy heavily.
OK, this test is completed; the workaround is to increase the timeout. I'll follow up on https://github.com/zonca/jupyterhub-deploy-kubernetes-jetstream/issues/23 about this.
But for CDMS we can go ahead with the plan. I deleted the cluster, I'll start again testing next week or after the holidays with the medium instance for master and deploy the custom CDMS environment.
@pibion I have created a cluster with 1 master and 2 worker nodes named k8s_cdms. First I want to test how many users I can accommodate per node; I am worried about this given the reports at https://github.com/zonca/jupyterhub-deploy-kubernetes-jetstream/issues/23. Once this is solved I'll continue with the planned steps.