Test number of users per Jetstream VM

zonca commented 4 years ago

@pibion I have created a cluster with 1 master and 2 worker nodes named k8s_cdms. First I want to test how many users I can accommodate per node. I am worried about this given reports at https://github.com/zonca/jupyterhub-deploy-kubernetes-jetstream/issues/23.

Once this is solved I'll continue with the planned steps.

zonca commented 4 years ago

testing with XL instances, I'll test first with 2GB/user and check I can accommodate 10 or 20 users/node. Then I'll switch to the planned 10GB/user and verify I can accommodate 5 users/node

zonca commented 4 years ago

I used hubtraf to simulate users, see https://zonca.github.io/2019/10/loadtest-jupyterhub.html

1 master + 2 nodes, all m1.xlarge instances with 60GB RAM, if I request 2 GB/user, I tested I can accommodate 60 users (I should be able to accommodate ~80, as we are also scheduling on the master node) with no storage (we have a low limit, asked to increase)
same cluster but I request 10 GB/node, I can accommodate 5 users/node, 15 total. Configuration:
```
singleuser:
memory:
guarantee: 10G
limit: 10G
cpu:
guarantee: 4
limit: 10
storage:
type: none
```
The VM has 24 CPUs but Kubernetes makes only 20 available to the containers, so we guarantee only 4 CPUs/user, with a limit of 10 or we could even push to 20 so that if a user happens to be on a empty node, they can use all of it.

pibion commented 4 years ago

So it seems like to accommodate 60 users with 10 GB of RAM, we'd need six XL nodes (1 master + 5 nodes). Do you have a sense of whether this will eat through my allocation unusually quickly? If so I could imagine starting by limiting the number of users to 20 or 30. Limiting RAM usage would probably be more difficult.

Another question is - can a Jupyter instance make use of multiple cores? Or would those potentially be useful for something like Dask?

zonca commented 4 years ago

I made a Notebook to calculate usage, the result for this case is:

--------------- SU usage for the minimum scenario
1 m1.xlarge master - 0 m1.xlarge workers
24 SU/hour
576 SU/day
17,280 SU/month
--------------- SU usage for the average scenario
1 m1.xlarge master - 2 m1.xlarge workers
72 SU/hour
1,728 SU/day
51,840 SU/month
--------------- SU usage for the maximum scenario
1 m1.xlarge master - 5 m1.xlarge workers
144 SU/hour
3,456 SU/day
103,680 SU/month

See https://github.com/zonca/jupyterhub-deploy-kubernetes-jetstream/blob/master/Jetstream_cost_calculator.ipynb to play with it. If usage is very spotty, we might deploy master in a medium instance, so that when unused we are wasting little resources, and it still can run 1 user for testing purposes, then we have the autoscaler create XL instances when there is load.

For the case of a m1.medium master (in this case we need 1 more worker to support the same number of users):

--------------- SU usage for the minimum scenario
1 m1.medium master - 0 m1.xlarge workers
6 SU/hour
144 SU/day
4,320 SU/month
--------------- SU usage for the average scenario
1 m1.medium master - 3 m1.xlarge workers
78 SU/hour
1,872 SU/day
56,160 SU/month
--------------- SU usage for the maximum scenario
1 m1.medium master - 6 m1.xlarge workers
150 SU/hour
3,600 SU/day
108,000 SU/month

zonca commented 4 years ago

About multicore: sure, actually numpy automatically uses multiple cores for a lot of operations now, otherwise people can use numba.jit(parallel=True) or use dask.

pibion commented 4 years ago

The m1.medium master seems perfect to me. I'm expecting usage to be spotty, punctuated by days with significantly more users due to workshops and the occasional down time of our primary server. I can set a student on monitoring usage so we know if the patterns are different.

Thanks for the information about multiple cores - having this will probably be very helpful for us, we use numpy heavily.

zonca commented 4 years ago

ok, this test is completed, workaround is to increase the timeout. I'll follow up on https://github.com/zonca/jupyterhub-deploy-kubernetes-jetstream/issues/23 about this.

But for CDMS we can go ahead with the plan. I deleted the cluster, I'll start again testing next week or after the holidays with the medium instance for master and deploy the custom CDMS environment.

det-lab / jupyterhub-deploy-kubernetes-jetstream

Test number of users per Jetstream VM #2