det-lab / jupyterhub-deploy-kubernetes-jetstream

CDMS JupyterHub deployment on XSEDE Jetstream

allow users to spin up large or extra-large instances? #30

Closed · pibion closed this issue 4 years ago

pibion commented 4 years ago

Hi @zonca , we have some users that are likely going to need a lot of RAM for some upcoming analysis (@ziqinghong). The goal is to eventually eliminate the need for > 10 GB but for now there are times when it would be helpful to have a whole lot of RAM available.

I wonder if it's possible to make it an option for users to request a large or extra-large instance. If this is difficult please don't worry about it. I thought I'd ask mainly to discuss the possibility.

zonca commented 4 years ago

I looked through the KubeSpawner docs; it has native support for this!

See profile_list in https://jupyterhub-kubespawner.readthedocs.io/en/latest/spawner.html

Profiles can even use different images.

See this example from the docs:

c.KubeSpawner.profile_list = [
    {
        'display_name': 'Training Env - Python',
        'slug': 'training-python',
        'default': True,
        'kubespawner_override': {
            'image': 'training/python:label',
            'cpu_limit': 1,
            'mem_limit': '512M',
        }
    }, {
        'display_name': 'Training Env - Datascience',
        'slug': 'training-datascience',
        'kubespawner_override': {
            'image': 'training/datascience:label',
            'cpu_limit': 4,
            'mem_limit': '8G',
        }
    }, {
        'display_name': 'DataScience - Small instance',
        'slug': 'datascience-small',
        'kubespawner_override': {
            'image': 'datascience/small:label',
            'cpu_limit': 10,
            'mem_limit': '16G',
        }
    }, {
        'display_name': 'DataScience - Medium instance',
        'slug': 'datascience-medium',
        'kubespawner_override': {
            'image': 'datascience/medium:label',
            'cpu_limit': 48,
            'mem_limit': '96G',
        }
    }, {
        'display_name': 'DataScience - Medium instance (GPUx2)',
        'slug': 'datascience-gpu2x',
        'kubespawner_override': {
            'image': 'datascience/medium:label',
            'cpu_limit': 48,
            'mem_limit': '96G',
            'extra_resource_guarantees': {"nvidia.com/gpu": "2"},
        }
    }
]

Users will get a dropdown menu when they start their server and can choose the profile they want.

If you decide on names and CPU/memory requirements for 3 or 4 profiles, we can test it.

pibion commented 4 years ago

Amazing! @ziqinghong, could you comment on how much RAM you're likely to need for your two-month analysis of the TUNL data? I'm thinking we can set up some profiles with increasing numbers of CPUs and that much RAM so you can play around with using Dask to speed up slow computations.

ziqinghong commented 4 years ago

For interactive computing the code is usually single-threaded; at most there is occasionally some implicit multithreading. The bulk of the work is loading a 2D array (or an equivalent dictionary), applying a bunch of selection criteria, and making a bunch of histograms. So 1-2 CPU cores would be plenty, I would guess, and 16-32 GB of RAM would be nice. Though I might not understand how numpy does its under-the-hood threading, and more cores might actually be faster.
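(To make that workload concrete, here is a rough sketch of selection cuts on a large 2D array followed by a histogram. The array size, column names, and cut values are made up for illustration; the point is that this pattern is mostly memory-bound rather than CPU-bound.)

import numpy as np

# Hypothetical event data: rows are events, columns are measured quantities.
rng = np.random.default_rng(0)
events = rng.normal(size=(5_000_000, 3))        # ~120 MB as float64
energy, chi2, baseline = events.T

# Apply a bunch of selection criteria with boolean masks.
cuts = (energy > 0.5) & (chi2 < 2.0) & (np.abs(baseline) < 0.1)

# Histogram the surviving events.
counts, edges = np.histogram(energy[cuts], bins=200, range=(0.0, 5.0))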

pibion commented 4 years ago

Thanks @ziqinghong! @zonca will have a Dask server up and running at some point, and since you're using numpy arrays it may be fairly easy to use Dask to parallelize your code without having to change much. So if there end up being things that are obnoxiously slow that you would normally want to move to a cluster, Dask may be another option. If that sounds useful then we could set up some instances with more CPUs because Dask will be able to take advantage of those. But if you don't see an immediate need then we can wait until we have a use-case.
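(For reference, a minimal sketch of the kind of near-drop-in parallelism Dask offers for numpy-style code, assuming dask is installed in the user image; the array shape, chunking, and cut here are made up.)

import dask.array as da

# A large array split into chunks; chunks are processed in parallel,
# either by local threads or by Dask workers if a distributed client is connected.
x = da.random.random((50_000_000, 3), chunks=(5_000_000, 3))
energy = x[:, 0]
chi2 = x[:, 1]

# Operations look like numpy but are lazy: they just build a task graph.
counts, edges = da.histogram(energy[chi2 < 2.0], bins=200, range=(0.0, 1.0))

# Nothing actually runs until .compute() is called.
counts = counts.compute()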

@zonca maybe a good setup to start with would be an instance with 2 CPUs and 48 GB of RAM? I'd suggest we name this something like "2 CPUs and 48 GB of RAM" if spaces are allowed.
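(For concreteness, a hedged sketch of what the KubeSpawner side of this proposal could look like; the image name, the limits on the tiny profile, and the slugs are placeholders, not the deployed configuration.)

c.KubeSpawner.profile_list = [
    {
        'display_name': 'Tiny instance (default)',
        'slug': 'tiny',
        'default': True,
        'kubespawner_override': {
            'image': 'detlab/analysis:latest',   # placeholder image
            'cpu_limit': 1,
            'mem_limit': '2G',
        }
    }, {
        'display_name': '2 CPUs and 48 GB of RAM',
        'slug': 'large-ram',
        'kubespawner_override': {
            'image': 'detlab/analysis:latest',   # placeholder image
            'cpu_limit': 2,
            'mem_limit': '48G',
        }
    }
]

The display names in the docs example above already contain spaces, so a label like "2 CPUs and 48 GB of RAM" should be fine.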

Do we need to caution users to shut down their instance when they're done? This will eat up more of the allocation than the instances we have running now.

zonca commented 4 years ago

OK, I've implemented this.

Now when users start their session they can choose:

[screenshot: the profile selection dropdown shown when a session starts]

You can create profiles or change the names here:

https://github.com/det-lab/jupyterhub-deploy-kubernetes-jetstream/blob/42af7fa9ad805d0c643885c81768f3f8924ebdce/config_standard_storage.yaml#L30-L44

Please create a Pull Request rather than editing the file directly.

I can then check the configuration and deploy it.

zonca commented 4 years ago

@julienchastang you might be interested in this feature as well, if you don't already know about it.

zonca commented 4 years ago

> Thanks @ziqinghong! @zonca will have a Dask server up and running at some point, and since you're using numpy arrays it may be fairly easy to use Dask to parallelize your code without having to change much. So if there end up being things that are obnoxiously slow that you would normally want to move to a cluster, Dask may be another option. If that sounds useful then we could set up some instances with more CPUs because Dask will be able to take advantage of those. But if you don't see an immediate need then we can wait until we have a use-case.

No, Dask will launch separate pods and distribute work across them, so it is independent of the size of the JupyterLab pod.

> @zonca maybe a good setup to start with would be an instance with 2 CPUs and 48 GB of RAM? I'd suggest we name this something like "2 CPUs and 48 GB of RAM" if spaces are allowed.

It is good practice to scale CPUs and RAM together, so that each profile requests the same fraction of a node's CPU and RAM; that way we make better use of all the resources of a node. Also consider that many numpy functions nowadays automatically use all available CPUs, so it is nice to have several (the sketch at the end of this comment shows how a user can check or limit that).

> Do we need to caution users to shut down their instance when they're done? This will eat up more of the allocation than the instances we have running now.

No, allocation usage depends only on the number of Jetstream instances we run: currently just 1 master node and 1 worker node. So 1 person at a time can use the "full node" instance.
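(To illustrate the point about numpy using all CPUs: many numpy operations call into a multi-threaded BLAS. Below is a small sketch of how a user could check or cap that from inside a notebook, assuming the threadpoolctl package is available in the image.)

import numpy as np
from threadpoolctl import threadpool_info, threadpool_limits

# Show which BLAS numpy is using and how many threads it defaults to.
for pool in threadpool_info():
    print(pool.get("internal_api"), pool.get("num_threads"))

a = np.random.random((4000, 4000))

# Inside this block, BLAS calls such as matrix multiplication use at most
# 2 threads, matching a 2-CPU profile.
with threadpool_limits(limits=2, user_api="blas"):
    b = a @ a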

pibion commented 4 years ago

@zonca thanks for the clarifications!

zonca commented 4 years ago

@ziqinghong @pibion do you have any feedback on this? or is it ok for now?

pibion commented 4 years ago

@zonca so far so good. I've only used the tiny instance, but the options are showing up clearly at login.

@ziqinghong what data do you need to take advantage of the large node?

ziqinghong commented 4 years ago

Just tried the super big one. Worked. Thank you!

@pibion Can we talk in a different thread about how I can register ~200 files in the data catalog?