dandi / dandi-hub

Infrastructure and code for the dandihub
https://hub.dandiarchive.org

Explore options for providing users with on-demand resources #170

Open kabilar opened 4 months ago

kabilar commented 4 months ago

Description

We utilize AWS spot instances to minimize costs while providing the JupyterHub as a service to all users. Users may end up being kicked off when the server is no longer available, resulting in a poor user experience. Perhaps we can explore ways to provide select users (e.g. during training workshops) with on-demand instances.

Next steps

  1. Define and document the circumstances under which on-demand instances are made available to users.
  2. Determine a strategy for implementing access to on-demand vs spot instances.

cc @bendichter @asmacdo

ovalerio commented 4 months ago

[asmacdo edit] Moved to new issue:

Hello @kabilar,

Comments on the issue: "JupyterHub intermittently *freezing* and asking to be *reboot*"

Thanks for following up on the JupyterHub issues. As you might know, last week, during the DANDI ReHack at Janelia, we were using both `dandi-hub` and `hub-staging` to run JupyterLab notebooks. In my case I was using custom conda environments.

The issue could not easily be attributed to inactivity, since the disconnection sometimes occurred while the notebooks' Jupyter kernels were busy computing. An error message suggested restarting the notebook. If we followed that recommendation, the packages installed in the conda environment were lost. Sometimes, if I dismissed the message instead, the session would eventually recover, but with degraded performance.

To answer your questions:

_Did they see any error messages?_

Yes, there was a pop-up message in JupyterHub with the option to restart the session or dismiss the message.

_Were they actively using the server when kicked off? We have a culler for idle machines, I think its set to 60 minutes._

Sometimes it happened while the server was actively in use. Sometimes I was also using the instance through the JupyterHub extension in VS Code. I posted a separate *feature request* on that subject last week, but I think I forgot to mention there that I was already using the JupyterHub extension to connect to my allocated instance, with notebooks that I was creating locally.

_What time were they kicked, I can check to see if theres anything in the logs_

After I noticed that I was losing my Python environment every time the session restarted, I chose to create my environment in a persistent location (/home/jovyan), and that worked.
Example:

```
conda init
source /opt/conda/bin/activate
mkdir /home/jovyan/envs
conda create --prefix /home/jovyan/envs/pynapple python=3.8
conda activate /home/jovyan/envs/pynapple
conda install -c conda-forge ipykernel
python -m ipykernel install --user --name pynapple
python -m pip install pynapple
```
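After a restart, it may be useful to confirm that the environment is still on disk before reactivating it. The following is only a rough sketch: `env_persisted` is a hypothetical helper name, not part of conda or JupyterHub, and it relies on the fact that a conda environment keeps its package records in a `conda-meta` directory under its prefix.

```shell
# Hypothetical helper (not a conda command): report whether a conda
# environment created with --prefix is still present on disk.
env_persisted() {
    prefix="$1"
    # A conda environment stores its package records in <prefix>/conda-meta.
    if [ -d "$prefix/conda-meta" ]; then
        echo "found: $prefix"
        return 0
    fi
    echo "missing: $prefix"
    return 1
}
```

With the paths from the example above, this could be used as `env_persisted /home/jovyan/envs/pynapple && conda activate /home/jovyan/envs/pynapple`, and `jupyter kernelspec list` should then still show the `pynapple` kernel.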