dandi / dandi-hub

Infrastructure and code for the dandihub
https://hub.dandiarchive.org

JupyterHub intermittently *freezing* and asking to *restart notebook* #171

Open asmacdo opened 4 months ago

asmacdo commented 4 months ago

From @ovalerio https://github.com/dandi/dandi-hub/issues/170#issuecomment-2231106218

Comments on the issue: "JupyterHub intermittently freezing and asking to be reboot"

Thanks for following up on the JupyterHub issues.

As you might know, last week, during the DANDI ReHack at Janelia, we were using both the dandi-hub and the hub-staging to run JupyterLab notebooks. In my case, I was using custom conda environments.

The issue could not easily be attributed to inactivity, since the disconnection sometimes occurred while the notebooks' Jupyter kernels were busy computing. An error message suggested restarting the notebook. If we followed that recommendation, however, the packages installed in the conda environment were lost.

Sometimes, if I chose to dismiss the message, the session would eventually recover, but with degraded performance.

To answer your questions:

Did they see any error messages?

Yes, there was a pop-up message in JupyterHub with options to restart the session and/or dismiss the message.

Were they actively using the server when kicked off? We have a culler for idle machines; I think it's set to 60 minutes.

Creating a new issue from @ovalerio's comment: https://github.com/dandi/dandi-hub/issues/170#issuecomment-2231106218

Sometimes it happened while the server was in active use; at other times I was also using the instance through the JupyterHub extension in VSCode. I posted a separate feature request on that subject last week, but I think I forgot to mention there that I was already using the JupyterHub extension to connect to my allocated instance, with notebooks I had created locally.

What time were they kicked off? I can check whether there's anything in the logs.

After noticing that I lost my Python environment every time the session restarted, I chose to create the environment in a persistent location (/home/jovyan), and that worked.

Example:

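  # One-time setup: add conda's shell hook to ~/.bashrc (kept in the persistent home)
  conda init
  # Make `conda activate` available in the current shell right away
  source /opt/conda/bin/activate
  # Keep environments under /home/jovyan, which survives server restarts
  mkdir -p /home/jovyan/envs
  conda create -y --prefix /home/jovyan/envs/pynapple python=3.8
  conda activate /home/jovyan/envs/pynapple
  # Register the env as a Jupyter kernel; --user writes the kernelspec under ~/.local
  conda install -y -c conda-forge ipykernel
  python -m ipykernel install --user --name pynapple
  python -m pip install pynapple
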
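Since the workaround above hinged on where the environment lives, a quick way to check an existing setup is to test whether a path falls under /home/jovyan. The function below is a hypothetical helper, not part of dandi-hub: it only compares path prefixes, and it assumes, as reported in this thread, that /home/jovyan persists across restarts while locations outside it (such as /opt/conda) may not.

```shell
# Hypothetical helper (not part of dandi-hub): report whether a path lives under
# the persistent home directory. PERSIST_ROOT defaults to /home/jovyan, the
# location reported in this thread to survive server restarts.
check_persistent() {
  local root="${PERSIST_ROOT:-/home/jovyan}"
  local path="$1"
  case "$path" in
    "$root" | "$root"/*)
      echo "persistent: $path" ;;
    *)
      # Anything outside the home volume (e.g. /opt/conda) may be reset on restart
      echo "EPHEMERAL: $path" ;;
  esac
}
```

For example, `check_persistent "$CONDA_PREFIX"` after `conda activate`, or `check_persistent "$(jupyter --data-dir)"` to see where `--user` kernelspecs end up.
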
asmacdo commented 4 months ago

@ovalerio: I've got a session following your logs. Next time you see this, could you grab a screenshot and a timestamp?

Creating a persistent local environment is good practice in general. I've created an issue to add a splash page (or something similar) with quickstart docs that would cover environment management: https://github.com/dandi/dandi-hub/issues/168

This could be one of a handful of things (it no longer sounds like a spot-instance interruption or culling; perhaps OOM?), so I'll need a bit more information to narrow down what's going on.
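To help test the OOM hypothesis from a terminal on the affected server, one option is to read the container's memory accounting directly. The sketch below is a hypothetical helper, not part of dandi-hub; it assumes the single-user server is a Linux container whose cgroup files are visible under /sys/fs/cgroup (cgroup v2 `memory.current`, `memory.max`, `memory.events`, or the v1 equivalents).

```shell
# Hypothetical helper (not part of dandi-hub): print this container's memory
# usage against its cgroup limit, plus the OOM-kill counter when available.
# Assumes cgroup v2 (or v1) files under /sys/fs/cgroup; override with CGROUP_ROOT.
mem_report() {
  local root="${CGROUP_ROOT:-/sys/fs/cgroup}"
  local usage limit
  if [ -f "$root/memory.current" ]; then
    # cgroup v2 layout
    usage=$(cat "$root/memory.current")
    limit=$(cat "$root/memory.max")
  elif [ -f "$root/memory/memory.usage_in_bytes" ]; then
    # cgroup v1 layout
    usage=$(cat "$root/memory/memory.usage_in_bytes")
    limit=$(cat "$root/memory/memory.limit_in_bytes")
  else
    echo "no cgroup memory files under $root" >&2
    return 1
  fi
  local mib=1048576
  if [ "$limit" = "max" ]; then
    # cgroup v2 reports "max" when no limit is set
    echo "usage=$((usage / mib))MiB limit=unlimited"
  else
    echo "usage=$((usage / mib))MiB limit=$((limit / mib))MiB pct=$((100 * usage / limit))%"
  fi
  # cgroup v2 only: how many processes the kernel OOM killer has killed in this cgroup
  if [ -f "$root/memory.events" ]; then
    echo "oom_kill=$(awk '$1 == "oom_kill" { print $2 }' "$root/memory.events")"
  fi
}
```

Seeing `oom_kill` increase between a reading taken before a freeze and one taken after would suggest the kernel OOM killer terminated a process, likely the one using the most memory (often the busy notebook kernel). If the whole server was restarted, the new container starts with fresh counters, so this only catches kills that the server itself survived.
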

rcpeene commented 4 months ago

I have this problem too, or something similar. Dandihub has lately been unresponsive: either notebooks seemingly halt with no indication, or some JupyterHub UI elements become unresponsive.