Open asmacdo opened 4 months ago
@ovalerio: I've got a session following your logs. Next time you see this, could you grab a screenshot and a timestamp?
Creating a permanent local environment is good practice in general. I've created an issue to add a splash page (or similar) with quickstart docs that would cover env management: https://github.com/dandi/dandi-hub/issues/168
This could be a handful of things (it no longer sounds like spot instance interruption or culling; OOM, perhaps?), so I'll need a bit more info to narrow down what's going on.
I have this problem too, or something similar. Dandihub has been unresponsive lately: either notebooks seemingly halt with no indication, or some JupyterHub UI elements stop responding.
From @ovalerio https://github.com/dandi/dandi-hub/issues/170#issuecomment-2231106218
Comments on the issue: "JupyterHub intermittently freezing and asking to be reboot"
Thanks for following up on the JupyterHub issues.
As you might know, last week, during the DANDI ReHack at Janelia, we were using both the dandi-hub and the hub-staging for running JupyterLab notebooks. In my case I was using custom conda environments.
The issue could not easily be pinpointed to inactivity, since the disconnection sometimes occurred while the notebooks' Jupyter kernels were busy doing computation. There was an error message that suggested restarting the notebook. When we followed that recommendation, the packages installed in the conda environment were lost.
Sometimes, if I opted to dismiss the message, the session would eventually recover, but with degraded performance.
To answer your questions:
Did they see any error messages?
Yes, there was a pop-up message in JupyterHub with the option to restart the session and/or dismiss.
Were they actively using the server when kicked off? We have a culler for idle machines; I think it's set to 60 minutes.
Creating a new issue from @ovalerio comment: https://github.com/dandi/dandi-hub/issues/170#issuecomment-2231106218
Sometimes it happened while the instance was actively in use; sometimes I was also using the instance through the JupyterHub extension in VSCode. I posted a separate feature request on that subject last week, but I think I forgot to mention there that I was already using the JupyterHub extension to connect to my allocated instance, with notebooks that I was creating locally.
What time were they kicked? I can check to see if there's anything in the logs.
After I noticed that I was losing my Python environment every time the session restarted, I opted to create my environment in a persistent location (/home/jovyan), and that worked.
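For anyone wanting to check their own setup, here is a rough sketch (not taken from the hub docs) of how one might test whether the active environment sits under the persistent home directory, and build a `conda create` command that places a new environment there. The `envs/` subfolder, the env name `myenv`, the Python version, and both helper names are hypothetical; only `/home/jovyan` comes from the report above.

```python
import sys
from pathlib import Path

# Assumption: /home/jovyan is the persistent location mentioned in this thread.
PERSISTENT_ROOT = Path("/home/jovyan")


def is_under(path, root=PERSISTENT_ROOT):
    """Return True if `path` is `root` or lies below it (purely lexical check)."""
    path, root = Path(path), Path(root)
    return path == root or root in path.parents


def conda_create_command(env_name, python_version="3.11", root=PERSISTENT_ROOT):
    """Build a `conda create` argument list that puts the env under `root`."""
    # Hypothetical layout: <root>/envs/<env_name>
    prefix = Path(root) / "envs" / env_name
    return ["conda", "create", "--yes", "--prefix", str(prefix),
            f"python={python_version}"]


if __name__ == "__main__":
    # sys.prefix points at the environment the running interpreter belongs to.
    print("active env is persistent:", is_under(sys.prefix))
    print(" ".join(conda_create_command("myenv")))
```

The check does not resolve symlinks or `..` segments, so treat it as a quick sanity test rather than a guarantee. An environment created with `--prefix` is activated by path, e.g. `conda activate /home/jovyan/envs/myenv`.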
Example: