NSLS-II-CHX / CHX_BugReport

Unified issue-tracker for bugs in the data analysis JupyterHub at CHX
BSD 3-Clause "New" or "Revised" License

Kernel stability issue #9

Closed (julbrandt closed 7 years ago)

julbrandt commented 8 years ago

We are having trouble running through the analysis scripts because the kernel seems to crash frequently. We can reset the kernel, but doing so clears the notebook's state, so we have to rerun all the code. By the time we get back to where we were before the reset, the kernel crashes again.

tacaswell commented 8 years ago

Can you provide any details about the crash?

julbrandt commented 8 years ago

Sorry, I was kind of vague in the original post. I have been trying to run a notebook in my folder called XPCX_GISAXS_June-2016_C60Sam2_93b25d. Recently the notebook has been becoming unresponsive anywhere from 5 to 30 minutes after starting, which is not long enough for the analysis to complete. So far, the only fix I have found is to close the notebook and start over. This is not ideal because all the code then has to be run again.

I have tried to post a screenshot below of a frozen notebook I just ran. [image]

This notebook froze after 30 minutes. The cell where it froze just displays the source code of a function, so it is not a long calculation. There is no indication in the notebook that anything is wrong; it just looks busy.

Another thing I've noticed, however, is that the browser tab usually shows (busy) while a notebook is executing a cell. In this case it showed (busy) for a minute or so and then the indicator disappeared.

I have tried clicking Reconnect under the Kernel menu, but this didn't work. I have also tried Restart, but I get a "Dead kernel" error message: [image: chx_error_2]

tacaswell commented 8 years ago

I have seen this on the CSX machines as well. My guess is that this is a network problem: I could see my kernel still running on the server (by logging into the server and using top), but could not talk to it via the notebook front end.
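
For anyone who wants to repeat that check without reading through top by hand, here is a minimal sketch that lists kernel processes on the server. It assumes a Linux host where `ps -eo pid,etime,args` is available and that kernel processes have `ipykernel` somewhere in their command line; both are assumptions about the deployment, and the function names are made up for illustration.

```python
import subprocess


def find_kernel_processes(ps_output, marker="ipykernel"):
    """Return (pid, elapsed, command) tuples for ps lines whose command mentions marker.

    Assumes the three-column layout produced by `ps -eo pid,etime,args`,
    with a header row on the first line.
    """
    matches = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)  # pid, elapsed time, full command line
        if len(parts) == 3 and marker in parts[2]:
            pid, elapsed, command = parts
            matches.append((int(pid), elapsed, command))
    return matches


def list_kernel_processes():
    """Run ps on the local machine and return the kernel-like processes."""
    result = subprocess.run(
        ["ps", "-eo", "pid,etime,args"],
        capture_output=True,
        text=True,
        check=True,
    )
    return find_kernel_processes(result.stdout)


if __name__ == "__main__":
    # Run this on the server itself, not on the machine with the browser.
    for pid, elapsed, command in list_kernel_processes():
        print(pid, elapsed, command)
```

If a kernel process shows up here while the notebook page reports a dead or unreachable kernel, that would be consistent with the problem sitting between the browser and the kernel rather than in the kernel itself.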

danielballan commented 8 years ago

This issue is still a mystery. Mizuki and I will probably need to dedicate a day or more to untangle it. In the meantime, for anyone who hasn't heard, use https://notebook-backup.nsls2.bnl.gov, which points to the old deployment that does not seem to have this problem.