Closed fmigneault closed 1 year ago
Build URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/2153/
Result : failure
BIRDHOUSE_DEPLOY_BRANCH : jupyterhub-stop-idle
DACCS_CONFIGS_BRANCH : master
PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master
PAVICS_SDI_BRANCH : master
DESTROY_INFRA_ON_EXIT : true
PAVICS_HOST : https://host-140-118.rdext.crim.ca
Tests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/1365/
[2023-10-11T23:48:51.688Z] ============================= test session starts ==============================
[2023-10-11T23:48:51.688Z] platform linux -- Python 3.9.16, pytest-7.3.1, pluggy-1.0.0
[2023-10-11T23:48:51.688Z] rootdir: /home/jenkins/agent/workspace/PAVICS-e2e-workflow-tests_master
[2023-10-11T23:48:51.688Z] plugins: anyio-3.6.1, dash-2.10.0, nbval-0.9.6, tornasync-0.6.0.post2, xdist-3.3.1
[2023-10-11T23:48:51.688Z] collected 254 items
[2023-10-11T23:48:51.688Z]
[2023-10-11T23:48:57.001Z] notebooks-auth/geoserver.ipynb ..........F..FFF. [ 6%]
[2023-10-11T23:49:05.077Z] notebooks-auth/test_thredds.ipynb ........... [ 11%]
[2023-10-11T23:49:13.673Z] pavics-sdi-master/docs/source/notebooks/WCS_example.ipynb ....... [ 13%]
[2023-10-11T23:49:22.719Z] pavics-sdi-master/docs/source/notebooks/WFS_example.ipynb ...... [ 16%]
[2023-10-11T23:49:32.382Z] pavics-sdi-master/docs/source/notebooks/WMS_example.ipynb .F...... [ 19%]
[2023-10-11T23:57:20.460Z] pavics-sdi-master/docs/source/notebooks/climex.ipynb ............ [ 24%]
[2023-10-11T23:57:20.460Z] pavics-sdi-master/docs/source/notebooks/eccc-geoapi-climate-stations.ipynb . [ 24%]
[2023-10-11T23:57:26.374Z] ............... [ 30%]
[2023-10-11T23:57:35.410Z] pavics-sdi-master/docs/source/notebooks/eccc-geoapi-xclim.ipynb F.F.. [ 32%]
[2023-10-11T23:57:43.001Z] pavics-sdi-master/docs/source/notebooks/esgf-dap.ipynb FFFFFF [ 34%]
[2023-10-11T23:57:59.813Z] pavics-sdi-master/docs/source/notebooks/forecasts.ipynb ...... [ 37%]
[2023-10-11T23:58:01.721Z] pavics-sdi-master/docs/source/notebooks/jupyter_extensions.ipynb . [ 37%]
[2023-10-11T23:58:10.173Z] pavics-sdi-master/docs/source/notebooks/opendap.ipynb ....... [ 40%]
[2023-10-11T23:58:17.042Z] pavics-sdi-master/docs/source/notebooks/pavics_thredds.ipynb ..... [ 42%]
[2023-10-12T00:02:41.777Z] pavics-sdi-master/docs/source/notebooks/regridding.ipynb ............... [ 48%]
[2023-10-12T00:03:52.019Z] ............. [ 53%]
[2023-10-12T00:03:57.276Z] pavics-sdi-master/docs/source/notebooks/rendering.ipynb .... [ 54%]
[2023-10-12T00:04:00.878Z] pavics-sdi-master/docs/source/notebooks/subset-user-input.ipynb ........ [ 57%]
[2023-10-12T00:04:19.351Z] ................. [ 64%]
[2023-10-12T00:04:26.293Z] pavics-sdi-master/docs/source/notebooks/subsetting.ipynb ...... [ 66%]
[2023-10-12T00:04:28.206Z] pavics-sdi-master/docs/source/notebook-components/weaver_example.ipynb . [ 67%]
[2023-10-12T00:04:29.244Z] .FFFFFFFF [ 70%]
[2023-10-12T00:04:39.744Z] finch-master/docs/source/notebooks/dap_subset.ipynb ........... [ 75%]
[2023-10-12T00:04:49.084Z] finch-master/docs/source/notebooks/finch-usage.ipynb ...... [ 77%]
[2023-10-12T00:04:50.467Z] PAVICS-landing-master/content/notebooks/climate_indicators/PAVICStutorial_ClimateDataAnalysis-1DataAccess.ipynb . [ 77%]
[2023-10-12T00:04:53.541Z] ...... [ 80%]
[2023-10-12T00:05:01.701Z] PAVICS-landing-master/content/notebooks/climate_indicators/PAVICStutorial_ClimateDataAnalysis-2Subsetting.ipynb . [ 80%]
[2023-10-12T00:05:19.387Z] ............. [ 85%]
[2023-10-12T00:05:29.380Z] PAVICS-landing-master/content/notebooks/climate_indicators/PAVICStutorial_ClimateDataAnalysis-3Climate-Indicators.ipynb . [ 86%]
[2023-10-12T00:06:08.407Z] ....s. [ 88%]
[2023-10-12T00:06:18.401Z] PAVICS-landing-master/content/notebooks/climate_indicators/PAVICStutorial_ClimateDataAnalysis-4Ensembles.ipynb . [ 88%]
[2023-10-12T00:06:32.529Z] ... [ 90%]
[2023-10-12T00:06:44.750Z] PAVICS-landing-master/content/notebooks/climate_indicators/PAVICStutorial_ClimateDataAnalysis-5Visualization.ipynb . [ 90%]
[2023-10-12T00:07:07.332Z] ...... [ 92%]
[2023-10-12T00:07:24.773Z] notebooks/hummingbird.ipynb ............ [ 97%]
[2023-10-12T00:10:10.357Z] notebooks/stress-tests.ipynb ...... [100%]
[2023-10-12T00:10:10.357Z]
[2023-10-12T00:10:10.357Z] =================================== FAILURES ===================================
The README of the external jupyterhub-idle-culler also mentions that JupyterHub can already cull idle Jupyter instances by itself, without explaining why someone would need to use this external version: https://github.com/jupyterhub/jupyterhub-idle-culler/blob/737dfa155b809453e6e795dd9de42d6a926fd4a0/README.md?plain=1#L263-L274
This is rather confusing to me.
@mishaschwartz what do you think about keeping 2 different culling options? Would this confuse users more?
> @mishaschwartz what do you think about keeping 2 different culling options? Would this confuse users more?
Yeah, I'd rather keep one version or the other. It will definitely confuse users.
Can we test whether the old version still works and, if not, remove the comment from env.local?
If it does still work, then we should use that one instead, unless there is some major advantage to the jupyterhub-idle-culler method. Is there, @fmigneault?
> Can we test whether the old version still works and, if not, remove the comment from env.local?
Yes. I would like to test the existing approach and, if it works, keep only that one.
However, I would like to make it more easily/directly configurable through some optional variable, rather than asking users to add it to their env.local via JUPYTERHUB_CONFIG_OVERRIDE.
I did not even realize this code existed before it was pointed out by @tlvu.
> However, I would like to make it more easily/directly configurable through some optional variable, rather than asking users to add it to their env.local via JUPYTERHUB_CONFIG_OVERRIDE.
Yes, true, anything in production will need this to avoid long-running idle Jupyter containers wasting RAM and CPU.
Then I would suggest moving that snippet of code into the default JupyterHub config and simply making those timeout limits config variables.
> I did not even realize this code existed before it was pointed out by @tlvu.
Being a production server with many users, we faced this issue https://github.com/bird-house/birdhouse-deploy/issues/67 and, although this culling might not have fixed the root cause, it probably helped.
@tlvu @mishaschwartz I found this thread that seems to properly explain the differences: https://discourse.jupyter.org/t/which-is-the-correct-way-to-cull-idle-kernels-and-notebook/8123/16
From my reading, it would seem the internal culler is sufficient for our use case. However, I found in the comments that there is also a terminal culler, because a terminal that is kept open could keep the session/server active even if the kernel is idle. (https://github.com/jupyter-server/jupyter_server/pull/438)
So the config might need some extra parameters, but I would stick with only the internal parameters if those work by themselves.
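As a rough illustration of what relying only on the internal parameters could look like, here is a minimal sketch that turns optional environment variables into arguments for the single-user server. The environment variable names and the build_cull_args helper are hypothetical and not part of birdhouse-deploy. The traits are the kernel culling options of MappingKernelManager, the terminal culling option added by the jupyter_server PR linked above, and the server-level shutdown_no_activity_timeout. Using ServerApp rather than NotebookApp for the last one is an assumption that depends on the notebook image.

```python
import os

# Hypothetical environment variable names (not defined by birdhouse-deploy).
# Each one maps to a configuration trait of the single-user Jupyter server.
CULL_OPTIONS = {
    "JUPYTER_IDLE_KERNEL_CULL_TIMEOUT": "MappingKernelManager.cull_idle_timeout",
    "JUPYTER_IDLE_KERNEL_CULL_INTERVAL": "MappingKernelManager.cull_interval",
    "JUPYTER_IDLE_TERMINAL_CULL_TIMEOUT": "TerminalManager.cull_inactive_timeout",
    # Assumption: the image runs jupyter_server; older images would need NotebookApp.
    "JUPYTER_IDLE_SERVER_CULL_TIMEOUT": "ServerApp.shutdown_no_activity_timeout",
}


def build_cull_args(env=None):
    """Return command-line flags for the single-user server, one per configured value."""
    env = os.environ if env is None else env
    args = []
    for variable, trait in CULL_OPTIONS.items():
        value = str(env.get(variable, "")).strip()
        if not value:
            continue  # unset or empty: keep the Jupyter default for this trait
        seconds = int(value)  # raises ValueError if the setting is not a whole number
        if seconds > 0:
            args.append(f"--{trait}={seconds}")
    return args


# In jupyterhub_config.py, this could then be applied with something like:
#   c.Spawner.args = build_cull_args()
```

Note that assigning c.Spawner.args replaces any arguments already set there, so an actual implementation would probably need to extend the existing list instead.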
Overview
Jupyter instances started by users are simply left running indefinitely unless the user manually logs off. This causes the jupyter-<user> Docker containers to remain active, using resources unnecessarily.
This feature adds a new component that helps quickly detect idle Jupyter servers and stop them after a given inactivity timeout. The logic behind it depends on what JupyterHub reports on its activity API endpoint for the given user.
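The idle check described above can be sketched roughly as follows. This is not the component's actual code: it assumes a user model shaped like the one returned by the JupyterHub REST API, with a servers mapping whose entries carry an ISO 8601 last_activity timestamp, and the idle_servers and parse_timestamp helpers are hypothetical names.

```python
from datetime import datetime, timedelta, timezone


def parse_timestamp(value):
    """Parse an ISO 8601 timestamp such as '2023-10-11T23:48:51.688000Z'."""
    # datetime.fromisoformat() only accepts a trailing 'Z' from Python 3.11 onward.
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assume UTC when no offset is given
    return parsed


def idle_servers(user, timeout, now=None):
    """Return the names of the user's servers whose last activity is older than `timeout`."""
    now = now or datetime.now(timezone.utc)
    idle = []
    for name, server in user.get("servers", {}).items():
        last_activity = server.get("last_activity")
        if last_activity is None:
            continue  # no activity reported yet: do not flag the server
        if now - parse_timestamp(last_activity) > timeout:
            idle.append(name)
    return idle


# Example with a made-up user model; the default server is keyed by an empty name.
user = {"name": "alice", "servers": {"": {"last_activity": "2023-10-11T23:00:00.000000Z"}}}
reference = datetime(2023, 10, 12, 0, 30, tzinfo=timezone.utc)
print(idle_servers(user, timedelta(hours=1), now=reference))
```

A check of this kind is only as good as the reported last_activity value: if anything keeps refreshing it, the server is never considered idle.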
To have the utility available, the following requirements must be applied: https://github.com/Ouranosinc/jupyterhub/pull/21
To test quickly, simply build the Docker image on this branch and override JUPYTERHUB_DOCKER and JUPYTER_VERSION accordingly, and enable optional-components/jupyterhub-stop-idle in EXTRA_CONF_DIRS. JUPYTERHUB_STOP_IDLE_TIMEOUT can be set to adjust the timeout interval.
Changes
Non-breaking changes
optional-components/jupyterhub-stop-idle allowing culling of idle Jupyter servers of users.
Breaking changes
Related Issue / Discussion
To Do
Still investigating
The timeout frequency is properly applied and the Docker logs show that activity status checks are performed, but servers are still considered to be in a "running" status for some reason, even when nothing is running on them and no open tab is refreshing them. I will look into that further, but I want to open this PR right away to gather early feedback.