Closed fmigneault closed 1 year ago
@tlvu @mishaschwartz
I've tested it using some low values (e.g. JUPYTER_IDLE_KERNEL_CULL_TIMEOUT=10).
At some point, my jupyter-<user> image got shut down, though it took longer than anticipated.
Some jupyter-server docs indicate that shutdown can take longer, since the timeout only triggers the shutdown request and does not account for the duration of the shutdown itself.
Either way, at least, it shut down at some point, without the need of the other strategy.
I've opted for a default 1-day timeout so that servers running with default or minimally overridden configs have cleanup enabled after a reasonable delay. Users can experiment for some time without being impacted by shutdown, while their server stops automatically once they are clearly not using it anymore.
Build URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/2156/
Result : failure
BIRDHOUSE_DEPLOY_BRANCH : jupyterhub-idle-timeout-config
DACCS_CONFIGS_BRANCH : master
PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master
PAVICS_SDI_BRANCH : master
DESTROY_INFRA_ON_EXIT : true
PAVICS_HOST : https://host-140-118.rdext.crim.ca
Tests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/1368/
[2023-10-12T22:02:27.679Z] ============================= test session starts ==============================
[2023-10-12T22:02:27.679Z] platform linux -- Python 3.9.16, pytest-7.3.1, pluggy-1.0.0
[2023-10-12T22:02:27.679Z] rootdir: /home/jenkins/agent/workspace/PAVICS-e2e-workflow-tests_master
[2023-10-12T22:02:27.679Z] plugins: anyio-3.6.1, dash-2.10.0, nbval-0.9.6, tornasync-0.6.0.post2, xdist-3.3.1
[2023-10-12T22:02:27.679Z] collected 254 items
[2023-10-12T22:02:27.679Z]
[2023-10-12T22:02:34.242Z] notebooks-auth/geoserver.ipynb ..........F..FFF. [ 6%]
[2023-10-12T22:02:42.472Z] notebooks-auth/test_thredds.ipynb ........... [ 11%]
[2023-10-12T22:02:50.700Z] pavics-sdi-master/docs/source/notebooks/WCS_example.ipynb ....... [ 13%]
[2023-10-12T22:03:00.791Z] pavics-sdi-master/docs/source/notebooks/WFS_example.ipynb ...... [ 16%]
[2023-10-12T22:03:10.808Z] pavics-sdi-master/docs/source/notebooks/WMS_example.ipynb .F...... [ 19%]
[2023-10-12T22:10:53.091Z] pavics-sdi-master/docs/source/notebooks/climex.ipynb ............ [ 24%]
[2023-10-12T22:10:53.091Z] pavics-sdi-master/docs/source/notebooks/eccc-geoapi-climate-stations.ipynb . [ 24%]
[2023-10-12T22:10:55.035Z] ............... [ 30%]
[2023-10-12T22:11:04.439Z] pavics-sdi-master/docs/source/notebooks/eccc-geoapi-xclim.ipynb F.F.. [ 32%]
[2023-10-12T22:11:11.439Z] pavics-sdi-master/docs/source/notebooks/esgf-dap.ipynb ...... [ 34%]
[2023-10-12T22:11:28.814Z] pavics-sdi-master/docs/source/notebooks/forecasts.ipynb ...... [ 37%]
[2023-10-12T22:11:30.461Z] pavics-sdi-master/docs/source/notebooks/jupyter_extensions.ipynb . [ 37%]
[2023-10-12T22:11:38.586Z] pavics-sdi-master/docs/source/notebooks/opendap.ipynb ....... [ 40%]
[2023-10-12T22:11:43.265Z] pavics-sdi-master/docs/source/notebooks/pavics_thredds.ipynb ..... [ 42%]
[2023-10-12T22:15:10.232Z] pavics-sdi-master/docs/source/notebooks/regridding.ipynb ............... [ 48%]
[2023-10-12T22:16:22.618Z] ............. [ 53%]
[2023-10-12T22:16:26.720Z] pavics-sdi-master/docs/source/notebooks/rendering.ipynb .... [ 54%]
[2023-10-12T22:16:28.934Z] pavics-sdi-master/docs/source/notebooks/subset-user-input.ipynb ........ [ 57%]
[2023-10-12T22:16:45.338Z] ................. [ 64%]
[2023-10-12T22:16:53.697Z] pavics-sdi-master/docs/source/notebooks/subsetting.ipynb ...... [ 66%]
[2023-10-12T22:16:55.084Z] pavics-sdi-master/docs/source/notebook-components/weaver_example.ipynb . [ 67%]
[2023-10-12T22:16:56.449Z] .FFFFFFFF [ 70%]
[2023-10-12T22:17:07.033Z] finch-master/docs/source/notebooks/dap_subset.ipynb ........... [ 75%]
[2023-10-12T22:17:16.387Z] finch-master/docs/source/notebooks/finch-usage.ipynb ...... [ 77%]
[2023-10-12T22:17:17.791Z] PAVICS-landing-master/content/notebooks/climate_indicators/PAVICStutorial_ClimateDataAnalysis-1DataAccess.ipynb . [ 77%]
[2023-10-12T22:17:21.105Z] ...... [ 80%]
[2023-10-12T22:17:29.248Z] PAVICS-landing-master/content/notebooks/climate_indicators/PAVICStutorial_ClimateDataAnalysis-2Subsetting.ipynb . [ 80%]
[2023-10-12T22:17:47.333Z] ............. [ 85%]
[2023-10-12T22:17:57.327Z] PAVICS-landing-master/content/notebooks/climate_indicators/PAVICStutorial_ClimateDataAnalysis-3Climate-Indicators.ipynb . [ 86%]
[2023-10-12T22:18:39.720Z] ....s. [ 88%]
[2023-10-12T22:18:47.874Z] PAVICS-landing-master/content/notebooks/climate_indicators/PAVICStutorial_ClimateDataAnalysis-4Ensembles.ipynb . [ 88%]
[2023-10-12T22:19:03.403Z] ... [ 90%]
[2023-10-12T22:19:18.314Z] PAVICS-landing-master/content/notebooks/climate_indicators/PAVICStutorial_ClimateDataAnalysis-5Visualization.ipynb . [ 90%]
[2023-10-12T22:19:40.868Z] ...... [ 92%]
[2023-10-12T22:19:43.363Z] notebooks/hummingbird.ipynb ............ [ 97%]
[2023-10-12T22:22:28.955Z] notebooks/stress-tests.ipynb ...... [100%]
[2023-10-12T22:22:28.955Z]
[2023-10-12T22:22:28.955Z] =================================== FAILURES ===================================
I've tested it using some low values (e.g. JUPYTER_IDLE_KERNEL_CULL_TIMEOUT=10). At some point, my jupyter-<user> image got shut down, though it took longer than anticipated.
I assume you also set JUPYTER_IDLE_SERVER_CULL_TIMEOUT=10? So with those 2 configs, after 10 sec, the notebook kernel got killed, then 10 sec later the server is killed.
The server cull timeout starts only when all kernels are down.
For us at Ouranos, we set the server timeout to 4 days and the kernel timeout to 1 day. Together, they account for 5 days of complete inactivity before the server is actually gone, so any 4-day long weekend is covered.
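The arithmetic above can be sketched as follows. This is only an illustration, assuming (as described in this comment) that the server idle timer starts once the last kernel has been culled. The function name is hypothetical, and delays from periodic checks or from the shutdown itself are ignored, so the real shutdown may come later than this lower bound.

```python
from datetime import timedelta


def min_idle_before_server_shutdown(kernel_timeout: timedelta,
                                    server_timeout: timedelta) -> timedelta:
    """Lower bound on the total inactivity before the server is shut down.

    Assumes the server timer only starts after all kernels are culled,
    so the two timeouts add up rather than run in parallel.
    """
    return kernel_timeout + server_timeout


# Values quoted for Ouranos: 1 day for kernels, 4 days for the server.
total = min_idle_before_server_shutdown(timedelta(days=1), timedelta(days=4))
print(total.days)  # 5
```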
Given this new config, if JUPYTER_IDLE_SERVER_CULL_TIMEOUT is not set or is zero (the default), the server would not be killed at all.
I think we should put some value to avoid the server running forever. Maybe a bigger value like a week would be reasonable?
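To make the "not set or zero" behaviour concrete, here is a minimal sketch of how the three variables could be turned into jupyter-server command-line options. It is not the code from this PR: the helper name build_cull_args is hypothetical, and the mapping of each variable to MappingKernelManager.cull_idle_timeout, MappingKernelManager.cull_interval and ServerApp.shutdown_no_activity_timeout is an assumption based on the jupyter-server options of those names.

```python
import os


def build_cull_args(env=None):
    """Build hypothetical single-user server arguments from the cull variables.

    Unset, empty, or zero values are treated as "disabled".
    """
    env = os.environ if env is None else env

    def as_int(name):
        # Missing or empty variables fall back to 0 (disabled).
        return int(env.get(name) or 0)

    kernel_timeout = as_int("JUPYTER_IDLE_KERNEL_CULL_TIMEOUT")
    kernel_interval = as_int("JUPYTER_IDLE_KERNEL_CULL_INTERVAL")
    server_timeout = as_int("JUPYTER_IDLE_SERVER_CULL_TIMEOUT")

    args = []
    if kernel_timeout <= 0:
        # Kernel timeout of 0 skips the whole strategy in this sketch.
        return args
    args.append(f"--MappingKernelManager.cull_idle_timeout={kernel_timeout}")
    if kernel_interval > 0:
        args.append(f"--MappingKernelManager.cull_interval={kernel_interval}")
    if server_timeout > 0:
        # Without this option, the idle server is never shut down.
        args.append(f"--ServerApp.shutdown_no_activity_timeout={server_timeout}")
    return args


# Kernel culling after 1 day, but no server timeout: the server stays up.
print(build_cull_args({"JUPYTER_IDLE_KERNEL_CULL_TIMEOUT": "86400"}))
```

With only the kernel timeout set, the returned list contains no server shutdown option, which is the situation this comment warns about.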
I assume you also set JUPYTER_IDLE_SERVER_CULL_TIMEOUT=10? So with those 2 configs, after 10 sec, the notebook kernel got killed, then 10 sec later the server is killed.
Indeed. Sorry for the confusion. Both variables were set at 10s.
Build URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/2159/
Result : failure
BIRDHOUSE_DEPLOY_BRANCH : jupyterhub-idle-timeout-config
DACCS_CONFIGS_BRANCH : master
PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master
PAVICS_SDI_BRANCH : master
DESTROY_INFRA_ON_EXIT : true
PAVICS_HOST : https://host-140-216.rdext.crim.ca
Tests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/1370/
[2023-10-13T16:46:42.406Z] ============================= test session starts ==============================
[2023-10-13T16:46:42.406Z] platform linux -- Python 3.9.16, pytest-7.3.1, pluggy-1.0.0
[2023-10-13T16:46:42.406Z] rootdir: /home/jenkins/agent/workspace/PAVICS-e2e-workflow-tests_master@2
[2023-10-13T16:46:42.407Z] plugins: anyio-3.6.1, dash-2.10.0, nbval-0.9.6, tornasync-0.6.0.post2, xdist-3.3.1
[2023-10-13T16:46:42.407Z] collected 254 items
[2023-10-13T16:46:42.407Z]
[2023-10-13T16:46:48.677Z] notebooks-auth/geoserver.ipynb ..........F..FFF. [ 6%]
[2023-10-13T16:46:56.342Z] notebooks-auth/test_thredds.ipynb ........... [ 11%]
[2023-10-13T16:47:04.558Z] pavics-sdi-master/docs/source/notebooks/WCS_example.ipynb ....... [ 13%]
[2023-10-13T16:47:14.259Z] pavics-sdi-master/docs/source/notebooks/WFS_example.ipynb ...... [ 16%]
[2023-10-13T16:47:20.610Z] pavics-sdi-master/docs/source/notebooks/WMS_example.ipynb .F...... [ 19%]
[2023-10-13T17:01:06.959Z] pavics-sdi-master/docs/source/notebooks/climex.ipynb ............ [ 24%]
[2023-10-13T17:02:28.454Z] pavics-sdi-master/docs/source/notebooks/eccc-geoapi-climate-stations.ipynb . [ 24%]
[2023-10-13T17:02:30.651Z] ............... [ 30%]
[2023-10-13T17:02:41.066Z] pavics-sdi-master/docs/source/notebooks/eccc-geoapi-xclim.ipynb F.F.. [ 32%]
[2023-10-13T17:02:47.981Z] pavics-sdi-master/docs/source/notebooks/esgf-dap.ipynb ...... [ 34%]
[2023-10-13T17:03:05.329Z] pavics-sdi-master/docs/source/notebooks/forecasts.ipynb ...... [ 37%]
[2023-10-13T17:03:06.715Z] pavics-sdi-master/docs/source/notebooks/jupyter_extensions.ipynb . [ 37%]
[2023-10-13T17:03:15.309Z] pavics-sdi-master/docs/source/notebooks/opendap.ipynb ....... [ 40%]
[2023-10-13T17:03:20.095Z] pavics-sdi-master/docs/source/notebooks/pavics_thredds.ipynb ..... [ 42%]
[2023-10-13T17:07:36.133Z] pavics-sdi-master/docs/source/notebooks/regridding.ipynb ............... [ 48%]
[2023-10-13T17:08:58.644Z] ............. [ 53%]
[2023-10-13T17:09:01.384Z] pavics-sdi-master/docs/source/notebooks/rendering.ipynb .... [ 54%]
[2023-10-13T17:09:04.250Z] pavics-sdi-master/docs/source/notebooks/subset-user-input.ipynb ........ [ 57%]
[2023-10-13T17:09:21.112Z] ................. [ 64%]
[2023-10-13T17:09:28.688Z] pavics-sdi-master/docs/source/notebooks/subsetting.ipynb ...... [ 66%]
[2023-10-13T17:09:30.079Z] pavics-sdi-master/docs/source/notebook-components/weaver_example.ipynb . [ 67%]
[2023-10-13T17:09:47.846Z] ........F [ 70%]
[2023-10-13T17:09:57.406Z] finch-master/docs/source/notebooks/dap_subset.ipynb ........... [ 75%]
[2023-10-13T17:10:06.693Z] finch-master/docs/source/notebooks/finch-usage.ipynb ...... [ 77%]
[2023-10-13T17:10:08.086Z] PAVICS-landing-master/content/notebooks/climate_indicators/PAVICStutorial_ClimateDataAnalysis-1DataAccess.ipynb . [ 77%]
[2023-10-13T17:10:11.417Z] ...... [ 80%]
[2023-10-13T17:10:21.401Z] PAVICS-landing-master/content/notebooks/climate_indicators/PAVICStutorial_ClimateDataAnalysis-2Subsetting.ipynb . [ 80%]
[2023-10-13T17:10:39.600Z] ............. [ 85%]
[2023-10-13T17:10:51.823Z] PAVICS-landing-master/content/notebooks/climate_indicators/PAVICStutorial_ClimateDataAnalysis-3Climate-Indicators.ipynb . [ 86%]
[2023-10-13T17:11:38.762Z] ....s. [ 88%]
[2023-10-13T17:11:46.927Z] PAVICS-landing-master/content/notebooks/climate_indicators/PAVICStutorial_ClimateDataAnalysis-4Ensembles.ipynb . [ 88%]
[2023-10-13T17:12:03.393Z] ... [ 90%]
[2023-10-13T17:12:18.351Z] PAVICS-landing-master/content/notebooks/climate_indicators/PAVICStutorial_ClimateDataAnalysis-5Visualization.ipynb . [ 90%]
[2023-10-13T17:12:41.992Z] ...... [ 92%]
[2023-10-13T17:12:44.732Z] notebooks/hummingbird.ipynb ............ [ 97%]
[2023-10-13T17:15:18.903Z] notebooks/stress-tests.ipynb ...... [100%]
[2023-10-13T17:15:18.903Z]
[2023-10-13T17:15:18.903Z] =================================== FAILURES ===================================
Build URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/2158/
Result : failure
BIRDHOUSE_DEPLOY_BRANCH : jupyterhub-idle-timeout-config
DACCS_CONFIGS_BRANCH : master
PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master
PAVICS_SDI_BRANCH : master
DESTROY_INFRA_ON_EXIT : true
PAVICS_HOST : https://host-140-118.rdext.crim.ca
Tests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/1369/
[2023-10-13T16:46:24.999Z] ============================= test session starts ==============================
[2023-10-13T16:46:24.999Z] platform linux -- Python 3.9.16, pytest-7.3.1, pluggy-1.0.0
[2023-10-13T16:46:24.999Z] rootdir: /home/jenkins/agent/workspace/PAVICS-e2e-workflow-tests_master
[2023-10-13T16:46:24.999Z] plugins: anyio-3.6.1, dash-2.10.0, nbval-0.9.6, tornasync-0.6.0.post2, xdist-3.3.1
[2023-10-13T16:46:24.999Z] collected 254 items
[2023-10-13T16:46:24.999Z]
[2023-10-13T16:46:30.597Z] notebooks-auth/geoserver.ipynb ..........F..FFF. [ 6%]
[2023-10-13T16:46:38.307Z] notebooks-auth/test_thredds.ipynb ........... [ 11%]
[2023-10-13T16:46:47.009Z] pavics-sdi-master/docs/source/notebooks/WCS_example.ipynb ....... [ 13%]
[2023-10-13T16:46:56.970Z] pavics-sdi-master/docs/source/notebooks/WFS_example.ipynb ...... [ 16%]
[2023-10-13T16:47:09.192Z] pavics-sdi-master/docs/source/notebooks/WMS_example.ipynb .F...... [ 19%]
[2023-10-13T17:01:13.215Z] pavics-sdi-master/docs/source/notebooks/climex.ipynb ............ [ 24%]
[2023-10-13T17:02:20.985Z] pavics-sdi-master/docs/source/notebooks/eccc-geoapi-climate-stations.ipynb . [ 24%]
[2023-10-13T17:02:30.797Z] ............... [ 30%]
[2023-10-13T17:02:40.720Z] pavics-sdi-master/docs/source/notebooks/eccc-geoapi-xclim.ipynb F.F.. [ 32%]
[2023-10-13T17:02:48.010Z] pavics-sdi-master/docs/source/notebooks/esgf-dap.ipynb ...... [ 34%]
[2023-10-13T17:03:05.782Z] pavics-sdi-master/docs/source/notebooks/forecasts.ipynb ...... [ 37%]
[2023-10-13T17:03:07.168Z] pavics-sdi-master/docs/source/notebooks/jupyter_extensions.ipynb . [ 37%]
[2023-10-13T17:03:15.197Z] pavics-sdi-master/docs/source/notebooks/opendap.ipynb ....... [ 40%]
[2023-10-13T17:03:19.719Z] pavics-sdi-master/docs/source/notebooks/pavics_thredds.ipynb ..... [ 42%]
[2023-10-13T17:07:36.134Z] pavics-sdi-master/docs/source/notebooks/regridding.ipynb ............... [ 48%]
[2023-10-13T17:08:55.765Z] ............. [ 53%]
[2023-10-13T17:08:59.714Z] pavics-sdi-master/docs/source/notebooks/rendering.ipynb .... [ 54%]
[2023-10-13T17:09:02.042Z] pavics-sdi-master/docs/source/notebooks/subset-user-input.ipynb ........ [ 57%]
[2023-10-13T17:09:19.985Z] ................. [ 64%]
[2023-10-13T17:09:28.187Z] pavics-sdi-master/docs/source/notebooks/subsetting.ipynb ...... [ 66%]
[2023-10-13T17:09:29.578Z] pavics-sdi-master/docs/source/notebook-components/weaver_example.ipynb . [ 67%]
[2023-10-13T17:09:40.435Z] ........F [ 70%]
[2023-10-13T17:09:50.720Z] finch-master/docs/source/notebooks/dap_subset.ipynb ........... [ 75%]
[2023-10-13T17:09:59.680Z] finch-master/docs/source/notebooks/finch-usage.ipynb ...... [ 77%]
[2023-10-13T17:10:01.064Z] PAVICS-landing-master/content/notebooks/climate_indicators/PAVICStutorial_ClimateDataAnalysis-1DataAccess.ipynb . [ 77%]
[2023-10-13T17:10:04.449Z] ...... [ 80%]
[2023-10-13T17:10:14.451Z] PAVICS-landing-master/content/notebooks/climate_indicators/PAVICStutorial_ClimateDataAnalysis-2Subsetting.ipynb . [ 80%]
[2023-10-13T17:10:34.234Z] ............. [ 85%]
[2023-10-13T17:10:44.270Z] PAVICS-landing-master/content/notebooks/climate_indicators/PAVICStutorial_ClimateDataAnalysis-3Climate-Indicators.ipynb . [ 86%]
[2023-10-13T17:11:35.268Z] ....s. [ 88%]
[2023-10-13T17:11:43.434Z] PAVICS-landing-master/content/notebooks/climate_indicators/PAVICStutorial_ClimateDataAnalysis-4Ensembles.ipynb . [ 88%]
[2023-10-13T17:11:59.896Z] ... [ 90%]
[2023-10-13T17:12:18.014Z] PAVICS-landing-master/content/notebooks/climate_indicators/PAVICStutorial_ClimateDataAnalysis-5Visualization.ipynb . [ 90%]
[2023-10-13T17:12:40.805Z] ...... [ 92%]
[2023-10-13T17:12:43.774Z] notebooks/hummingbird.ipynb ............ [ 97%]
[2023-10-13T17:15:29.400Z] notebooks/stress-tests.ipynb ...... [100%]
[2023-10-13T17:15:29.400Z]
[2023-10-13T17:15:29.400Z] =================================== FAILURES ===================================
LGTM. Maybe set the server timeout to 4 days instead of 3 days, so that someone coming back from a 4-day long weekend has a chance to resume their work?
I'd consider that an edge case. Also, they should be able to resume work from a notebook saved in the user-workspace that would be re-mounted on server restart.
Yes it's an edge case.
It's not about preserving the notebooks, it's about preserving any custom installs the user has made without properly recording them in a requirements.txt or environment.yml.
It's fine, 3 or 4, the node admin will adjust if they receive complaints over a long weekend.
Overview
Add new variables to easily configure idle jupyter user instances.
Changes
Non-breaking changes
JUPYTER_IDLE_SERVER_CULL_TIMEOUT, JUPYTER_IDLE_KERNEL_CULL_TIMEOUT and JUPYTER_IDLE_KERNEL_CULL_INTERVAL that allow fine-grained configuration of user-kernel and server-wide docker image culling when their activity status reaches a certain idle timeout threshold.
JUPYTERHUB_CONFIG_OVERRIDE specifically for idle server culling. If similar argument parameters should be defined using an older JUPYTERHUB_CONFIG_OVERRIDE definition, the new configuration strategy can be skipped by setting JUPYTER_IDLE_KERNEL_CULL_TIMEOUT=0.
Breaking changes
Related Issue / Discussion