bird-house / birdhouse-deploy

Scripts and configurations to deploy the various birds and servers required for a full-fledged production platform
https://birdhouse-deploy.readthedocs.io/en/latest/
Apache License 2.0

add `optional-components/jupyterhub-stop-idle` #389

Closed fmigneault closed 1 year ago

fmigneault commented 1 year ago

Overview

Jupyter instances started by users are left running indefinitely unless the user manually logs off. As a result, the jupyter-<user> Docker containers remain active and consume resources unnecessarily.

This feature adds a new component that detects idle Jupyter servers and stops them after a given inactivity timeout. The logic relies on what JupyterHub reports on its activity API endpoint for the given user.
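As a rough illustration of that idle check, the sketch below classifies servers from user models shaped like those returned by JupyterHub's REST API (`GET /hub/api/users`), where each server entry carries a `last_activity` timestamp. The function name `find_idle_servers` and the sample data are hypothetical, and this is not the code used by the component; it only shows the timeout comparison.

```python
from datetime import datetime, timedelta, timezone


def parse_timestamp(value):
    """Parse an ISO 8601 timestamp such as '2023-10-11T22:00:00.000000Z'."""
    # datetime.fromisoformat() before Python 3.11 rejects a trailing 'Z'.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def find_idle_servers(users, timeout_seconds, now=None):
    """Return (user, server_name) pairs idle for longer than the timeout."""
    now = now or datetime.now(timezone.utc)
    limit = timedelta(seconds=timeout_seconds)
    idle = []
    for user in users:
        for server_name, server in user.get("servers", {}).items():
            last_activity = server.get("last_activity")
            if last_activity is None:
                # No activity reported yet: do not treat the server as idle.
                continue
            if now - parse_timestamp(last_activity) > limit:
                idle.append((user["name"], server_name))
    return idle


# Hypothetical sample data; the default server of a user has the name "".
sample_users = [
    {"name": "alice", "servers": {"": {"last_activity": "2023-10-11T22:00:00.000000Z"}}},
    {"name": "bob", "servers": {"": {"last_activity": "2023-10-11T23:55:00.000000Z"}}},
]
reference_time = datetime(2023, 10, 12, 0, 0, 0, tzinfo=timezone.utc)
print(find_idle_servers(sample_users, timeout_seconds=3600, now=reference_time))
```

With a one-hour timeout, only the server whose last reported activity is older than one hour is selected. If the Hub keeps refreshing `last_activity`, a server would never qualify regardless of the timeout.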

To have the utility available, the changes from https://github.com/Ouranosinc/jupyterhub/pull/21 must be applied first. To test quickly, build the Docker image on this branch, override JUPYTERHUB_DOCKER and JUPYTER_VERSION accordingly, and enable optional-components/jupyterhub-stop-idle in EXTRA_CONF_DIRS. JUPYTERHUB_STOP_IDLE_TIMEOUT can be set to adjust the timeout interval.

Changes

Non-breaking changes

Breaking changes

Related Issue / Discussion

To Do

Still investigating

The timeout frequency is properly applied and the Docker logs show that activity status checks are performed, but servers are still considered to be in a "running" status for some reason, even when nothing is running on them and no open browser tab is refreshing them. I will look into that further, but I want to open this PR right away to gather early feedback.

crim-jenkins-bot commented 1 year ago

E2E Test Results

DACCS-iac Pipeline Results

Build URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/2153/
Result : failure

BIRDHOUSE_DEPLOY_BRANCH : jupyterhub-stop-idle
DACCS_CONFIGS_BRANCH : master
PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master
PAVICS_SDI_BRANCH : master

DESTROY_INFRA_ON_EXIT : true
PAVICS_HOST : https://host-140-118.rdext.crim.ca

PAVICS-e2e-workflow-tests Pipeline Results

Tests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/1365/

NOTEBOOK TEST RESULTS
    
[2023-10-11T23:48:51.688Z] ============================= test session starts ==============================
[2023-10-11T23:48:51.688Z] platform linux -- Python 3.9.16, pytest-7.3.1, pluggy-1.0.0
[2023-10-11T23:48:51.688Z] rootdir: /home/jenkins/agent/workspace/PAVICS-e2e-workflow-tests_master
[2023-10-11T23:48:51.688Z] plugins: anyio-3.6.1, dash-2.10.0, nbval-0.9.6, tornasync-0.6.0.post2, xdist-3.3.1
[2023-10-11T23:48:51.688Z] collected 254 items
[2023-10-11T23:48:51.688Z] 
[2023-10-11T23:48:57.001Z] notebooks-auth/geoserver.ipynb ..........F..FFF.                         [  6%]
[2023-10-11T23:49:05.077Z] notebooks-auth/test_thredds.ipynb ...........                            [ 11%]
[2023-10-11T23:49:13.673Z] pavics-sdi-master/docs/source/notebooks/WCS_example.ipynb .......        [ 13%]
[2023-10-11T23:49:22.719Z] pavics-sdi-master/docs/source/notebooks/WFS_example.ipynb ......         [ 16%]
[2023-10-11T23:49:32.382Z] pavics-sdi-master/docs/source/notebooks/WMS_example.ipynb .F......       [ 19%]
[2023-10-11T23:57:20.460Z] pavics-sdi-master/docs/source/notebooks/climex.ipynb ............        [ 24%]
[2023-10-11T23:57:20.460Z] pavics-sdi-master/docs/source/notebooks/eccc-geoapi-climate-stations.ipynb . [ 24%]
[2023-10-11T23:57:26.374Z] ...............                                                          [ 30%]
[2023-10-11T23:57:35.410Z] pavics-sdi-master/docs/source/notebooks/eccc-geoapi-xclim.ipynb F.F..    [ 32%]
[2023-10-11T23:57:43.001Z] pavics-sdi-master/docs/source/notebooks/esgf-dap.ipynb FFFFFF            [ 34%]
[2023-10-11T23:57:59.813Z] pavics-sdi-master/docs/source/notebooks/forecasts.ipynb ......           [ 37%]
[2023-10-11T23:58:01.721Z] pavics-sdi-master/docs/source/notebooks/jupyter_extensions.ipynb .       [ 37%]
[2023-10-11T23:58:10.173Z] pavics-sdi-master/docs/source/notebooks/opendap.ipynb .......            [ 40%]
[2023-10-11T23:58:17.042Z] pavics-sdi-master/docs/source/notebooks/pavics_thredds.ipynb .....       [ 42%]
[2023-10-12T00:02:41.777Z] pavics-sdi-master/docs/source/notebooks/regridding.ipynb ............... [ 48%]
[2023-10-12T00:03:52.019Z] .............                                                            [ 53%]
[2023-10-12T00:03:57.276Z] pavics-sdi-master/docs/source/notebooks/rendering.ipynb ....             [ 54%]
[2023-10-12T00:04:00.878Z] pavics-sdi-master/docs/source/notebooks/subset-user-input.ipynb ........ [ 57%]
[2023-10-12T00:04:19.351Z] .................                                                        [ 64%]
[2023-10-12T00:04:26.293Z] pavics-sdi-master/docs/source/notebooks/subsetting.ipynb ......          [ 66%]
[2023-10-12T00:04:28.206Z] pavics-sdi-master/docs/source/notebook-components/weaver_example.ipynb . [ 67%]
[2023-10-12T00:04:29.244Z] .FFFFFFFF                                                                [ 70%]
[2023-10-12T00:04:39.744Z] finch-master/docs/source/notebooks/dap_subset.ipynb ...........          [ 75%]
[2023-10-12T00:04:49.084Z] finch-master/docs/source/notebooks/finch-usage.ipynb ......              [ 77%]
[2023-10-12T00:04:50.467Z] PAVICS-landing-master/content/notebooks/climate_indicators/PAVICStutorial_ClimateDataAnalysis-1DataAccess.ipynb . [ 77%]
[2023-10-12T00:04:53.541Z] ......                                                                   [ 80%]
[2023-10-12T00:05:01.701Z] PAVICS-landing-master/content/notebooks/climate_indicators/PAVICStutorial_ClimateDataAnalysis-2Subsetting.ipynb . [ 80%]
[2023-10-12T00:05:19.387Z] .............                                                            [ 85%]
[2023-10-12T00:05:29.380Z] PAVICS-landing-master/content/notebooks/climate_indicators/PAVICStutorial_ClimateDataAnalysis-3Climate-Indicators.ipynb . [ 86%]
[2023-10-12T00:06:08.407Z] ....s.                                                                   [ 88%]
[2023-10-12T00:06:18.401Z] PAVICS-landing-master/content/notebooks/climate_indicators/PAVICStutorial_ClimateDataAnalysis-4Ensembles.ipynb . [ 88%]
[2023-10-12T00:06:32.529Z] ...                                                                      [ 90%]
[2023-10-12T00:06:44.750Z] PAVICS-landing-master/content/notebooks/climate_indicators/PAVICStutorial_ClimateDataAnalysis-5Visualization.ipynb . [ 90%]
[2023-10-12T00:07:07.332Z] ......                                                                   [ 92%]
[2023-10-12T00:07:24.773Z] notebooks/hummingbird.ipynb ............                                 [ 97%]
[2023-10-12T00:10:10.357Z] notebooks/stress-tests.ipynb ......                                      [100%]
[2023-10-12T00:10:10.357Z] 
[2023-10-12T00:10:10.357Z] =================================== FAILURES ===================================
    
  
tlvu commented 1 year ago

The README of the external jupyterhub-idle-culler also mentions that JupyterHub can already cull idle Jupyter servers itself, without explaining why someone would need the external version: https://github.com/jupyterhub/jupyterhub-idle-culler/blob/737dfa155b809453e6e795dd9de42d6a926fd4a0/README.md?plain=1#L263-L274

This is rather confusing to me.

@mishaschwartz what do you think about keeping 2 different culling options? Would this confuse users more?

mishaschwartz commented 1 year ago

@mishaschwartz what do you think about keeping 2 different culling options? Would this confuse users more?

Yeah, I'd rather keep one version or the other. It will definitely confuse users.

Can we test out if the old version still works and if not, remove the comment from env.local. If it does still work, then we should use that one instead, unless there is some major advantage to the jupyterhub-idle-culler method. Is there, @fmigneault?

fmigneault commented 1 year ago

Can we test out if the old version still works and if not, remove the comment from env.local.

Yes. I would like to test the existing approach and, if it works, keep only that one. However, I would like to make it more easily/directly configurable through an optional variable rather than asking users to add it to their env.local via JUPYTERHUB_CONFIG_OVERRIDE. I did not even realize this code existed before it was pointed out by @tlvu

tlvu commented 1 year ago

However, I would like to make it more easily/directly configurable through an optional variable rather than asking users to add it to their env.local via JUPYTERHUB_CONFIG_OVERRIDE.

Yes, true, anything in production will need this to avoid long-running idle Jupyter containers wasting RAM and CPU.

Then I would suggest moving that snippet of code into the default JupyterHub config and just making those timeout limits config variables.
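A minimal sketch of what this could look like, assuming the timeouts are read from environment variables and passed to the single-user servers as command-line arguments. The variable names `JUPYTER_IDLE_SERVER_CULL_TIMEOUT`, `JUPYTER_IDLE_KERNEL_CULL_TIMEOUT` and `JUPYTER_IDLE_KERNEL_CULL_INTERVAL` are hypothetical, as is the helper `build_culling_args`. It also assumes the image runs the classic notebook server; with jupyter_server the shutdown trait would be `ServerApp.shutdown_no_activity_timeout` instead.

```python
import os


def build_culling_args(environ=os.environ):
    """Build single-user server arguments that enable the internal cullers."""
    # Hypothetical variable names; a value of 0 leaves the feature disabled,
    # which matches the default of the underlying traits.
    server_timeout = int(environ.get("JUPYTER_IDLE_SERVER_CULL_TIMEOUT", "0"))
    kernel_timeout = int(environ.get("JUPYTER_IDLE_KERNEL_CULL_TIMEOUT", "0"))
    kernel_interval = int(environ.get("JUPYTER_IDLE_KERNEL_CULL_INTERVAL", "300"))

    args = []
    if kernel_timeout > 0:
        # Shut down kernels that have been idle for longer than the timeout.
        args.append(f"--MappingKernelManager.cull_idle_timeout={kernel_timeout}")
        args.append(f"--MappingKernelManager.cull_interval={kernel_interval}")
    if server_timeout > 0:
        # Stop the whole server once no kernel or connection remains active.
        args.append(f"--NotebookApp.shutdown_no_activity_timeout={server_timeout}")
    return args


# In jupyterhub_config.py, where `c` is provided by JupyterHub:
# c.Spawner.args = build_culling_args()
print(build_culling_args({"JUPYTER_IDLE_KERNEL_CULL_TIMEOUT": "3600"}))
```

Leaving every variable unset yields an empty argument list, so the default behaviour stays unchanged for deployments that do not opt in.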

I did not even realize this code existed before being pointed out by @tlvu

As a production server with many users, we faced this issue https://github.com/bird-house/birdhouse-deploy/issues/67 and, although this culling might not have fixed the root cause, it probably helped.

fmigneault commented 1 year ago

@tlvu @mishaschwartz I found this thread that seems to explain properly the differences: https://discourse.jupyter.org/t/which-is-the-correct-way-to-cull-idle-kernels-and-notebook/8123/16

From my reading, it would seem the internal culler is sufficient for our use case. However, I found in the comments that there is also a terminal culler, because an open terminal alone can keep the session/server active even if the kernel is idle. (https://github.com/jupyter-server/jupyter_server/pull/438)

So the config might need some extra parameters, but I would stick with only the internal parameters if those work by themselves.
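To make the "extra parameters" concrete, here is a hedged sketch of the terminal-related options. It assumes a single-user image based on a jupyter_server version that includes the terminal culler, where the traits are `TerminalManager.cull_inactive_timeout` and `TerminalManager.cull_interval`. The helper name `terminal_culling_args` is hypothetical.

```python
def terminal_culling_args(inactive_timeout, interval=300):
    """Build single-user server arguments that cull inactive terminals."""
    if inactive_timeout <= 0:
        # A timeout of 0 disables terminal culling, so add nothing.
        return []
    return [
        f"--TerminalManager.cull_inactive_timeout={inactive_timeout}",
        f"--TerminalManager.cull_interval={interval}",
    ]


# In jupyterhub_config.py these would be appended to the other spawner
# arguments, for example: c.Spawner.args = other_args + terminal_culling_args(3600)
print(terminal_culling_args(3600))
```

Whether these arguments are needed at all depends on whether an open terminal actually prevents the server from shutting down in the deployed image, which would have to be verified.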