3forges / packman

A Recipe to Terraform a Virtualbox VM, with Docker, Docker Compose on Debian

switch to jupyterhub #12

Open Jean-Baptiste-Lasselle opened 6 months ago

Jean-Baptiste-Lasselle commented 6 months ago

Before the production part, I'll need to finish the data map: https://poc-eurostat-data-transformers.pages.dev/data-pipeline/

Jean-Baptiste-Lasselle commented 6 months ago

https://github.com/kubernetes-sigs/nfs-ganesha-server-and-external-provisioner

Jean-Baptiste-Lasselle commented 6 months ago

Strong lead: OpenEBS.

First, the actual list of all CSI drivers: https://kubernetes-csi.github.io/docs/drivers.html

Next:

Jean-Baptiste-Lasselle commented 6 months ago

Also worth trying: JuiceFS

https://juicefs.com/docs/community/getting-started/installation

https://juicefs.com/docs/csi/introduction https://juicefs.com/docs/csi/getting_started

To test: if I create a JuiceFS filesystem on a MinIO S3 bucket, will my CSI driver then work in my Kubernetes cluster? https://juicefs.com/docs/community/getting-started/standalone#hands-on-practice-2
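
A minimal sketch of the test described above: building the `juicefs format` command that creates a JuiceFS filesystem on a MinIO bucket. The `--storage`, `--bucket`, `--access-key` and `--secret-key` flags come from the JuiceFS documentation; the bucket URL, credentials, metadata URL and filesystem name are placeholders, not values from this cluster. As far as I understand, the metadata engine has to be reachable from inside the cluster for the CSI driver to use the same filesystem, hence the Redis URL rather than a local SQLite file.

```python
import shlex


def juicefs_format_command(bucket_url, access_key, secret_key, meta_url, name):
    """Build the argv for `juicefs format` against a MinIO bucket."""
    return [
        "juicefs", "format",
        "--storage", "minio",
        "--bucket", bucket_url,
        "--access-key", access_key,
        "--secret-key", secret_key,
        meta_url,  # metadata engine URL
        name,      # name of the JuiceFS filesystem
    ]


# Every value below is a hypothetical placeholder.
cmd = juicefs_format_command(
    bucket_url="http://minio.example.internal:9000/jupyter",
    access_key="EXAMPLE_ACCESS_KEY",
    secret_key="EXAMPLE_SECRET_KEY",
    meta_url="redis://redis.example.internal:6379/1",
    name="jupyterfs",
)

# Print a copy-pasteable shell command.
print(shlex.join(cmd))
```

The same bucket and metadata URL would then be given to the JuiceFS CSI driver, following its getting-started page linked above.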

Jean-Baptiste-Lasselle commented 6 months ago

Oh, CubeFS looks pretty good too:

Too bad: without a dedicated arm64 (v8) build, CubeFS is a no-go on arm64:

image

Jean-Baptiste-Lasselle commented 6 months ago

AWESOME IT WORKED WITH JUICEFS AND A S3 MINIO BUCKET!!!

OK, I tried JuiceFS and successfully configured JupyterHub to use the JuiceFS storage class, but there is a complex problem to solve: it's not easy to deploy different apps using a single JuiceFS storage class, and it's not easy to configure JupyterHub to use a different storage class for every newly spun-up user. (And if I log in with 2 different users in JupyterHub, it definitely crashes, because the JuiceFS driver fails volume provisioning for the newly spun-up JupyterLab.)

So, next step: I will definitely try the NFS CSI driver. We definitely need to master our Kubernetes CSI stack.
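
For reference, a minimal sketch of where the storage class is set for JupyterHub user volumes in the Zero to JupyterHub Helm chart. The keys `singleuser.storage.type`, `singleuser.storage.capacity` and `singleuser.storage.dynamic.storageClass` are chart options; the storage class name and capacity are hypothetical. The values are printed as JSON, which Helm should accept with `-f` since JSON is a subset of YAML.

```python
import json


def storage_values(storage_class, capacity="10Gi"):
    """Helm values selecting one StorageClass for every user's PVC."""
    return {
        "singleuser": {
            "storage": {
                "type": "dynamic",
                "capacity": capacity,
                "dynamic": {"storageClass": storage_class},
            }
        }
    }


# "nfs-csi" is a hypothetical StorageClass name, not one from this cluster.
values = storage_values("nfs-csi")
print(json.dumps(values, indent=2))
```

Note that this applies one storage class to all users, which is exactly the limitation described above; it does not give each user a different class.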

Jean-Baptiste-Lasselle commented 6 months ago

It works, thanks to Yandex's CSI driver, just awesome! Now one last easy thing:

I just need a custom image to be used as the notebook image:

https://z2jh.jupyter.org/en/stable/jupyterhub/customizing/user-environment.html#choose-and-use-an-existing-docker-image

Well, I just quickly tried to use the image I built for JupyterLab, and it does not work: it definitely needs to be redesigned for JupyterHub. Look at the env vars for the default image quay.io/jupyterhub/k8s-singleuser-sample:3.3.7:

    Environment:
      JPY_API_TOKEN:                           980261cf6a7a42bc9c9df3e8c424ad6f
      JUPYTERHUB_ACTIVITY_URL:                 http://hub:8081/hub/api/users/laurent/activity
      JUPYTERHUB_ADMIN_ACCESS:                 1
      JUPYTERHUB_API_TOKEN:                    980261cf6a7a42bc9c9df3e8c424ad6f
      JUPYTERHUB_API_URL:                      http://hub:8081/hub/api
      JUPYTERHUB_BASE_URL:                     /
      JUPYTERHUB_CLIENT_ID:                    jupyterhub-user-laurent
      JUPYTERHUB_COOKIE_HOST_PREFIX_ENABLED:   0
      JUPYTERHUB_HOST:
      JUPYTERHUB_OAUTH_ACCESS_SCOPES:          ["access:servers!server=laurent/", "access:servers!user=laurent"]
      JUPYTERHUB_OAUTH_CALLBACK_URL:           /user/laurent/oauth_callback
      JUPYTERHUB_OAUTH_CLIENT_ALLOWED_SCOPES:  []
      JUPYTERHUB_OAUTH_SCOPES:                 ["access:servers!server=laurent/", "access:servers!user=laurent"]
      JUPYTERHUB_SERVER_NAME:
      JUPYTERHUB_SERVICE_PREFIX:               /user/laurent/
      JUPYTERHUB_SERVICE_URL:                  http://0.0.0.0:8888/user/laurent/
      JUPYTERHUB_USER:                         laurent
      JUPYTER_ALLOW_INSECURE_WRITES:           true
      JUPYTER_IMAGE:                           quay.io/jupyterhub/k8s-singleuser-sample:3.3.7
      JUPYTER_IMAGE_SPEC:                      quay.io/jupyterhub/k8s-singleuser-sample:3.3.7
      MEM_GUARANTEE:                           1073741824
Jean-Baptiste-Lasselle commented 6 months ago

here is the image definition, not so complex: https://github.com/jupyterhub/zero-to-jupyterhub-k8s/blob/main/images/singleuser-sample/Dockerfile

Jean-Baptiste-Lasselle commented 6 months ago

Yeah OK, of course it will have trouble mounting the volume on a directory that does not exist (/home/jovyan):

# executing 'kubectl -n decoderleco describe pod/jupyter-laurent' gives:

    Mounts:
      /home/jovyan from volume-laurent (rw)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True
  Initialized                 True
  Ready                       True
  ContainersReady             True
  PodScheduled                True
Volumes:
  volume-laurent:
    Type:        PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:   claim-laurent
    ReadOnly:    false

image

Jean-Baptiste-Lasselle commented 6 months ago

Oh, there's another feature I would certainly want for the app deployed on my Kubernetes cluster: date and time synchronization.

Note that k8tz does have ARM64 support:

image

Jean-Baptiste-Lasselle commented 6 months ago

Oh damn, it mounts the volume on /home/jovyan, and as a result all my executables disappear:

ubuntu@DecoderLecoCadeauBOB:~$ kubectl logs pod/jupyter-pierre -n decoderleco
Defaulted container "notebook" out of: notebook, block-cloud-metadata (init)
# --- # --- # --- # --- # --- # ---
# --- # --- # --- # --- # --- # ---
# --- Cheking executables
ls: cannot access '/home/jovyan/.deno/bin/deno': No such file or directory
/run/start.debug.sh: line 15: /home/jovyan/.deno/bin/deno: No such file or directory
ls: cannot access '/home/jovyan/anaconda3/bin/conda': No such file or directory
/run/start.debug.sh: line 17: /home/jovyan/anaconda3/bin/conda: No such file or directory
ls: cannot access '/home/jovyan/.cargo/bin/cargo': No such file or directory
/run/start.debug.sh: line 19: /home/jovyan/.cargo/bin/cargo: No such file or directory
# --- # --- # --- # --- # --- # ---
# --- # --- # --- # --- # --- # ---
/home/jovyan
/run/start.debug.sh: line 29: cargo: command not found
/run/start.debug.sh: line 32: deno: command not found
/run/start.debug.sh: line 36: deno: command not found
/run/start.debug.sh: line 45: jupyter: command not found

https://z2jh.jupyter.org/en/stable/resources/reference.html#singleuser-storage-homemountpath
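
A minimal sketch of the `singleuser.storage.homeMountPath` option linked above, which moves the user volume away from the directory where the image keeps its tools. The mount path is a hypothetical example; anything outside the directories that hold deno, conda and cargo in the image should behave the same way.

```python
import json


def home_mount_values(mount_path):
    """Helm values that mount the user's volume at `mount_path`."""
    if not mount_path.startswith("/"):
        raise ValueError("homeMountPath must be an absolute path")
    return {"singleuser": {"storage": {"homeMountPath": mount_path}}}


# Hypothetical path: a subdirectory, so /home/jovyan itself stays intact.
values = home_mount_values("/home/jovyan/work")
print(json.dumps(values, indent=2))
```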

OK, with the mount path changed everything is fine, but I have a new problem:

[I 2024-05-18 23:41:57.504 ServerApp] Writing Jupyter server cookie secret to /home/jovyan/.local/share/jupyter/runtime/jupyter_cookie_secret
[I 2024-05-18 23:41:57.723 ServerApp] notebook_shim | extension was successfully linked.
[I 2024-05-18 23:41:57.757 ServerApp] notebook_shim | extension was successfully loaded.
[I 2024-05-18 23:41:57.759 ServerApp] jupyter_lsp | extension was successfully loaded.
[I 2024-05-18 23:41:57.759 ServerApp] jupyter_server_mathjax | extension was successfully loaded.
[I 2024-05-18 23:41:57.760 ServerApp] jupyter_server_terminals | extension was successfully loaded.
[I 2024-05-18 23:41:57.768 LabApp] JupyterLab extension loaded from /home/jovyan/.local/lib/python3.11/site-packages/jupyterlab
[I 2024-05-18 23:41:57.768 LabApp] JupyterLab application directory is /home/jovyan/.local/share/jupyter/lab
[I 2024-05-18 23:41:57.768 LabApp] Extension Manager is 'pypi'.
[I 2024-05-18 23:41:57.804 ServerApp] jupyterlab | extension was successfully loaded.
[I 2024-05-18 23:41:57.807 ServerApp] jupyterlab_git | extension was successfully loaded.
[I 2024-05-18 23:41:57.860 ServerApp] nbdime | extension was successfully loaded.
[I 2024-05-18 23:41:57.863 ServerApp] notebook | extension was successfully loaded.
[I 2024-05-18 23:41:57.864 ServerApp] Serving notebooks from local directory: /home/jovyan
[I 2024-05-18 23:41:57.864 ServerApp] Jupyter Server 2.14.0 is running at:
[I 2024-05-18 23:41:57.864 ServerApp] http://jupyter-pierre:8888/lab?token=...
[I 2024-05-18 23:41:57.864 ServerApp]     http://127.0.0.1:8888/lab?token=...
[I 2024-05-18 23:41:57.864 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[W 2024-05-18 23:41:57.873 ServerApp] No web browser found: Error('could not locate runnable browser').
[I 2024-05-18 23:41:58.165 ServerApp] Skipped non-installed server(s): bash-language-server, dockerfile-language-server-nodejs, javascript-typescript-langserver, jedi-language-server, julia-language-server, pyright, python-language-server, python-lsp-server, r-languageserver, sql-language-server, texlab, typescript-language-server, unified-language-server, vscode-css-languageserver-bin, vscode-html-languageserver-bin, vscode-json-languageserver-bin, yaml-language-server
[I 2024-05-18 23:41:58.915 ServerApp] 302 GET /user/pierre/ (@10.244.2.36) 0.62ms
[I 2024-05-18 23:41:59.211 ServerApp] 302 GET /user/pierre/ (@10.244.3.25) 0.54ms
[W 2024-05-18 23:41:59.280 ServerApp] Clearing invalid/expired login cookie username-localhost-8888
[W 2024-05-18 23:41:59.301 ServerApp] 404 GET /user/pierre (@10.244.3.25) 21.51ms referer=http://localhost:8888/hub/spawn-pending/pierre
[W 2024-05-18 23:42:39.571 ServerApp] 404 GET /user/pierre/lab (@10.244.3.25) 2.43ms referer=None
[I 2024-05-18 23:42:50.062 ServerApp] 302 GET /user/pierre/?lab (@10.244.3.25) 0.59ms
[W 2024-05-18 23:42:50.139 ServerApp] 404 GET /user/pierre?lab (@10.244.3.25) 1.52ms referer=None
[I 2024-05-18 23:43:02.997 ServerApp] 302 GET /user/pierre/ (@10.244.3.25) 0.54ms
[W 2024-05-18 23:43:03.279 ServerApp] 404 GET /user/pierre (@10.244.3.25) 1.28ms referer=None
[I 2024-05-18 23:43:09.416 ServerApp] 302 GET /user/pierre/ (@10.244.3.25) 0.47ms
[W 2024-05-18 23:43:09.568 ServerApp] 404 GET /user/pierre (@10.244.3.25) 1.25ms referer=None
[I 2024-05-18 23:43:09.798 ServerApp] 302 GET /user/pierre/ (@10.244.3.25) 0.47ms
[W 2024-05-18 23:43:09.955 ServerApp] 404 GET /user/pierre (@10.244.3.25) 1.18ms referer=None
[I 2024-05-18 23:43:52.963 ServerApp] 302 GET /user/pierre/lab/ (@10.244.3.25) 0.53ms
[W 2024-05-18 23:43:53.099 ServerApp] 404 GET /user/pierre/lab (@10.244.3.25) 1.23ms referer=None
[W 2024-05-18 23:44:13.598 ServerApp] 404 GET /user/pierre?redirects=1 (@10.244.3.25) 1.24ms referer=None
[W 2024-05-18 23:44:27.197 ServerApp] 404 GET /user/pierre?lab=1 (@10.244.3.25) 1.20ms referer=None
Jean-Baptiste-Lasselle commented 6 months ago

OK, I think I found it :)

I made some tiny changes to remove JupyterLab's internal authentication and leave it delegated to JupyterHub, and now it works:

image

Jean-Baptiste-Lasselle commented 6 months ago

Ah, disabling authentication was not a good idea:

image

Jean-Baptiste-Lasselle commented 6 months ago

Ohh, I get it!!!

${THE CMD DEFINED BY JUPYTERHUB BY DEFAULT} --unstable=true --notebook-dir="$NOTEBOOKS_DIR"

There we go!

image
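
A minimal sketch of an entrypoint helper built around the command line above, assuming the default command defined by JupyterHub is `jupyterhub-singleuser` (the command shipped with the jupyterhub package, which reads the `JUPYTERHUB_*` variables shown earlier). A plausible reading of the 404s on /user/pierre is that a plain `jupyter lab` does not know about the hub prefix, but that is an assumption. The directory is a placeholder, and the `--unstable=true` flag is left out because its purpose is not clear from the thread.

```python
def singleuser_command(env, notebooks_dir):
    """Return the argv for a single-user server spawned by JupyterHub."""
    # JupyterHub injects JUPYTERHUB_SERVICE_PREFIX (e.g. /user/laurent/) into the pod.
    if not env.get("JUPYTERHUB_SERVICE_PREFIX"):
        raise RuntimeError("JUPYTERHUB_SERVICE_PREFIX is not set; not spawned by JupyterHub?")
    return ["jupyterhub-singleuser", f"--notebook-dir={notebooks_dir}"]


# Example environment with values copied from the pod description earlier in the thread.
example_env = {
    "JUPYTERHUB_SERVICE_PREFIX": "/user/laurent/",
    "JUPYTERHUB_API_URL": "http://hub:8081/hub/api",
}

# "/opt/jupyterlab/workspace" is a placeholder notebooks directory.
command = singleuser_command(example_env, "/opt/jupyterlab/workspace")
print(command)
```

In a real entrypoint the helper would be called with `os.environ`, and the result handed to `os.execvp(command[0], command)`.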

Jean-Baptiste-Lasselle commented 6 months ago

OK, found where to add this c.Spawner.args:

image

So I have to run a generate-config with the jupyter lab command; it will generate the config file as ~/.jupyter/jupyter_config.py

Something like that; the details need checking, but that is the idea for sure.

Ah, but no, actually this is configuration of JupyterHub itself, not of my Docker image; it's the jupyterhub command: https://jupyterhub.readthedocs.io/en/0.7.2/getting-started.html#technical-overview
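
Since `c.Spawner.args` is hub-side configuration, one way to set it with the Zero to JupyterHub chart is a `hub.extraConfig` snippet. The sketch below builds such a values fragment; `hub.extraConfig` and `c.Spawner.args` are real options, while the snippet key `spawnerArgs` and the argument value are hypothetical.

```python
import json


def spawner_args_values(args):
    """Helm values carrying a hub config snippet that sets c.Spawner.args."""
    snippet = f"c.Spawner.args = {args!r}\n"
    # Check that the generated snippet is at least syntactically valid Python.
    compile(snippet, "<hub.extraConfig>", "exec")
    return {"hub": {"extraConfig": {"spawnerArgs": snippet}}}


# Placeholder argument; the real arguments depend on the custom image.
values = spawner_args_values(["--notebook-dir=/opt/jupyterlab/workspace"])
print(json.dumps(values, indent=2))
```

The hub appends these arguments to the single-user command when it spawns a server, so the image itself does not need a generated config file for this.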

Jean-Baptiste-Lasselle commented 6 months ago

I am looking into how to integrate:

Here I want to note:

~$ docker exec -it d5272d1e503a bash -c 'jupyter lab --help-all | grep auth'
    The full path to a certificate authority certificate for SSL/TLS client
    authentication.
--ServerApp.allow_unauthenticated_access=<Bool>
    Allow unauthenticated access to endpoints without authentication rule.
            in the future), any request to an endpoint without an authentication rule
            (either `@tornado.web.authenticated`, or `@allow_unauthenticated`)
            excluding the endpoints marked with `@allow_unauthenticated` decorator.
            prevent unauthenticated access to endpoints without `@allow_unauthenticated`.
--ServerApp.authenticate_prometheus=<Bool>
            Require authentication to access prometheus metrics.
--ServerApp.authorizer_class=<Type>
    The authorizer class to use.
    Default: 'jupyter_server.auth.authorizer.AllowAllAuthorizer'
    The full path to a certificate authority certificate for SSL/TLS client
    authentication.
            the actual connection URL. If authentication token is enabled, the
            - authenticate with a token
            completely without authentication.
            These services can disable all authentication and security checks,
    Default: 'jupyter_server.auth.identity.PasswordIdentityProvider'
    Default: 'jupyter_server.auth.login.LegacyLoginHandler'
    Default: 'jupyter_server.auth.logout.LogoutHandler'
         prevented the authentication token used to launch the browser from being visible.

So one idea here might be:

And with those redefinitions, I make the authentication flow handled by JupyterHub compatible with the authentication endpoint of the server in my Docker image.

I still have the feeling that there is something wrong about all this:

Jean-Baptiste-Lasselle commented 6 months ago

OK, one useful thing I could do here is to use multiple profiles for multiple images:

image
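
A minimal sketch of what multiple profiles could look like with the chart's `singleuser.profileList` option, where each entry overrides the image through `kubespawner_override`. The default image is the one quoted earlier in this thread; the custom image name and the display names are hypothetical.

```python
import json


def profile(display_name, image, description="", default=False):
    """One entry of singleuser.profileList selecting a notebook image."""
    entry = {
        "display_name": display_name,
        "description": description,
        "kubespawner_override": {"image": image},
    }
    if default:
        entry["default"] = True
    return entry


values = {
    "singleuser": {
        "profileList": [
            profile(
                "Default sample image",
                "quay.io/jupyterhub/k8s-singleuser-sample:3.3.7",
                default=True,
            ),
            # Hypothetical custom image with deno, conda and cargo preinstalled.
            profile(
                "Custom data pipeline image",
                "registry.example.com/decoderleco/jupyterlab:latest",
                description="JupyterLab with extra toolchains",
            ),
        ]
    }
}
print(json.dumps(values, indent=2))
```

Users would then pick one of the profiles on the spawn page before their server starts.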

Jean-Baptiste-Lasselle commented 6 months ago

The last issue I have is being able to set ownership and permissions of the mounted volumes:

The volumes are mounted with root ownership, which is a problem because, for example, some JupyterLab extensions can't work, like the jupyterlab_git extension.
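
For context, the usual knobs for volume ownership in the Zero to JupyterHub chart are `singleuser.uid` and `singleuser.fsGid`, the latter ending up as the pod's `fsGroup`. The sketch below only shows where they go, using the chart's default values; whether a FUSE-backed S3 volume honors `fsGroup` at all is an open question here, not something this sketch confirms.

```python
import json


def ownership_values(uid=1000, fs_gid=100):
    """Helm values setting the notebook user's UID and the volume's group."""
    if uid < 0 or fs_gid < 0:
        raise ValueError("uid and fs_gid must be non-negative")
    return {"singleuser": {"uid": uid, "fsGid": fs_gid}}


# 1000 and 100 are the chart defaults (the jovyan user and the users group).
values = ownership_values()
print(json.dumps(values, indent=2))
```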

Jean-Baptiste-Lasselle commented 6 months ago

Moving to GCP can be done for 20 euros per month, and the machine is not bad (4 vCPUs, 16 GB of RAM and 60 GB of disk):

image

image

image

https://cloud.google.com/products/calculator?dl=CiQ2ZTllODNjMC04YjFkLTQ1ZjQtOTBjZC1mMjYwOTdiMzUxMzcQDhokMzhFN0M3MjctODAwNy00MzMzLUI5NDYtNThDMkQxMEIwQTQy

Apparently this should be done with an ephemeral IP address, which would cost nothing.

Quick examples:

And for roughly 25 euros per month, with a 20 GB boot disk and another 80 GB disk, a VM with 8 vCPUs / 32 GB RAM:

https://cloud.google.com/products/calculator?dl=CiQ4NzZkM2IwYi0xODFhLTQxZGItODg5My1kZmU1OGFiNjE3ODUQCBokOTRCN0JFN0YtRDBBNi00MjI5LTgzQkEtODlCOUQ3QjdFM0E2

Jean-Baptiste-Lasselle commented 6 months ago

For reference, for a possible production setup, storage pricing:

image

image

Jean-Baptiste-Lasselle commented 6 months ago

OK, I found more about the issue of volumes being mounted with root ownership:

Other interesting things:

  • Can't change directory permissions or ownership via s3fs minio/minio#6496
  • Oh OK, I have real answers about my issue with the volume mounts there:

    • s3fs and rclone support yandex-cloud/k8s-csi-s3#16
    • Mounters other than GeeseFS have major issues like being slow, being hard to set up, etc.
    • Now I remember it worked like a charm in one case for me: when I used the root user to run my Jupyter. So can I modify my Docker image to run as root (in which case I'll have to add the --allow-root arg to the notebook)? Will it be a problem for JupyterHub? I could then mount my volume anywhere I want, like on /opt/jupyterlab/workspace for example; only then will the Jupyter extensions work well. Yet it would be a security issue, wouldn't it?
    • What can I do to solve the issue properly? GeeseFS only works well with Yandex S3, so does that mean that if I change the S3 backend (MinIO) for another, I might solve my issue? Not sure at all...
    • Oh, actually it is Yandex themselves who implemented GeeseFS, and the subtitle of the repo makes it clear they had a hard time getting good POSIX compatibility from other S3 filesystems: "Finally a good POSIX-ish FUSE s3 filesystem written in go". This is why I don't think I will get much further with this issue than where I am, except by either using a root user inside my containers (but then I'll have to harden security a lot for production), or switching to a non-S3 CSI driver. If I run as root inside the container, I would at least try to provision the whole stack by running containers with Podman, without running the container runtime daemon as root (is it possible to run as root inside a container if the container runtime is not executed as root? I'm not sure there).
  • My conclusion is that I will definitely have to find a CSI driver with full POSIX support that is not based on S3. Would NFS CSI drivers do? I don't like the idea of using NFS, so perhaps Ceph with block storage rather than S3. Yet this issue of S3 buckets necessarily being mounted as root in pod containers remains a problem.

THE NEXT TEST: I'll try OpenEBS (not an iSCSI CSI driver)

https://openebs.io/docs/quickstart-guide/installation

Yes, definitely OpenEBS; it's the best I have found so far. I'll give it a quick try and:

This was the first thing I thought of for a block storage CSI driver:

Okay, I want to provision an iSCSI CSI driver in my cluster: