Jean-Baptiste-Lasselle opened 6 months ago
Before the production part, I'll need to finish the data map: https://poc-eurostat-data-transformers.pages.dev/data-pipeline/
Big lead: OpenEBS.
First, here is the real list of all the CSI drivers: https://kubernetes-csi.github.io/docs/drivers.html
Next:
Also worth trying: JuiceFS
https://juicefs.com/docs/community/getting-started/installation
https://juicefs.com/docs/csi/introduction https://juicefs.com/docs/csi/getting_started
To test: if I create a JuiceFS filesystem on top of a MinIO S3 bucket, will my CSI driver then work in my Kubernetes cluster? https://juicefs.com/docs/community/getting-started/standalone#hands-on-practice-2
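A minimal sketch of that test, following the standalone hands-on guide linked above; the MinIO endpoint, bucket name, credentials and the Redis metadata URL are all assumptions here:

```bash
# Format a JuiceFS filesystem backed by a MinIO bucket (all endpoints,
# credentials and names below are placeholders), then mount it locally
# to check basic POSIX behaviour before wiring up the CSI driver.
juicefs format \
  --storage minio \
  --bucket http://127.0.0.1:9000/myjfs \
  --access-key minioadmin \
  --secret-key minioadmin \
  redis://127.0.0.1:6379/1 \
  myjfs
juicefs mount --background redis://127.0.0.1:6379/1 /mnt/jfs
```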
Ooh, CubeFS also looks pretty good to me:
Too bad: without a dedicated arm64 (v8) build, CubeFS is a dead end on arm64.
AWESOME, IT WORKED WITH JUICEFS AND AN S3 MINIO BUCKET!!!
OK, I tried JuiceFS and successfully configured JupyterHub to use the JuiceFS storage class, but there is a complex problem to solve: it's not easy to deploy different apps using one and only one JuiceFS storage class, and it's not easy to configure JupyterHub to use a different storage class for every newly spun-up user. (And if I log in with 2 different users in JupyterHub it definitely crashes, because the JuiceFS driver fails volume provisioning for the newly spun-up JupyterLab.)
So, next step: I will definitely try the NFS CSI driver; we definitely need to master our Kubernetes CSI stack.
It workssss, thanks to Yandex's CSI driver, just awesome. Now one last easy thing:
I just need a custom image to be used as the notebook.
Well, I just quickly tried to use the image I built for JupyterLab, and it does not work; it definitely needs to be re-designed for JupyterHub. Look at the env vars of the default image quay.io/jupyterhub/k8s-singleuser-sample:3.3.7:
Environment:
JPY_API_TOKEN: 980261cf6a7a42bc9c9df3e8c424ad6f
JUPYTERHUB_ACTIVITY_URL: http://hub:8081/hub/api/users/laurent/activity
JUPYTERHUB_ADMIN_ACCESS: 1
JUPYTERHUB_API_TOKEN: 980261cf6a7a42bc9c9df3e8c424ad6f
JUPYTERHUB_API_URL: http://hub:8081/hub/api
JUPYTERHUB_BASE_URL: /
JUPYTERHUB_CLIENT_ID: jupyterhub-user-laurent
JUPYTERHUB_COOKIE_HOST_PREFIX_ENABLED: 0
JUPYTERHUB_HOST:
JUPYTERHUB_OAUTH_ACCESS_SCOPES: ["access:servers!server=laurent/", "access:servers!user=laurent"]
JUPYTERHUB_OAUTH_CALLBACK_URL: /user/laurent/oauth_callback
JUPYTERHUB_OAUTH_CLIENT_ALLOWED_SCOPES: []
JUPYTERHUB_OAUTH_SCOPES: ["access:servers!server=laurent/", "access:servers!user=laurent"]
JUPYTERHUB_SERVER_NAME:
JUPYTERHUB_SERVICE_PREFIX: /user/laurent/
JUPYTERHUB_SERVICE_URL: http://0.0.0.0:8888/user/laurent/
JUPYTERHUB_USER: laurent
JUPYTER_ALLOW_INSECURE_WRITES: true
JUPYTER_IMAGE: quay.io/jupyterhub/k8s-singleuser-sample:3.3.7
JUPYTER_IMAGE_SPEC: quay.io/jupyterhub/k8s-singleuser-sample:3.3.7
MEM_GUARANTEE: 1073741824
Here is the image definition, not so complex: https://github.com/jupyterhub/zero-to-jupyterhub-k8s/blob/main/images/singleuser-sample/Dockerfile
Yeah OK, it will obviously have a hard time mounting the volume on a directory that does not exist (/home/jovyan):
# executing 'kubectl -n decoderleco describe pod/jupyter-laurent' gives:
Mounts:
  /home/jovyan from volume-laurent (rw)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True
  Initialized                 True
  Ready                       True
  ContainersReady             True
  PodScheduled                True
Volumes:
  volume-laurent:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  claim-laurent
    ReadOnly:   false
Oh, there's another feature I would certainly want for my Kubernetes-deployed apps: date/time synchronization, i.e. mounting
/etc/timezone:/etc/timezone:ro
and /etc/localtime:/etc/localtime:ro
Note that k8tz does have ARM64 support:
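A quick sketch of how I'd try it, assuming the k8tz helm chart with its defaults; the timezone value is an assumption:

```bash
# Install the k8tz admission controller, which injects /etc/localtime
# and /etc/timezone (read-only) into pods at creation time.
helm repo add k8tz https://k8tz.github.io/k8tz
helm repo update
helm install k8tz k8tz/k8tz --set timezone=Europe/Paris
```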
Ah damn, it mounts the volume on /home/jovyan, and so it wipes out all my executables...:
ubuntu@DecoderLecoCadeauBOB:~$ kubectl logs pod/jupyter-pierre -n decoderleco
Defaulted container "notebook" out of: notebook, block-cloud-metadata (init)
# --- # --- # --- # --- # --- # ---
# --- # --- # --- # --- # --- # ---
# --- Cheking executables
ls: cannot access '/home/jovyan/.deno/bin/deno': No such file or directory
/run/start.debug.sh: line 15: /home/jovyan/.deno/bin/deno: No such file or directory
ls: cannot access '/home/jovyan/anaconda3/bin/conda': No such file or directory
/run/start.debug.sh: line 17: /home/jovyan/anaconda3/bin/conda: No such file or directory
ls: cannot access '/home/jovyan/.cargo/bin/cargo': No such file or directory
/run/start.debug.sh: line 19: /home/jovyan/.cargo/bin/cargo: No such file or directory
# --- # --- # --- # --- # --- # ---
# --- # --- # --- # --- # --- # ---
/home/jovyan
/run/start.debug.sh: line 29: cargo: command not found
/run/start.debug.sh: line 32: deno: command not found
/run/start.debug.sh: line 36: deno: command not found
/run/start.debug.sh: line 45: jupyter: command not found
https://z2jh.jupyter.org/en/stable/resources/reference.html#singleuser-storage-homemountpath
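A sketch of the fix suggested by that reference: move the user volume's mount point away from /home/jovyan so it stops shadowing the image's files. The helm release name, namespace and mount path below are assumptions:

```bash
# Write a values override that moves the user volume off /home/jovyan,
# so the PVC no longer shadows the executables baked into the image.
cat > storage-values.yaml <<'EOF'
singleuser:
  storage:
    homeMountPath: /opt/jupyterlab/workspace
EOF
helm upgrade --install jupyterhub jupyterhub/jupyterhub \
  --namespace decoderleco --values storage-values.yaml
```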
OK, after changing the mount path everything is fine, but I have a new problem: it serves on / instead of /user/pierre (cf. logs below). The base URL of the jupyterlab inside the container must be dynamic; how do I do that....?
[I 2024-05-18 23:41:57.504 ServerApp] Writing Jupyter server cookie secret to /home/jovyan/.local/share/jupyter/runtime/jupyter_cookie_secret
[I 2024-05-18 23:41:57.723 ServerApp] notebook_shim | extension was successfully linked.
[I 2024-05-18 23:41:57.757 ServerApp] notebook_shim | extension was successfully loaded.
[I 2024-05-18 23:41:57.759 ServerApp] jupyter_lsp | extension was successfully loaded.
[I 2024-05-18 23:41:57.759 ServerApp] jupyter_server_mathjax | extension was successfully loaded.
[I 2024-05-18 23:41:57.760 ServerApp] jupyter_server_terminals | extension was successfully loaded.
[I 2024-05-18 23:41:57.768 LabApp] JupyterLab extension loaded from /home/jovyan/.local/lib/python3.11/site-packages/jupyterlab
[I 2024-05-18 23:41:57.768 LabApp] JupyterLab application directory is /home/jovyan/.local/share/jupyter/lab
[I 2024-05-18 23:41:57.768 LabApp] Extension Manager is 'pypi'.
[I 2024-05-18 23:41:57.804 ServerApp] jupyterlab | extension was successfully loaded.
[I 2024-05-18 23:41:57.807 ServerApp] jupyterlab_git | extension was successfully loaded.
[I 2024-05-18 23:41:57.860 ServerApp] nbdime | extension was successfully loaded.
[I 2024-05-18 23:41:57.863 ServerApp] notebook | extension was successfully loaded.
[I 2024-05-18 23:41:57.864 ServerApp] Serving notebooks from local directory: /home/jovyan
[I 2024-05-18 23:41:57.864 ServerApp] Jupyter Server 2.14.0 is running at:
[I 2024-05-18 23:41:57.864 ServerApp] http://jupyter-pierre:8888/lab?token=...
[I 2024-05-18 23:41:57.864 ServerApp] http://127.0.0.1:8888/lab?token=...
[I 2024-05-18 23:41:57.864 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[W 2024-05-18 23:41:57.873 ServerApp] No web browser found: Error('could not locate runnable browser').
[I 2024-05-18 23:41:58.165 ServerApp] Skipped non-installed server(s): bash-language-server, dockerfile-language-server-nodejs, javascript-typescript-langserver, jedi-language-server, julia-language-server, pyright, python-language-server, python-lsp-server, r-languageserver, sql-language-server, texlab, typescript-language-server, unified-language-server, vscode-css-languageserver-bin, vscode-html-languageserver-bin, vscode-json-languageserver-bin, yaml-language-server
[I 2024-05-18 23:41:58.915 ServerApp] 302 GET /user/pierre/ (@10.244.2.36) 0.62ms
[I 2024-05-18 23:41:59.211 ServerApp] 302 GET /user/pierre/ (@10.244.3.25) 0.54ms
[W 2024-05-18 23:41:59.280 ServerApp] Clearing invalid/expired login cookie username-localhost-8888
[W 2024-05-18 23:41:59.301 ServerApp] 404 GET /user/pierre (@10.244.3.25) 21.51ms referer=http://localhost:8888/hub/spawn-pending/pierre
[W 2024-05-18 23:42:39.571 ServerApp] 404 GET /user/pierre/lab (@10.244.3.25) 2.43ms referer=None
[I 2024-05-18 23:42:50.062 ServerApp] 302 GET /user/pierre/?lab (@10.244.3.25) 0.59ms
[W 2024-05-18 23:42:50.139 ServerApp] 404 GET /user/pierre?lab (@10.244.3.25) 1.52ms referer=None
[I 2024-05-18 23:43:02.997 ServerApp] 302 GET /user/pierre/ (@10.244.3.25) 0.54ms
[W 2024-05-18 23:43:03.279 ServerApp] 404 GET /user/pierre (@10.244.3.25) 1.28ms referer=None
[I 2024-05-18 23:43:09.416 ServerApp] 302 GET /user/pierre/ (@10.244.3.25) 0.47ms
[W 2024-05-18 23:43:09.568 ServerApp] 404 GET /user/pierre (@10.244.3.25) 1.25ms referer=None
[I 2024-05-18 23:43:09.798 ServerApp] 302 GET /user/pierre/ (@10.244.3.25) 0.47ms
[W 2024-05-18 23:43:09.955 ServerApp] 404 GET /user/pierre (@10.244.3.25) 1.18ms referer=None
[I 2024-05-18 23:43:52.963 ServerApp] 302 GET /user/pierre/lab/ (@10.244.3.25) 0.53ms
[W 2024-05-18 23:43:53.099 ServerApp] 404 GET /user/pierre/lab (@10.244.3.25) 1.23ms referer=None
[W 2024-05-18 23:44:13.598 ServerApp] 404 GET /user/pierre?redirects=1 (@10.244.3.25) 1.24ms referer=None
[W 2024-05-18 23:44:27.197 ServerApp] 404 GET /user/pierre?lab=1 (@10.244.3.25) 1.20ms referer=None
OK, I think I found it :) :
--NotebookApp.base_url="${JUPYTERHUB_SERVICE_PREFIX}"
I made some very small modifications to remove JupyterLab's internal authentication and leave it delegated to JupyterHub, and it works:
Ah, it was not a good idea to disable authentication.
Ohh, I got it!!! :
cmd: null
${THE DEFAULT CMD DEFINED BY JUPYTERHUB} --unstable=true --notebook-dir="$NOTEBOOKS_DIR"
There we go!
OK, found where to add this: c.Spawner.args
So I need to generate a config with the jupyter lab command; it will generate it in ~/.jupyter/jupyter_lab_config.py,
something like that, to be double-checked, but that's it for sure.
Ah, but no, actually: this is configuration of JupyterHub itself, not of my Docker image; it's the jupyterhub command.
https://jupyterhub.readthedocs.io/en/0.7.2/getting-started.html#technical-overview
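In z2jh terms, a minimal sketch of where c.Spawner.args can live; the release name, namespace and notebook dir are assumptions, and I am assuming that with cmd: null plus Spawner.args the args get appended to the image's own command:

```bash
# Keep the image's own ENTRYPOINT/CMD (cmd: null) and append the extra
# arguments through c.Spawner.args, injected via the hub's extraConfig hook.
cat > spawner-values.yaml <<'EOF'
singleuser:
  cmd: null   # use the CMD baked into the custom image
hub:
  extraConfig:
    spawner-args: |
      c.Spawner.args = ['--unstable=true',
                        '--notebook-dir=/opt/jupyterlab/workspace']
EOF
helm upgrade --install jupyterhub jupyterhub/jupyterhub \
  --namespace decoderleco --values spawner-values.yaml
```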
I am searching how to integrate this.
There, I want to note:
~$ docker exec -it d5272d1e503a bash -c 'jupyter lab --help-all | grep auth'
The full path to a certificate authority certificate for SSL/TLS client
authentication.
--ServerApp.allow_unauthenticated_access=<Bool>
Allow unauthenticated access to endpoints without authentication rule.
in the future), any request to an endpoint without an authentication rule
(either `@tornado.web.authenticated`, or `@allow_unauthenticated`)
excluding the endpoints marked with `@allow_unauthenticated` decorator.
prevent unauthenticated access to endpoints without `@allow_unauthenticated`.
--ServerApp.authenticate_prometheus=<Bool>
Require authentication to access prometheus metrics.
--ServerApp.authorizer_class=<Type>
The authorizer class to use.
Default: 'jupyter_server.auth.authorizer.AllowAllAuthorizer'
The full path to a certificate authority certificate for SSL/TLS client
authentication.
the actual connection URL. If authentication token is enabled, the
- authenticate with a token
completely without authentication.
These services can disable all authentication and security checks,
Default: 'jupyter_server.auth.identity.PasswordIdentityProvider'
Default: 'jupyter_server.auth.login.LegacyLoginHandler'
Default: 'jupyter_server.auth.logout.LogoutHandler'
prevented the authentication token used to launch the browser from being visible.
So one idea there might be to re-define:
jupyter_server.auth.identity.PasswordIdentityProvider
jupyter_server.auth.login.LegacyLoginHandler
jupyter_server.auth.logout.LogoutHandler
and, with those re-definitions, make the authentication flow handled by JupyterHub compatible with the authentication endpoint of the server in my Docker image.
I still have the feeling that there is something wrong about all this:
jupyter lab --unstable=true --ip=0.0.0.0 --NotebookApp.base_url="${JUPYTERHUB_SERVICE_PREFIX}" --NotebookApp.token='' --NotebookApp.password='' --notebook-dir="$NOTEBOOKS_DIR"
(the --NotebookApp.token='' --NotebookApp.password='' options are the ones disabling authentication).
OK, one useful thing I could do here is to use multiple profiles for multiple images:
The last issue I have is being able to set ownership and permissions of the mounted volumes:
- I think that Yandex's CSI driver does support that feature, because fsGroupPolicy is set to File in the source code of the CSI driver,
- and that it is the JupyterHub helm chart which has a problem with it. The only other explanations I can think of for now are that the Kubernetes version of my cluster is not recent enough to support the feature, or that my configuration config.yml is wrong.
- One thing which might be worth trying: the rook CSI driver (based on cephFS), see https://rook.io/docs/rook/latest-release/Getting-Started/quickstart/
- rook does document the FSGroupPolicy feature, https://rook.io/docs/rook/latest-release/Helm-Charts/operator-chart/?h=fsgroup#configuration, see csi.cephFSFSGroupPolicy
For now the volumes are mounted with root ownership, which is bothering, because, for example, some JupyterLab extensions can't work, like the jupyter-git extension. A quick way to check the driver's declared policy and to test fsGroup is sketched right after this list.
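As announced above, a sketch of checking both sides; the CSIDriver object name ru.yandex.s3.csi and the fsGroup value are assumptions:

```bash
# 1. Check what the installed CSI driver actually declares:
kubectl get csidriver ru.yandex.s3.csi -o jsonpath='{.spec.fsGroupPolicy}'
# expected output, for ownership management by kubelet: File

# 2. Spawn a throwaway pod with an fsGroup and see whether files on the
#    claim really end up owned by that group:
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: fsgroup-test
  namespace: decoderleco
spec:
  securityContext:
    fsGroup: 1000          # files under /data should get gid 1000
  containers:
    - name: test
      image: busybox
      command: ["sh", "-c", "ls -ln /data && sleep 3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: claim-laurent
EOF
```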
To move to GCP, it can be done for about 20 euros per month, and the machine is not bad (4 vCPUs, 16 GB of RAM and 60 GB of disk):
Apparently this should be done with an ephemeral IP address, which would have zero cost.
Quick examples (a gcloud sketch follows below):
And for roughly 25 euros per month, with a 20 GB boot disk plus another 80 GB disk, a VM with 8 vCPUs / 32 GB of RAM.
For information, for a possible production, storage pricing:
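A rough sketch of the two VM shapes mentioned above; the instance names and zone are assumptions:

```bash
# e2-standard-4 = 4 vCPUs / 16 GB RAM; an ephemeral external IP is the
# default when no static address is reserved, so the IP itself costs nothing.
gcloud compute instances create decoderleco-vm \
  --zone=europe-west1-b \
  --machine-type=e2-standard-4 \
  --boot-disk-size=60GB

# The ~25 EUR/month variant: 8 vCPUs / 32 GB RAM, 20 GB boot disk plus
# a second 80 GB data disk.
gcloud compute instances create decoderleco-vm-big \
  --zone=europe-west1-b \
  --machine-type=e2-standard-8 \
  --boot-disk-size=20GB \
  --create-disk=size=80GB
```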
OK, I found more about this issue:
- A totally similar issue on the Yandex GitHub repo: "securityContext not working" yandex-cloud/k8s-csi-s3#79
- According to https://github.com/yandex-cloud/geesefs/?tab=readme-ov-file#posix-compatibility-matrix, it therefore seems that I might have a fix if I set up the CSI driver with s3fs instead of GeeseFS: how do I do that? (A sketch follows this list.)
Other interesting things:
- "Can't change directory permissions or ownership via s3fs" minio/minio#6496
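A sketch of what I think the answer is, assuming the yandex-cloud/k8s-csi-s3 driver keeps the per-StorageClass mounter parameter of its upstream (ctrox/csi-s3); the provisioner name and secret references are assumptions taken from that project's examples:

```bash
# Declare a second StorageClass that mounts buckets with s3fs instead of
# the default geesefs mounter.
kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-s3-s3fs
provisioner: ru.yandex.s3.csi
parameters:
  mounter: s3fs
  csi.storage.k8s.io/provisioner-secret-name: csi-s3-secret
  csi.storage.k8s.io/provisioner-secret-namespace: kube-system
  csi.storage.k8s.io/node-publish-secret-name: csi-s3-secret
  csi.storage.k8s.io/node-publish-secret-namespace: kube-system
EOF
```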
Oh OK, I have real answers about my issue with the volume mounts there:
- "s3fs and rclone support" yandex-cloud/k8s-csi-s3#16
- mounters other than GeeseFS have major issues, like being slow, being hard to set up, etc.
- Now I remember it worked like a charm in one case for me: when I used the root user to run my Jupyter. So can I modify my Docker image to run with the root user (in which case I'll have to add the --allow-root arg to the notebook)? Will it be a problem for JupyterHub? I'll then have to mount my volume anywhere I want, like on /opt/jupyterlab/workspace for example; it's only then that the Jupyter extensions will work well. Yet it would be a security issue, wouldn't it...? (A sketch of this run-as-root option follows this list.)
- What can I do to solve the issue properly? GeeseFS only works well with Yandex S3, so does that mean that if I change the S3 backend (MinIO) for another one, I might solve my issue..? Not sure at all...
- Oh, actually it is Yandex themselves who implemented GeeseFS, and the subtitle of the repo makes it clear they had a hard time getting good POSIX compatibility out of other S3 filesystems...: "Finally a good POSIX-ish FUSE s3 filesystem written in go". This is why I don't think I will go much further than where I got with this issue, except by either using a root user inside my containers (but then I'll have to harden security a lot for production), or switching to a non-S3 CSI driver. In case I run with the root user inside the container, I would at least try to provision the whole stack by running the containers with podman, without running the container runtime daemon as root (is it possible to run as root inside a container if the container runtime is not executed as root? I'm not sure there).
- My conclusion is that I will definitely have to find a CSI driver with full POSIX support, not based on S3: would NFS CSI drivers do? I don't like the idea of using NFS, so perhaps Ceph with block storage and not S3. Yet this issue of having S3 buckets necessarily mounted as root in pod containers is still a problem.
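As mentioned in the list above, a sketch of the run-as-root option in z2jh values; this is an untested assumption and clearly a security trade-off:

```bash
# Run the single-user server as root so the root-owned s3 mounts stop
# breaking extensions, and pass --allow-root so Jupyter accepts it.
cat > root-values.yaml <<'EOF'
singleuser:
  uid: 0      # run the notebook container as root (hardening needed!)
hub:
  extraConfig:
    allow-root: |
      c.Spawner.args = ['--allow-root']
EOF
helm upgrade --install jupyterhub jupyterhub/jupyterhub \
  --namespace decoderleco --values root-values.yaml
```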
https://openebs.io/docs/quickstart-guide/installation
Yes, definitely OpenEBS; it's the best I found there for now. I'll give it a quick try:
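Quick-try sketch, following the quickstart linked above; chart and namespace names are as I recall from the docs, worth double-checking there:

```bash
# Install the OpenEBS operator with helm, then look at the storage
# classes it ships (e.g. the local hostpath one) to pick one to test.
helm repo add openebs https://openebs.github.io/openebs
helm repo update
helm install openebs openebs/openebs --namespace openebs --create-namespace
kubectl get pods -n openebs
kubectl get storageclass
```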
This was the first thing I thought of for a block-storage CSI driver:
Okay, I want to provision an iSCSI CSI driver in my cluster:
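One thing I know I'll need first for any iSCSI-based driver is the iSCSI initiator on every node. A sketch for Ubuntu nodes (package names may differ per distro):

```bash
# Install and enable the open-iscsi initiator, a common node prerequisite
# for iSCSI-backed CSI drivers (e.g. OpenEBS Jiva/cStor volumes).
sudo apt-get update
sudo apt-get install -y open-iscsi
sudo systemctl enable --now iscsid
cat /etc/iscsi/initiatorname.iscsi   # each node needs a unique initiator name
```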