Closed: ltalirz closed this issue 5 years ago
celery uses os.getgid() (https://github.com/celery/celery/blob/611e63ccc4b06addd41a634903a37b420a5765aa/celery/platforms.py#L780), and indeed:
$ python
Python 2.7.15rc1 (default, Nov 12 2018, 14:31:15)
[GCC 7.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.getgid()
0
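For illustration, here is a simplified sketch of such a root check. It is not celery's actual code: the function names are hypothetical, and it assumes the check treats the process as privileged as soon as any of the real or effective user/group ids is 0. Under that assumption, a process running as uid 1000 but gid 0 would be flagged as root.

```python
import os


def ids_look_like_root(uid, gid, euid, egid):
    """Return True if any of the real/effective user or group ids is 0."""
    return not all((uid, gid, euid, egid))


def current_process_looks_like_root():
    """Apply the check to the ids of the running process."""
    return ids_look_like_root(
        os.getuid(), os.getgid(), os.geteuid(), os.getegid()
    )
```

With the values observed here (uid 1000, gid 0), ids_look_like_root(1000, 0, 1000, 0) evaluates to True, even though the user itself is not root.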
The problem seems to be that "scientist" is a member of both groups 0 and 1000, and that the running shell has gid 0 rather than 1000:
scientist@jupyter-leopold-2etalirz-40epfl-2ech:~$ id -g scientist
1000
scientist@jupyter-leopold-2etalirz-40epfl-2ech:~$ id -g
0
scientist@jupyter-leopold-2etalirz-40epfl-2ech:~$ id -G
0 1000
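The same comparison can be made from Python. The sketch below (helper names are hypothetical) contrasts the primary gid recorded in the user database, which is what id -g scientist looks up, with the gids the running process actually carries, which is what os.getgid() and a bare id -g report.

```python
import os
import pwd


def gid_report():
    """Collect the passwd primary gid and the gids of the running process."""
    try:
        entry = pwd.getpwuid(os.getuid())
        user, passwd_gid = entry.pw_name, entry.pw_gid
    except KeyError:
        # The uid has no entry in the user database (possible in containers).
        user, passwd_gid = None, None
    return {
        "user": user,
        "passwd_primary_gid": passwd_gid,    # from the user database
        "process_gid": os.getgid(),          # real gid of this process
        "process_egid": os.getegid(),        # effective gid of this process
        "supplementary_gids": sorted(os.getgroups()),
    }


def gid_mismatch(report):
    """True if the process gid differs from the user's passwd primary gid."""
    return report["process_gid"] != report["passwd_primary_gid"]


if __name__ == "__main__":
    report = gid_report()
    print(report)
    print("mismatch:", gid_mismatch(report))
```

In the situation described above, one would expect passwd_primary_gid to be 1000 and process_gid to be 0, so gid_mismatch would return True.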
This is despite the following line in /etc/passwd:
scientist:x:1000:1000::/project:/bin/bash
A potentially useful read on how uid/gid work inside containers: https://medium.com/@mccode/understanding-how-uid-and-gid-work-in-docker-containers-c37a01d01cf
This led me to suspect that the container is spawned with gid 0, which led me to the kubespawner docs:
run_as_gid – The GID used to run single-user pods. The default is to run as the primary group of the user specified in the Dockerfile, if this is set to None. Setting this parameter requires that feature-gate RunAsGroup be enabled, otherwise the effective GID of the pod will be 0 (root). In addition, not setting run_as_gid once feature-gate RunAsGroup is enabled will also result in an effective GID of 0 (root).
Here is the description of Kubernetes feature gates. It seems, though, that there is currently no way to get the state of a feature gate in Kubernetes (!)
Figuring out how to set run_as_gid is also not trivial. Reading the kubespawner code, it seems it is set from KubeSpawner.gid:
https://github.com/jupyterhub/kubespawner/blob/c02c61c457e498192fdf9f240c2bcaec373f9f95/kubespawner/spawner.py#L1315
According to the docs, KubeSpawner.gid defaults to the value of the USER in the Dockerfile.
I.e. it's probably set correctly, which would suggest that in this cluster the RunAsGroup feature-gate is disabled.
Hm... wrong!
hub:
  extraConfig: |
    c.KubeSpawner.gid = 1000
actually solves the issue. It is not clear how KubeSpawner.gid ended up getting a wrong value before...
scientist@jupyter-leopold-2etalirz-40epfl-2ech:~$ id -G
1000
The daemon does not start on k8s using the current develop branch (using aiida-core 0.12.3).
For some reason, celery on k8s thinks it is being asked to run as root. The daemon log file shows:
This information is incorrect: e.g. id -g scientist yields group id 1000, not 0. I wonder where celery gets this information from.
You can make the daemon run by telling celery it's ok to run as root using
export C_FORCE_ROOT=1
but I'd rather have celery get the correct information.