aiidalab / aiidalab-docker-stack

Docker images with the basic software stack for AiiDAlab
https://aiidalab.net

aiida daemon fails to start on k8s #20

Closed ltalirz closed 5 years ago

ltalirz commented 5 years ago

The daemon does not start on k8s using the current develop branch (using aiida-core 0.12.3).

For some reason, celery on k8s thinks it is being asked to run as root. The daemon log file shows:

Running a worker with superuser privileges when the
worker accepts messages serialized with pickle is a very bad idea!

If you really want to continue then you have to set the C_FORCE_ROOT
environment variable (but please think about this before you do).

User information: uid=1000 euid=1000 gid=0 egid=0

This information looks incorrect: for example, id -g scientist yields group ID 1000, not 0. I wonder where celery gets it from.

You can make the daemon run by telling celery it's OK to run as root (export C_FORCE_ROOT=1), but I'd rather have celery get the correct information.

ltalirz commented 5 years ago

celery uses os.getgid() https://github.com/celery/celery/blob/611e63ccc4b06addd41a634903a37b420a5765aa/celery/platforms.py#L780

and indeed:

$ python
Python 2.7.15rc1 (default, Nov 12 2018, 14:31:15)
[GCC 7.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.getgid()
0
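
To make the connection between the log message and os.getgid() concrete, here is a minimal sketch of such a check. It is not celery's actual code: the function names are hypothetical, and the rule "any ID equal to 0 counts as superuser" is an assumption inferred from the log above, where uid=1000 with gid=0 was enough to trigger the warning.

```python
import os


def ids_of_current_process():
    """Collect the four IDs that the warning message prints."""
    return {
        "uid": os.getuid(),
        "euid": os.geteuid(),
        "gid": os.getgid(),
        "egid": os.getegid(),
    }


def looks_like_superuser(ids):
    """Assumed rule: any ID equal to 0 (root user or root group) counts."""
    return any(value == 0 for value in ids.values())


# Values copied from the daemon log in this issue.
logged = {"uid": 1000, "euid": 1000, "gid": 0, "egid": 0}
print(looks_like_superuser(logged))

# The same check applied to the process running this script.
print(looks_like_superuser(ids_of_current_process()))
```

Under that assumption, a non-root user whose process runs with group 0 is still flagged, which matches what the daemon log shows.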

ltalirz commented 5 years ago

The problem seems to be that "scientist" is a member of both groups 0 and 1000:

scientist@jupyter-leopold-2etalirz-40epfl-2ech:~$ id -g scientist
1000
scientist@jupyter-leopold-2etalirz-40epfl-2ech:~$ id -g
0
scientist@jupyter-leopold-2etalirz-40epfl-2ech:~$ id -G
0 1000

This is despite the following line in /etc/passwd:

scientist:x:1000:1000::/project:/bin/bash
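
The difference between the two id calls is that id -g scientist looks the name up in the user database, while id -g without an argument reports the credentials of the current process. The sketch below does the same comparison from Python using the standard os and pwd modules; the helper names compare_gids and gid_mismatch are hypothetical and only serve as an illustration.

```python
import os
import pwd


def compare_gids():
    """Return the passwd primary group next to the IDs the process runs with."""
    uid = os.getuid()
    try:
        passwd_gid = pwd.getpwuid(uid).pw_gid  # what /etc/passwd says
    except KeyError:
        passwd_gid = None  # this uid has no entry in the user database
    return {
        "passwd_gid": passwd_gid,
        "process_gid": os.getgid(),  # what the process actually runs as
        "process_groups": sorted(os.getgroups()),
    }


def gid_mismatch(info):
    """True if the process group differs from the passwd primary group."""
    return info["passwd_gid"] is not None and info["passwd_gid"] != info["process_gid"]


print(compare_gids())
```

In the pod described here, one would expect passwd_gid to be 1000 and process_gid to be 0, i.e. a mismatch, because the process was started with a group that does not come from /etc/passwd.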

ltalirz commented 5 years ago

A potentially useful read on how uid/gid work inside containers: https://medium.com/@mccode/understanding-how-uid-and-gid-work-in-docker-containers-c37a01d01cf

This led me to suspect that the container is spawned with gid 0, which in turn led me to the kubespawner docs:

run_as_gid – The GID used to run single-user pods. The default is to run as the primary group of the user specified in the Dockerfile, if this is set to None. Setting this parameter requires that feature-gate RunAsGroup be enabled, otherwise the effective GID of the pod will be 0 (root). In addition, not setting run_as_gid once feature-gate RunAsGroup is enabled will also result in an effective GID of 0 (root).

Here is the description of Kubernetes feature gates. It seems, though, that there is currently no way to query the state of a feature gate in Kubernetes (!)

Figuring out how to set run_as_gid is also not trivial. Reading the kubespawner code, it seems it is set from KubeSpawner.gid: https://github.com/jupyterhub/kubespawner/blob/c02c61c457e498192fdf9f240c2bcaec373f9f95/kubespawner/spawner.py#L1315

According to the docs, KubeSpawner.gid defaults to the value of the USER in the Dockerfile.

I.e. it's probably set correctly, which would suggest that in this cluster, the RunAsGroup feature-gate is disabled.

ltalirz commented 5 years ago

Hm... wrong!

hub:
  extraConfig: |
    c.KubeSpawner.gid = 1000

actually solves the issue. It is not clear how KubeSpawner.gid ended up with a wrong value before...

scientist@jupyter-leopold-2etalirz-40epfl-2ech:~$ id -G
1000