exasol / ai-lab

Development environment for data science developers
MIT License

Fix notebook tests after using non-root user #241

Closed ckunki closed 5 months ago

ckunki commented 5 months ago

In ticket #66 the user for running jupyter was changed from root to jupyter with reduced permissions.

Running the notebook-tests later on revealed failures. https://github.com/exasol/ai-lab/actions/runs/8112297755/job/22174172457

Additionally,

  1. the notebook-tests should be made mandatory.
  2. the workflow triggered by a push to main should also run these checks.

ckunki commented 5 months ago

Examining permissions:

On host:

$ ls -l /var/run/docker.sock
srw-rw---- 1 root docker 0 Feb  5 13:53 /var/run/docker.sock

Mounting the docker.sock:

$ docker run -d \
  --volume /var/run/docker.sock:/var/run/docker.sock \
  exasol/ai-lab:9.9.9

Inside container:

# ls -l /var/run/docker.sock
srw-rw---- 1 root 118 0 Feb  5 13:53 /var/run/docker.sock

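Inside the container, the socket is owned by root and group ID 118. This is the numeric ID of the host's docker group, for which the container has no group name.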
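
The same inspection can be done from Python, for example from an entry point script. This is a minimal sketch using only the standard library; the helper name `describe_owner` is hypothetical and not taken from ai-lab. It falls back to the numeric ID when no user or group name exists, which is the situation shown above for group 118.

```python
import grp
import os
import pwd
import stat


def describe_owner(path):
    """Return (mode string, user, group) for path.

    User and group are names when the system knows them,
    otherwise the numeric IDs as strings.
    """
    info = os.stat(path)
    try:
        user = pwd.getpwuid(info.st_uid).pw_name
    except KeyError:
        user = str(info.st_uid)
    try:
        group = grp.getgrgid(info.st_gid).gr_name
    except KeyError:
        # No group entry for this gid, e.g. a gid that only exists on the host.
        group = str(info.st_gid)
    return stat.filemode(info.st_mode), user, group
```

Calling `describe_owner("/var/run/docker.sock")` inside the container would be expected to report the group as a plain number as long as no group with that ID has been created there.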

ckunki commented 5 months ago

Error message:

ERROR luigi-interface.PrepareDockerNetworkForTestEnvironment:task_logger_wrapper.py:25 PrepareDockerNetworkForTestEnvironment_9cbf939b25(job_id=2024_03_01_14_16_35_2_SpawnTestEnvironmentWithDockerDB, no_cache=False, environment_name=DemoDb, network_name=db_network_DemoDb, attempt=0): Error during removing container db_network_DemoDb: Error while fetching server API version: ('Connection aborted.', PermissionError(13, 'Permission denied'))
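
The PermissionError (errno 13) is consistent with the permissions listed earlier: mode srw-rw---- grants access only to root and to members of group 118. The following sketch approximates the classic Unix permission check to illustrate this. The function name `may_read_write` and the uid/gid 1000 used for user jupyter are assumptions for illustration only; ACLs and capabilities are ignored.

```python
import stat


def may_read_write(mode, owner_uid, owner_gid, uid, gids):
    """Approximate the Unix read/write permission check for one process."""
    if uid == 0:
        return True  # root bypasses the permission bits
    if uid == owner_uid:
        needed = stat.S_IRUSR | stat.S_IWUSR
    elif owner_gid in gids:
        needed = stat.S_IRGRP | stat.S_IWGRP
    else:
        needed = stat.S_IROTH | stat.S_IWOTH
    return mode & needed == needed


# Mode srw-rw---- as shown by ls -l; owner root (0), group 118.
SOCKET_MODE = stat.S_IFSOCK | 0o660

# Hypothetical unprivileged user with uid 1000 and primary gid 1000.
without_docker_group = may_read_write(SOCKET_MODE, 0, 118, 1000, {1000})
with_docker_group = may_read_write(SOCKET_MODE, 0, 118, 1000, {1000, 118})
```

Under these assumptions the user is denied access unless group 118 is among the process's groups, which matches the failure seen after switching away from root.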

ckunki commented 5 months ago

The docker.sock is mounted only when the Docker container is run. By then, Ansible has long since finished and the container's entry point has already been started. Consequently, we can no longer change the owner or permissions of docker.sock.

ckunki commented 5 months ago

We will try to run the Docker container as root instead and use user jupyter only for running the Jupyter server. Tasks:

ckunki commented 5 months ago

I verified that calling os.chown() on /var/run/docker.sock inside the Docker container does not affect the owner of the original file in the host's file system, which is mounted into the Docker container with option -v <host>:<in-container>.

ckunki commented 5 months ago

Results of env in Docker container for user root:

| Rating | Variable | Value |
|--------|----------|-------|
| ✅ (1) | DEBIAN_FRONTEND | noninteractive |
| ✅ (1) | _ | /usr/bin/env |
| ✅ (1) | HOSTNAME | 1a048a416c53 |
| ✅ (1) | PWD | / |
| ✅ (1) | LS_COLORS | ... |
| ✅ (1) | TERM | xterm |
| ✅ (1) | SHLVL | 1 |
| ❓ | PATH | /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin |
| ✅ (2) | VIRTUAL_ENV | /home/jupyter/jupyterenv |
| ✅ (2) | NOTEBOOK_FOLDER_FINAL | /home/jupyter/notebooks |
| ✅ (2) | NOTEBOOK_FOLDER_INITIAL | /home/jupyter/notebook-defaults |
| ✅ (3) | HOME | /root |

ckunki commented 5 months ago

Running entrypoint yields

os.setresgid(gid, gid, gid) PermissionError: [Errno 1] Operation not permitted

I assume that once os.setresuid() has switched the process to a non-root user, the process is no longer permitted to call os.setresgid().

Changing the order to the following helped:

os.setgroups([self.docker_group.id])
os.setresgid(gid, gid, gid)
os.setresuid(uid, uid, uid)
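
The ordering above can be wrapped into a small helper. This is only a sketch, not the actual code of entrypoint.py: the function name `drop_privileges` and its parameters are hypothetical, and the `os_module` parameter exists solely so the call order can be checked without root privileges.

```python
import os


def drop_privileges(uid, gid, extra_groups, os_module=os):
    """Switch the current process from root to an unprivileged user.

    All group changes come first, because they need privileges
    that are given up as soon as the user ID is changed.
    """
    os_module.setgroups(list(extra_groups))  # supplementary groups, e.g. the docker group
    os_module.setresgid(gid, gid, gid)       # real, effective and saved group ID
    os_module.setresuid(uid, uid, uid)       # real, effective and saved user ID, last
```

When started as root, a call such as `drop_privileges(uid, gid, [docker_gid])` would leave the process running as the target user while keeping membership in the group that owns docker.sock.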
ckunki commented 5 months ago

It turned out that, at least in the CI build, the code running inside the Docker container did modify the (group) owner of the original file on the host system.

Updated plan for fixing the notebook tests:

1. Add tests

2. Fix notebook tests by enhancing entrypoint.py to

ckunki commented 5 months ago

The last approach finally seemed to be successful.

ckunki commented 5 months ago

Additional integration tests for gid existing / non-existing group are tracked by ticket