Fix notebook tests after using non-root user

ckunki commented 5 months ago

In ticket #66 the user for running jupyter was changed from root to jupyter with reduced permissions.

Running the notebook-tests later on revealed failures. https://github.com/exasol/ai-lab/actions/runs/8112297755/job/22174172457

Additionally,

- the notebook-tests should be made mandatory.
- the trigger for push to main should also run these checks.

ckunki commented 5 months ago

Examining permissions:

On host:

$ ls -l /var/run/docker.sock
srw-rw---- 1 root docker 0 Feb  5 13:53 /var/run/docker.sock

Mounting the docker.sock:

$ docker run -d \
  --volume /var/run/docker.sock:/var/run/docker.sock \
  exasol/ai-lab:9.9.9

Inside container:

# ls -l /var/run/docker.sock
srw-rw---- 1 root 118 0 Feb  5 13:53 /var/run/docker.sock

Socket is owned by root with groupid 118.
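The same check can be scripted, which helps inside the container where gid 118 has no group name. A minimal sketch using only the standard library; the helper name describe_owner is made up for illustration and is not part of the ai-lab code.

```python
import grp
import os
import stat


def describe_owner(path):
    """Return mode string, uid, gid and group name (None if unnamed) of path."""
    st = os.stat(path)
    try:
        # Inside the container the gid may not map to any known group.
        group_name = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        group_name = None
    return {
        "mode": stat.filemode(st.st_mode),  # same notation as `ls -l`
        "uid": st.st_uid,
        "gid": st.st_gid,
        "group": group_name,
    }
```

Calling describe_owner("/var/run/docker.sock") on the host and inside the container should make the difference visible: the same gid, but a group name only where that gid is defined.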

ckunki commented 5 months ago

Error message:

ERROR luigi-interface.PrepareDockerNetworkForTestEnvironment:task_logger_wrapper.py:25 PrepareDockerNetworkForTestEnvironment_9cbf939b25(job_id=2024_03_01_14_16_35_2_SpawnTestEnvironmentWithDockerDB, no_cache=False, environment_name=DemoDb, network_name=db_network_DemoDb, attempt=0): Error during removing container db_network_DemoDb: Error while fetching server API version: ('Connection aborted.', PermissionError(13, 'Permission denied'))

ckunki commented 5 months ago

The docker.sock is mounted only when the Docker container is run. By that time, Ansible has long finished and the entry point of the Docker container has already been started. As a consequence, we can no longer change the owner or permissions of docker.sock.

ckunki commented 5 months ago

We will try to run the Docker container as root instead, and use user jupyter only for running the Jupyter server. Tasks:

[x] find out the ids of user jupyter and group jupyter: pwd.getpwnam(self.name).pw_uid
when calling subprocess.Popen():
- [x] Change user of current process to act as user jupyter: os.setresuid(uid, uid, uid)
- [x] set environment variable HOME to make jupyter create its files there
[x] check if file /var/run/docker.sock exists and if so, then change its owner to allow user jupyter to access it: os.chown(path, self.id, unchanged_gid)
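The chown task could look roughly like this. The sketch assumes that unchanged_gid stands for -1, which os.chown() interprets as "leave the group as it is"; the function name chown_if_exists is hypothetical.

```python
import os


def chown_if_exists(path, uid, unchanged_gid=-1):
    """Make uid the owner of path, keeping its group; False if path is absent."""
    if not os.path.exists(path):
        # docker.sock is only present when it was mounted into the container.
        return False
    # A gid of -1 tells os.chown() to leave the group untouched.
    os.chown(path, uid, unchanged_gid)
    return True
```

Changing the owner to a different user requires the process to run as root, which matches the idea of starting the container as root.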

ckunki commented 5 months ago

I verified that os.chown("/var/run/docker.sock") inside the Docker container does not affect the owner of the original file in the host's file system, that is mounted into the Docker container with option -v <host>:<in-container>.

ckunki commented 5 months ago

Results of env in Docker container for user root:

Rating	Variable	Value
✅ (1)	DEBIAN_FRONTEND	noninteractive
✅ (1)	_	/usr/bin/env
✅ (1)	HOSTNAME	1a048a416c53
✅ (1)	PWD	/
✅ (1)	LS_COLORS	...
✅ (1)	TERM	xterm
✅ (1)	SHLVL	1
❓	PATH	/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
✅ (2)	VIRTUAL_ENV	/home/jupyter/jupyterenv
✅ (2)	NOTEBOOK_FOLDER_FINAL	/home/jupyter/notebooks
✅ (2)	NOTEBOOK_FOLDER_INITIAL	/home/jupyter/notebook-defaults
✅ (3)	HOME	/root

(1) not relevant
(2) added by create_image.py
(3) Already overridden by entrypoint.py before subprocess.Popen()

ckunki commented 5 months ago

Running entrypoint yields

os.setresgid(gid, gid, gid) PermissionError: [Errno 1] Operation not permitted

I assume after calling os.setresuid() the process potentially is no longer permitted to call os.setresgid().

Changing the order to the following helped:

os.setgroups([self.docker_group.id])
os.setresgid(gid, gid, gid)
os.setresuid(uid, uid, uid)
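The working order, written out as a sketch. Once the uid has been switched away from root, the process normally lacks the privilege to change its groups, which would explain the PermissionError above; so the groups go first and the uid last. The names make_demoter and run_as are hypothetical, and the use of preexec_fn is an assumption about how the entry point hooks into subprocess.Popen().

```python
import os
import pwd
import subprocess


def make_demoter(uid, gid, extra_groups=()):
    """Build a function that switches the calling process to uid/gid."""
    def demote():
        # Order matters: all group changes must happen while still root.
        os.setgroups(list(extra_groups))
        os.setresgid(gid, gid, gid)
        # Dropping the uid comes last; afterwards the groups are fixed.
        os.setresuid(uid, uid, uid)
    return demote


def run_as(user_name, command, extra_groups=()):
    """Start command as user_name with HOME pointing to that user's home."""
    user = pwd.getpwnam(user_name)
    env = dict(os.environ, HOME=user.pw_dir)
    return subprocess.Popen(
        command,
        env=env,
        # preexec_fn runs in the child process right before exec.
        preexec_fn=make_demoter(user.pw_uid, user.pw_gid, extra_groups),
    )
```

A call such as run_as("jupyter", ["docker", "ps"], extra_groups=[118]) would only work when the parent runs as root. On Python 3.9 or newer, subprocess.Popen() also accepts user, group and extra_groups arguments, which may make the preexec_fn unnecessary.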

ckunki commented 5 months ago

It turned out that at least in the CI build the code running inside the Docker container did modify the (group) owner of the original file on the host system.

Updated plan for fixing the notebook tests:

1. Add tests

- add a test that runs docker ps as user jupyter
- add a second test that checks that the socket on the host is unchanged, with the docker ps test run in between

2. Fix notebook tests by enhancing `entrypoint.py` to

- do a stat for docker.sock
- report an error if the group does not have rw permissions
- if a group with this gid does not exist yet, modify the gid of group "docker", e.g. with groupmod -g <gid> docker
- if the gid already exists, add user jupyter to this group
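The decision logic of step 2 could be sketched as follows. This is not the actual entrypoint.py: the function names are hypothetical, and the sketch only returns the command it would run instead of executing it, so it can be tried without root. The usermod call is an assumption about how the user would be added to the group.

```python
import grp
import os
import stat


def group_has_rw(mode):
    """True if the group permission bits include both read and write."""
    wanted = stat.S_IRGRP | stat.S_IWGRP
    return mode & wanted == wanted


def plan_group_access(socket_path, user="jupyter", docker_group="docker"):
    """Return the command (argv list) that would give user access to the socket."""
    st = os.stat(socket_path)
    if not group_has_rw(st.st_mode):
        raise PermissionError(f"group lacks rw permissions on {socket_path}")
    try:
        existing = grp.getgrgid(st.st_gid)
    except KeyError:
        # No group with this gid yet: move group "docker" to the socket's gid.
        return ["groupmod", "-g", str(st.st_gid), docker_group]
    # The gid is already taken: add the user to that existing group.
    return ["usermod", "--append", "--groups", existing.gr_name, user]
```

Neither branch touches the socket itself, so the owner and group of the file on the host stay as they are.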

ckunki commented 5 months ago

The last approach finally seemed to be successful.

ckunki commented 5 months ago

Additional integration tests for an existing / non-existing group gid are tracked by ticket #252 in exasol/ai-lab.