daphne-eu / daphne

DAPHNE: An Open and Extensible System Infrastructure for Integrated Data Analysis Pipelines
Apache License 2.0

Starting daphne-dev container fails in case of group/user id collision #914

Open pdamme opened 1 week ago

Until recently, I had no problems starting the daphne-dev container on my local system (Ubuntu 22.04.5 LTS) by executing ./containers/run-docker-example.sh without any changes to the code/scripts on main (assuming that I had previously obtained the docker image by docker pull daphneeu/daphne-dev:latest_X86-64_BASE).

However, as of recently, I can no longer start the container. Executing ./containers/run-docker-example.sh prints:

Use this as an example to start DAPHNE docker containers. Copy and customize for the various flavors.
Add sudo to docker invocation if needed in your setup
groupadd: GID '1000' already exists
useradd: UID 1000 is not unique
pdamme ALL=(ALL:ALL) NOPASSWD:ALLchpasswd: (user pdamme) pam_chauthtok() failed, error:
Authentication token manipulation error
chpasswd: (line 1, user pdamme) password not changed

For longer running containers consider running 'unminimize' to update packages
and make the container more suitable for interactive use.

Use pdamme with password Docker!0844 for SSH login
Docker Container IP address(es):
172.17.0.2
sudo: unknown user pdamme
sudo: error initializing audit plugin sudoers_audit

Initially, I worked around this problem by switching to an older container version, changing "$DOCKER_IMAGE:$DOCKER_TAG" to 2e6294582afe in ./containers/run-docker-example.sh. That's not a long-term solution, since developers need the most up-to-date version of the container. For instance, 2e6294582afe still had g++-9.4, which cannot compile DAPHNE anymore, as we switched to C++20.

It seems like the reason for the above failure is the following: run-docker-example.sh passes the user's username, user id, and group id on the host to docker. When the daphne-dev container starts, it executes a copy of containers/entrypoint-interactive.sh. This script adds a new group and user with the provided ids. This is done to avoid problems with access permissions on files created in host directories mounted in the container.
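To make the mechanism concrete, here is a rough sketch of the two halves as I understand them. It is not copied from the actual scripts: the function names and the variable names HOST_USER, HOST_UID, and HOST_GID are made up for illustration, and the real scripts may pass the values differently and use other groupadd/useradd options. The commands are only printed, not executed, so the sketch is safe to run anywhere.

```shell
# Host side (assumed): environment flags handed to "docker run".
# HOST_USER, HOST_UID and HOST_GID are hypothetical variable names.
print_docker_env_args() {
    local user="$1" uid="$2" gid="$3"
    printf -- '-e HOST_USER=%s -e HOST_UID=%s -e HOST_GID=%s\n' "$user" "$uid" "$gid"
}

# Container side (assumed): the kind of commands the entrypoint runs to
# mirror the host identity. Printed instead of executed.
print_user_setup_commands() {
    local user="$1" uid="$2" gid="$3"
    echo "/usr/sbin/groupadd -g $gid $user"
    echo "/usr/sbin/useradd -m -u $uid -g $gid $user"
}

# Typical call on the host:
# print_docker_env_args "$(id -un)" "$(id -u)" "$(id -g)"
```

If the container image already ships a group or user with the same numeric id, the groupadd and useradd steps are exactly where this breaks.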

The problem is: On my host system, my user id is 1000 and my group id is 1000. However, a group and user with these ids already exist in the container. This user is called ubuntu, as can be double-checked by cat /etc/passwd. Thus the command groupadd in entrypoint-interactive.sh fails and the container cannot be started.

I managed to work around this problem for now by:

  1. Adding the flag --entrypoint /daphne/containers/entrypoint-interactive.sh to the invocation of docker in run-docker-example.sh in order to replace the entrypoint script stored in the container by the one in my local clone.
  2. Commenting out the lines with /usr/sbin/groupadd ... and /usr/sbin/useradd ... in entrypoint-interactive.sh.
  3. Adding the line USER=ubuntu at the top of entrypoint-interactive.sh.

That's not a clean solution, but it works for now.
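A slightly less hard-coded variant of step 3 might look up whichever account already owns the id instead of assuming it is called ubuntu. The helper below is only a sketch: resolve_user_name is a hypothetical function that does not exist in entrypoint-interactive.sh, and HOST_UID in the usage comment is an assumed variable name.

```shell
# Hypothetical helper, not taken from entrypoint-interactive.sh.
# Prints the name of the account that already owns the given uid in a
# passwd-format file, or the requested name if the uid is still free.
resolve_user_name() {
    local uid="$1" requested="$2" passwd_file="${3:-/etc/passwd}"
    local existing
    # Field 3 of a passwd entry is the numeric user id, field 1 the name.
    existing="$(awk -F: -v id="$uid" '$3 == id { print $1; exit }' "$passwd_file")"
    echo "${existing:-$requested}"
}

# Possible use near the top of the script (HOST_UID is an assumed name):
# USER="$(resolve_user_name "$HOST_UID" "$USER")"
```

This still skips creating the user when the id is taken, so it shares the drawback of the manual workaround: inside the container, the developer ends up working under the pre-existing account name.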

It's of course unfortunate if there is a clash of group/user ids between host and container. Not sure if it's possible to completely avoid that. At least, entrypoint-interactive.sh should detect that case and print some meaningful warning.
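As a starting point for such a check, something along these lines could run before groupadd/useradd. This is a sketch under assumptions, not a tested patch: check_id_collision is a hypothetical function name, and the file arguments exist only so the logic can be tried outside the container.

```shell
# Hypothetical pre-check, not part of the current entrypoint-interactive.sh.
# Warns on stderr and returns 1 if the requested uid or gid is already taken.
check_id_collision() {
    local uid="$1" gid="$2"
    local passwd_file="${3:-/etc/passwd}" group_file="${4:-/etc/group}"
    local user group status=0
    # In both file formats, field 1 is the name and field 3 the numeric id.
    user="$(awk -F: -v id="$uid" '$3 == id { print $1; exit }' "$passwd_file")"
    group="$(awk -F: -v id="$gid" '$3 == id { print $1; exit }' "$group_file")"
    if [ -n "$user" ]; then
        echo "WARNING: UID $uid is already used by user '$user' in the container." >&2
        status=1
    fi
    if [ -n "$group" ]; then
        echo "WARNING: GID $gid is already used by group '$group' in the container." >&2
        status=1
    fi
    return "$status"
}
```

With my ids (1000/1000) this would report the existing ubuntu user up front, instead of leaving the developer to decipher the follow-up errors from chpasswd and sudo.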