Closed mfitz closed 10 months ago
After further investigation, I think this is a difference in the behaviour around mounted volumes between the previous and current base Docker images, but it isn't a failure to mount the volumes as I previously suspected. Instead it looks like a difference in how each image allows you to specify paths under the volume mount point.
Using the old image, I can mount the volume at /matsim12-test-town, but then use matsim12-test-town with no slash when specifying paths under the mount point:
$ docker run -v /home/arup/matesto/data/matsim12-test-town:/matsim12-test-town ************.dkr.ecr.eu-west-1.amazonaws.com/genet:29993df ls -talh matsim12-test-town
total 272K
drwxr-xr-x 1 root root 4.0K Jan 11 23:54 ..
drwxrwxrwx 4 1000 1000 4.0K Nov 7 20:20 .
-rwxrwxrwx 1 1000 1000 27K Nov 7 20:20 qsim_matsim_config_test_town_12_multimodal_config.xml
-rwxrwxrwx 1 1000 1000 27K Nov 7 20:20 hermes_matsim_config_test_town_12.xml
-rwxrwxrwx 1 1000 1000 37K Nov 7 20:19 qsim_matsim_config_test_town_12_runmatsim_config.xml
drwxrwxrwx 2 1000 1000 4.0K Oct 27 10:11 elara-config
drwxrwxrwx 2 1000 1000 4.0K Dec 19 2022 bitsim
-rwxrwxrwx 1 1000 1000 4.0K Dec 19 2022 network.xml
-rwxrwxrwx 1 1000 1000 27K Dec 19 2022 test_multimodal_config.xml
-rwxrwxrwx 1 1000 1000 2.1K Dec 19 2022 all_vehicles.xml
-rwxrwxrwx 1 1000 1000 2.5K Oct 11 2021 DATASET.md
-rwxrwxrwx 1 1000 1000 2.9K Aug 25 2021 population_v12.xml
-rwxrwxrwx 1 1000 1000 1.3K Apr 30 2021 attributes.xml
-rwxrwxrwx 1 1000 1000 2.3K Apr 30 2021 facilities.xml
-rwxrwxrwx 1 1000 1000 3.9K Apr 30 2021 population.xml
-rwxrwxrwx 1 1000 1000 3.8K Apr 30 2021 population_multimodal.xml
-rwxrwxrwx 1 1000 1000 3.4K Apr 30 2021 population_multimodal_no_network_links.xml
-rwxrwxrwx 1 1000 1000 359 Apr 30 2021 road-pricing.xml
-rwxrwxrwx 1 1000 1000 26K Apr 30 2021 test_config.xml
-rwxrwxrwx 1 1000 1000 26K Apr 30 2021 test_facilities_config.xml
-rwxrwxrwx 1 1000 1000 27K Apr 30 2021 test_multimodal_config_simplified_network.xml
-rwxrwxrwx 1 1000 1000 802 Apr 30 2021 transitVehicles.xml
-rwxrwxrwx 1 1000 1000 2.3K Apr 30 2021 transitschedule.xml
If I do the same thing with the new image, I cannot see the directory:
$ docker run -v /home/arup/matesto/data/matsim12-test-town:/matsim12-test-town ************.dkr.ecr.eu-west-1.amazonaws.com/genet:latest ls -talh matsim12-test-town
ls: cannot access 'matsim12-test-town': No such file or directory
However, this is not because the volume has failed to mount. We can see it as a file system inside the container:
$ docker run -v /home/arup/matesto/data/matsim12-test-town:/matsim12-test-town ************.dkr.ecr.eu-west-1.amazonaws.com/genet:latest df -h
Filesystem Size Used Avail Use% Mounted on
overlay 62G 32G 31G 51% /
tmpfs 64M 0 64M 0% /dev
tmpfs 479M 0 479M 0% /sys/fs/cgroup
shm 64M 0 64M 0% /dev/shm
/dev/root 62G 32G 31G 51% /matsim12-test-town
tmpfs 479M 0 479M 0% /proc/acpi
tmpfs 479M 0 479M 0% /proc/scsi
tmpfs 479M 0 479M 0% /sys/firmware
Rather, we cannot see the directory because we can no longer get away with omitting the leading slash from the path. If we add the slash to the path, we're in business:
$ docker run -v /home/arup/matesto/data/matsim12-test-town:/matsim12-test-town ************.dkr.ecr.eu-west-1.amazonaws.com/genet:latest ls -talh /matsim12-test-town
total 272K
drwxr-xr-x 1 root root 4.0K Jan 11 23:54 ..
drwxrwxrwx 4 1000 1000 4.0K Nov 7 20:20 .
-rwxrwxrwx 1 1000 1000 27K Nov 7 20:20 qsim_matsim_config_test_town_12_multimodal_config.xml
-rwxrwxrwx 1 1000 1000 27K Nov 7 20:20 hermes_matsim_config_test_town_12.xml
-rwxrwxrwx 1 1000 1000 37K Nov 7 20:19 qsim_matsim_config_test_town_12_runmatsim_config.xml
drwxrwxrwx 2 1000 1000 4.0K Oct 27 10:11 elara-config
drwxrwxrwx 2 1000 1000 4.0K Dec 19 2022 bitsim
-rwxrwxrwx 1 1000 1000 4.0K Dec 19 2022 network.xml
-rwxrwxrwx 1 1000 1000 27K Dec 19 2022 test_multimodal_config.xml
-rwxrwxrwx 1 1000 1000 2.1K Dec 19 2022 all_vehicles.xml
-rwxrwxrwx 1 1000 1000 2.5K Oct 11 2021 DATASET.md
-rwxrwxrwx 1 1000 1000 2.9K Aug 25 2021 population_v12.xml
-rwxrwxrwx 1 1000 1000 1.3K Apr 30 2021 attributes.xml
-rwxrwxrwx 1 1000 1000 2.3K Apr 30 2021 facilities.xml
-rwxrwxrwx 1 1000 1000 3.9K Apr 30 2021 population.xml
-rwxrwxrwx 1 1000 1000 3.8K Apr 30 2021 population_multimodal.xml
-rwxrwxrwx 1 1000 1000 3.4K Apr 30 2021 population_multimodal_no_network_links.xml
-rwxrwxrwx 1 1000 1000 359 Apr 30 2021 road-pricing.xml
-rwxrwxrwx 1 1000 1000 26K Apr 30 2021 test_config.xml
-rwxrwxrwx 1 1000 1000 26K Apr 30 2021 test_facilities_config.xml
-rwxrwxrwx 1 1000 1000 27K Apr 30 2021 test_multimodal_config_simplified_network.xml
-rwxrwxrwx 1 1000 1000 802 Apr 30 2021 transitVehicles.xml
-rwxrwxrwx 1 1000 1000 2.3K Apr 30 2021 transitschedule.xml
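A plausible cause, which I have not confirmed here, is that the two images start commands in different working directories, so a relative path like matsim12-test-town resolves against a different parent directory in each. The sketch below reproduces that effect without Docker. The temporary directory standing in for the container's / and the helper resolves_from are hypothetical names used for illustration only.

```shell
# Hypothetical stand-in for the container filesystem: a temp dir plays the role of "/".
root="$(mktemp -d)"
mkdir -p "$root/matsim12-test-town" "$root/tmp"

# resolves_from DIR PATH: report whether PATH exists when the working directory is DIR.
# The subshell keeps the cd from affecting the caller.
resolves_from() {
    if (cd "$1" && [ -e "$2" ]); then
        echo "found"
    else
        echo "missing"
    fi
}

# From the fake root, the relative name reaches the mount point.
resolves_from "$root" "matsim12-test-town"

# From any other working directory, the same relative name finds nothing.
resolves_from "$root/tmp" "matsim12-test-town"

# An absolute path works regardless of the working directory.
resolves_from "$root/tmp" "$root/matsim12-test-town"
```

Running pwd in each image via docker run would show whether the working directories really do differ.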
I'm closing this issue because no action needs to be taken in GeNet (but I will be fixing up some pipelines in Matesto...).
The Problem
Our daily Matesto CI pipeline has been failing for the last three days. The failed step is always GeNet network simplification, which has failed consistently in the same way since the most recent PR (which included some changes to Dockerfile) was merged.
The error from the Popper-managed pipeline looks like this:
Some Investigation
Manually running network simplification via the Docker CLI
Using the previous version of GeNet's Docker image
All good.
Using the current version of GeNet's Docker image
The exact same command fails, apparently because GeNet cannot read the volume mounted into the container at matsim12-test-town.
Checking the default user inside the Docker container
It looks like changing the base image from python:3.11.4-bullseye to mambaorg/micromamba:1.5.3-bullseye-slim in this PR has changed the default user inside the container.
It seems likely that the problem is the failure of non-root users inside the container to access volumes mounted from the host machine. Modifying the permissions on the directories on the host machine may be a viable workaround. It is also possible to override the default container user via the --user parameter to the docker run command, but the Popper library we are using in Matesto does not provide a way to do the same thing programmatically.
A quick fix would probably be to change the user in the Dockerfile, but we will need to have a conversation about that if it was a deliberate decision to move away from the root user in the container.
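Before changing permissions on the host, it may be worth checking whether the directory is already accessible to users other than its owner and group, since a non-root container user would typically fall into that category. This is a minimal sketch of such a check; other_perms is a hypothetical helper, and the temporary directory stands in for the real host data directory.

```shell
# other_perms PATH: print the "other" permission triplet (for example rwx or ---)
# taken from characters 8-10 of the mode string in the `ls -ld` output.
other_perms() {
    ls -ld "$1" | cut -c8-10
}

# Hypothetical stand-in for the host directory that gets mounted into the container.
data_dir="$(mktemp -d)"

# World-accessible: any user, including a non-root container user, can read and traverse it.
chmod 777 "$data_dir"
other_perms "$data_dir"

# Owner and group only: users outside both are locked out.
chmod 750 "$data_dir"
other_perms "$data_dir"
```

If the triplet is already rwx, tightening or loosening host permissions is unlikely to change what the container user can see.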