NixOS / nixpkgs

Nix Packages collection & NixOS
MIT License

Systemd confinement tries to create /usr directory and logs an error #64392

Open arianvp opened 5 years ago

arianvp commented 5 years ago

Issue description

When we enable confinement, systemd insists on creating /usr inside the chroot. This fails because / is still mounted read-only at that point: systemd tries to create /usr before TemporaryFileSystem=/ has been applied.

I have no idea why it is trying to create /usr in the first place.

cc @aszlig

Steps to reproduce

    systemd.services.flopsewops = {
      wantedBy = [ "multi-user.target" ];
      confinement.enable = true;
      confinement.mode = "chroot-only";
      script = "while true; do echo hey; sleep 1; done";
      serviceConfig = {
        StateDirectory = "flopsewopos";
      };
    };

Resulting journal output:

    Jul 06 19:59:07 t430s systemd[1]: Started flopsewops.service.
    Jul 06 19:59:07 t430s systemd[8793]: Failed to create directory at /nix/store/4pddxxlhqpphm1dq63imiqgwaqnb71rd-flopsewops-chroot/usr: Read-only file system
    Jul 06 19:59:07 t430s fw8x1n0iwd2nzbmrsii4apyxxsmryli8-unit-script-flopsewops-start[8793]: hey

Technical details

Please run nix-shell -p nix-info --run "nix-info -m" and paste the results.

arianvp commented 5 years ago

~Plot thickens, as there is a test that covers exactly this use case: https://github.com/NixOS/nixpkgs/blob/master/nixos/tests/systemd-confinement.nix#L98~

~The test succeeds, even though the logs seem to suggest something is wrong.
It can't create /usr, but it can write to /usr/lib/testme/foo?~

Edit: I confused /var and /usr. The StateDirectory stuff works; it's just that systemd somehow insists on trying to create /usr within the RootDirectory.

    subtest: check if StateDirectory works
    machine: must succeed: echo 6 > /teststep
    machine: exit status 0
    (0.00 seconds)
    machine: must succeed: chroot-exec touch /tmp/canary
    machine# [   12.102767] systemd[1]: Created slice system-test6.slice.
    machine# [   12.105618] systemd[1]: Started Confined Test Service 6 (PID 1008/UID 0).
    machine# [   12.116566] systemd[1010]: Failed to create directory at /nix/store/63i83mvgnmsnzn9b41hff624f3qc7pn6-test6--chroot/usr: Read-only file system
    machine: exit status 0
    (0.10 seconds)
    machine: must succeed: chroot-exec "echo works > /var/lib/testme/foo"
    machine# [   12.208273] systemd[1]: Started Confined Test Service 6 (PID 1025/UID 0).
    machine# [   12.215631] systemd[1027]: Failed to create directory at /nix/store/63i83mvgnmsnzn9b41hff624f3qc7pn6-test6--chroot/usr: Read-only file system
    machine: exit status 0
    (0.09 seconds)
    machine: must succeed: test "$(< /var/lib/testme/foo)" = works
    machine: exit status 0
    (0.00 seconds)
    machine: must succeed: test ! -e /tmp/canary
    machine: exit status 0
    (0.00 seconds)
    (0.19 seconds)

arianvp commented 5 years ago

Neverminddddd. I'm confusing /usr/lib and /var/lib in my head. The StateDirectory stuff does work. However, for some reason systemd insists on trying to mkdir /usr when it doesn't exist, and then it fails (but the unit still succeeds; it just logs a message), which is a totally different issue. I'll rename the issue accordingly.

I also changed the toplevel description.

arianvp commented 5 years ago

Systemd logs which line in the systemd source code a journal message comes from! How convenient! (with journalctl --output=json)

journalctl output:

```json
{
  "__CURSOR": "s=5aa172bab0cd4cb6a604452750b70f8e;i=a9e;b=c423c5d3cbe24067b8066371104a189f;m=13b90f7a7;t=58d07ab9521ff;x=4bb0c0535b7034ac",
  "__REALTIME_TIMESTAMP": "1562438966518271",
  "__MONOTONIC_TIMESTAMP": "5294323623",
  "_BOOT_ID": "c423c5d3cbe24067b8066371104a189f",
  "_MACHINE_ID": "d8eae2e7007941a391362c92ffa738d7",
  "_HOSTNAME": "t430s",
  "SYSLOG_FACILITY": "3",
  "SYSLOG_IDENTIFIER": "systemd",
  "_UID": "0",
  "_GID": "0",
  "_SYSTEMD_SLICE": "system.slice",
  "_TRANSPORT": "journal",
  "_EXE": "/nix/store/dz4mrfbjjlzj8g9j66nmkrzvny40pzcc-systemd-239.20190219/lib/systemd/systemd",
  "_CAP_EFFECTIVE": "3fffffffff",
  "PRIORITY": "3",
  "CODE_FILE": "../src/shared/base-filesystem.c",
  "CODE_LINE": "102",
  "CODE_FUNC": "base_filesystem_create",
  "ERRNO": "30",
  "MESSAGE": "Failed to create directory at /nix/store/4pddxxlhqpphm1dq63imiqgwaqnb71rd-flopsewops-chroot/usr: Read-only file system",
  "_COMM": "(ps-start)",
  "_CMDLINE": "(ps-start)",
  "_SYSTEMD_CGROUP": "/system.slice/flopsewops.service",
  "_SYSTEMD_UNIT": "flopsewops.service",
  "_PID": "13758",
  "_SYSTEMD_INVOCATION_ID": "ad93bfdcb0db4b5fa914a030959fcdd0",
  "_SOURCE_REALTIME_TIMESTAMP": "1562438966518263"
}
```

The code that causes the log message is the base_filesystem_create() call (the CODE_FUNC in the journal entry above) in namespace.c: https://github.com/systemd/systemd/blob/c6134d3e2f1d1d17b32b6e06556cd0c5429bc78a/src/core/namespace.c#L1430-L1432

It tries to create a 'base filesystem' when RootDirectory or RootImage is set, but ignores the error if that fails. However, not all of the base_filesystem paths really make sense on NixOS. Perhaps we should patch our fork of systemd so that it doesn't try to create all these 'useless' paths.
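
To make the "try, but don't fail" behaviour concrete, here is a minimal C sketch of the pattern as described in this comment. It is a hypothetical, simplified illustration and not systemd's code: `create_base_dirs()`, `setup_root()` and the fixed directory list are assumptions made for the example. The helper returns `-errno` for the first directory it cannot create (for instance `-EROFS` when the root sits on a read-only file system), and the caller discards that result, so a read-only root only produces a log line while setup carries on.

```c
/* Hypothetical sketch, not systemd's code: populate a new root directory,
 * but treat failure as non-fatal, like the call site described above. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create a few top-level directories below `root` (illustrative list).
 * Returns 0, or -errno for the first failure, e.g. -EROFS when `root`
 * lives on a read-only file system. An existing directory is fine. */
static int create_base_dirs(const char *root) {
    static const char *const dirs[] = { "usr", "var", "etc" };
    int fd = open(root, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    if (fd < 0)
        return -errno;
    for (size_t i = 0; i < sizeof dirs / sizeof dirs[0]; i++) {
        if (mkdirat(fd, dirs[i], 0755) < 0 && errno != EEXIST) {
            int err = errno;
            fprintf(stderr, "Failed to create directory at %s/%s: %s\n",
                    root, dirs[i], strerror(err));
            close(fd);
            return -err;
        }
    }
    close(fd);
    return 0;
}

/* Best-effort caller: any error from create_base_dirs() is discarded,
 * so setup (and therefore the service) carries on regardless. */
static int setup_root(const char *root) {
    assert(root != NULL);
    (void) create_base_dirs(root);
    return 0;
}
```

The design choice being illustrated is that the error is reported where it happens but never propagated, which is why the unit in this issue still starts.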

https://github.com/systemd/systemd/blob/ad2d50f84025ab1df3b05a6a28877763c17bc972/src/shared/base-filesystem.c#L30-L45

            { "bin",      0, "usr/bin\0",                  NULL },
            { "lib",      0, "usr/lib\0",                  NULL },
            { "root",  0755, NULL,                         NULL, true },
            { "sbin",     0, "usr/sbin\0",                 NULL },
            { "usr",   0755, NULL,                         NULL },
            { "var",   0755, NULL,                         NULL },
            { "etc",   0755, NULL,                         NULL },
            { "proc",  0755, NULL,                         NULL, true },
            { "sys",   0755, NULL,                         NULL, true },
            { "dev",   0755, NULL,                         NULL, true },
    #if defined(__i386__) || defined(__x86_64__)
            { "lib64",    0, "usr/lib/x86_64-linux-gnu\0"
                             "usr/lib64\0",                "ld-linux-x86-64.so.2" },

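
In the `lib64` entry, two candidate targets are packed into one string: the adjacent literals are concatenated, each candidate ends in `\0`, and the compiler's own terminating NUL marks the end of the list. The sketch below shows one way to walk such a list. It assumes, without confirmation from the linked source, that a candidate is only used when it exists below the root and, if the fourth column is set (here `ld-linux-x86-64.so.2`), also contains that file. `pick_target()`, `exists_at()` and `lib64_targets` are hypothetical names.

```c
/* Hypothetical sketch: walk a NUL-separated list of candidate targets and
 * return the first one that exists below a root directory fd. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Adjacent literals are concatenated into one array; the implicit final
 * NUL after "usr/lib64\0" yields an empty entry that ends the list. */
static const char lib64_targets[] = "usr/lib/x86_64-linux-gnu\0" "usr/lib64\0";

/* True if `path` exists below root_fd; symlinks are not followed. */
static int exists_at(int root_fd, const char *path) {
    struct stat st;
    return fstatat(root_fd, path, &st, AT_SYMLINK_NOFOLLOW) == 0;
}

/* Return the first entry of `targets` that exists below root_fd and, when
 * `must_contain` is non-NULL, contains that file. NULL if none matches. */
static const char *pick_target(int root_fd, const char *targets,
                               const char *must_contain) {
    assert(targets != NULL);
    for (const char *s = targets; *s != '\0'; s += strlen(s) + 1) {
        if (!exists_at(root_fd, s))
            continue;
        if (must_contain != NULL) {
            char path[512];
            snprintf(path, sizeof path, "%s/%s", s, must_contain);
            if (!exists_at(root_fd, path))
                continue;
        }
        return s;
    }
    return NULL;
}
```

For example, in an empty root `pick_target(fd, lib64_targets, NULL)` returns NULL; once `usr/lib64` exists it returns `"usr/lib64"`, and an existing `usr/lib/x86_64-linux-gnu` takes priority because it comes first in the list.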
arianvp commented 5 years ago

@aszlig do you think it would make sense to have /usr/bin/env in the sandbox, just like /bin/sh?

However, if we add it, I think the code above will instead try to create /bin and fail (it tries to symlink /bin to /usr/bin when /usr/bin is present), or try to create /var and fail. So it seems we should just patch this code out of systemd.
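
The concern above can be checked against a simplified model of one pass over a trimmed copy of the table. This sketch rests on assumptions not confirmed by the linked source: an entry with a target is only symlinked when that target already exists, failures on entries flagged like `root` (the trailing `true`) are tolerated, and the first other failure ends the pass, which would explain why only /usr shows up in the logs. Under those assumptions a read-only root without /usr fails at `usr`; with /usr/bin present the failure moves to `bin`; with /usr but no /usr/bin it moves to `var`. The struct, table and function names are hypothetical.

```c
/* Hypothetical model of one pass over a trimmed base-filesystem table;
 * names and layout are illustrative, not systemd's actual code. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

struct base_entry {
    const char *dir;      /* path below the root, e.g. "bin" */
    mode_t mode;          /* mode used when creating a directory */
    const char *target;   /* symlink target, or NULL for a directory */
    bool ignore_failure;  /* assumed: failure is tolerated silently */
};

static const struct base_entry table[] = {
    { "bin",  0,    "usr/bin", false },
    { "root", 0755, NULL,      true  },
    { "usr",  0755, NULL,      false },
    { "var",  0755, NULL,      false },
};

static bool exists_at(int fd, const char *path) {
    struct stat st;
    return fstatat(fd, path, &st, AT_SYMLINK_NOFOLLOW) == 0;
}

/* Returns 0, or -errno of the first non-ignored failure (named in
 * *failed_dir). On a read-only root that is EROFS. */
static int populate_root(int root_fd, const char **failed_dir) {
    assert(root_fd >= 0);
    *failed_dir = NULL;
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        const struct base_entry *e = &table[i];
        int r;
        if (exists_at(root_fd, e->dir))
            continue;                  /* already present: nothing to do */
        if (e->target != NULL) {
            if (!exists_at(root_fd, e->target))
                continue;              /* no target yet: skip the symlink */
            r = symlinkat(e->target, root_fd, e->dir);
        } else {
            r = mkdirat(root_fd, e->dir, e->mode);
        }
        if (r < 0 && errno != EEXIST) {
            int err = errno;
            if (e->ignore_failure)
                continue;              /* tolerated in this model */
            fprintf(stderr, "Failed to create %s: %s\n", e->dir, strerror(err));
            *failed_dir = e->dir;
            return -err;               /* first hard failure ends the pass */
        }
    }
    return 0;
}
```

Only the writable case can be exercised without special privileges: with no usr/bin the `bin` entry is skipped, and after creating usr/bin a second pass links `bin` to `usr/bin`.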

aszlig commented 5 years ago

Yeah, I noticed that quirk already and wrote about that in the commit message.

@arianvp: Yes, as you already noted, I think we should patch that in our systemd fork.

As for /usr/bin/env... I'm not in favour of this, because most services have no unpatched shebangs in their scripts, and for the rare cases where they do, one can still use BindReadOnlyPaths.

stale[bot] commented 4 years ago

Thank you for your contributions. This has been automatically marked as stale because it has had no activity for 180 days. If this is still important to you, we ask that you leave a comment below. Your comment can be as simple as "still important to me". This lets people see that at least one person still cares about this. Someone will have to do this at most twice a year if there is no other activity. Here are suggestions that might help resolve this more quickly:

  1. Search for maintainers and people that previously touched the related code and @ mention them in a comment.
  2. Ask on the NixOS Discourse.
  3. Ask on the #nixos channel on irc.freenode.net.