NixOS / nixpkgs

nixos/virtualisation.oci-containers: --cidfile flag breaks podman containers running in rootless mode. #207050

Open busti opened 1 year ago

busti commented 1 year ago

Describe the bug

A recent pull request, #177406, added the flag --cidfile=/run/podman-${escapedName}.ctr-id to the systemd service generated by the config option virtualisation.oci-containers.containers.<name> when running podman. https://github.com/NixOS/nixpkgs/blob/nixos-unstable/nixos/modules/virtualisation/oci-containers.nix#L267

However, when the generated systemd service is modified after the fact to run the container as a regular user instead of root, that user is not allowed to write the cidfile, since /run is owned by root.

This kind of defeats the purpose of using podman since it explicitly exists to run rootless containers.

I am not certain how to best approach this, since I don't know where the cidfile should be written in user mode.
However, I'd be happy to contribute.

Steps To Reproduce

  1. Set the container engine to podman:
    virtualisation = {
     podman.enable = true;
     oci-containers.backend = "podman";
    };
  2. Set up any container using virtualisation.oci-containers.containers.foobar = { ??? }
  3. Create a user to run our container:
    users.users.foobar = {
      isNormalUser = true;
      extraGroups = [ "oci" ];
    };
  4. Modify the systemd service generated by the above option to run as our user:
    systemd.services.podman-foobar = {
     wantedBy = [ "default.target" ];
     after = [ "network.target" ];
     description = "foobar pod";
     serviceConfig = {
       User = "foobar";
       WorkingDirectory = "/home/foobar";
     };
    };

Expected behavior

The container can be started and stopped by the systemd service.

Actual behavior

The container fails to start since it cannot write to /run, permission denied.

Notify maintainers

@davidkna - Added the flag to the configuration
@flokli - Merged the commit

davidkna commented 1 year ago

I think --cidfile=%t/%n.ctr-id might work, assuming systemd expands it to something sensible for rootless containers. This would require moving the command to ExecStart, because %t and %n are systemd specifiers and are only expanded in unit settings, not inside the generated script. podman generate systemd --new uses this flag for both root and rootless containers.
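A minimal sketch of what this suggestion could look like, written as a standalone unit rather than as a change to the oci-containers module. The unit name, container name, and image below are hypothetical placeholders. Per the systemd documentation, %t resolves to /run under the system manager and to $XDG_RUNTIME_DIR under a user manager, so this alone may not be enough for a system service with User= set.

```nix
{ pkgs, ... }:
{
  # Hypothetical unit; not the service generated by virtualisation.oci-containers.
  systemd.services.foobar-cidfile-sketch = {
    wantedBy = [ "multi-user.target" ];
    serviceConfig = {
      # %t (runtime directory root) and %n (full unit name) are expanded by
      # systemd because they appear directly in ExecStart/ExecStop.
      ExecStart = "${pkgs.podman}/bin/podman run --rm --name=foobar --cidfile=%t/%n.ctr-id docker.io/library/alpine:latest sleep infinity";
      ExecStop = "${pkgs.podman}/bin/podman stop --ignore --cidfile=%t/%n.ctr-id";
    };
  };
}
```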

ca5ua1 commented 1 year ago

Same here. As someone new to NixOS, it is really confusing that containers don't work and only report: (code=exited, status=125)

flokli commented 1 year ago

cc @adisbladis

busti commented 1 year ago

I have done some more digging on this, and it seems more difficult than I initially thought.
From what I know, %t cannot simply be used with system services, since it always evaluates to /run unless $XDG_RUNTIME_DIR is set, which it never is when a systemd service runs at the system level, even when serviceConfig.User is set.
The variable is usually provided by pam_systemd, which does not apply here.

XDG_RUNTIME_DIR usually evaluates to /run/user/$(id -u), but we cannot simply set it in the Environment section of the service config, since that does not evaluate commands.
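One possible way around the "no command evaluation" limitation is to pin the user's UID so the path can be computed at Nix evaluation time. This is only a sketch: the UID value 1001 is a hypothetical choice, and it assumes /run/user/<uid> actually exists when the service starts (for example because the user has an active session or lingering enabled), which a plain system service does not guarantee.

```nix
{ ... }:
let
  # Hypothetical fixed UID, so the runtime directory path is known up front.
  uid = 1001;
in
{
  users.users.foobar = {
    isNormalUser = true;
    inherit uid;
  };

  systemd.services.podman-foobar = {
    serviceConfig.User = "foobar";
    # Environment= only accepts literal values, so Nix builds the string here
    # instead of relying on $(id -u) at run time.
    environment.XDG_RUNTIME_DIR = "/run/user/${toString uid}";
  };
}
```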

We cannot run it with DynamicUser=yes either, since it prevents uid mappings, which causes chown to fail inside the container, see https://github.com/containers/podman/issues/12424

We could run it as a proper user service, with lingering, but that falls outside of the domain of nixos and should be done by home manager.

I am currently digging through this follow-up issue https://github.com/containers/podman/issues/12778 to see if I can get it to run, but it will require some more drastic changes to the nixos module than just adding a script arg for %t.

busti commented 1 year ago

I have gotten to a point where systemd is able to set up the cidfile permissions correctly, but now podman fails to generate a uidmap. I don't want to let the work I put in go to waste, but I cannot be bothered to continue. This seems to be beyond what I am able to do. If anyone wants to have a look at my progress, here is the branch I was experimenting on: https://github.com/busti/nixpkgs/tree/oci-cid-experimentation

ca5ua1 commented 1 year ago

Off-topic: @busti Dedicate more time to rest. You may also try the Pomodoro Technique.

busti commented 1 year ago

@Casul51 Good advice. Unfortunately, NixOS has become a major time sink in my life. On one hand, I love the way it allows me to describe an OS config; on the other, using options differently from the way they were intended to be used usually requires a huge time investment. I would love to be able to just be a user of NixOS, but it kind of forces you to become a dev all the time.

I am seriously considering dropping it and switching over to ansible, however, doing that would be way less satisfying.

ca5ua1 commented 1 year ago

@busti Understandable. Packaging binaries is a nightmare.

In theory it sounds easy, but when the automatic dependency resolver starts building an entire Java WebViewer (or whatever it was) for no obvious reason on your poor 2-core laptop... it is kind of brutal. Or you can't just build a package with nix-build if the package's default options aren't provided... Or manually building a .nix file from the nixpkgs repository gives you errors... Or...

CyborgPotato commented 1 year ago

I have done some more digging on this, it seems more difficult than I initially thought. From what I know, %t cannot simply be used with system services since it always evaluates to /run unless $XDG_RUNTIME_DIR is set, which it never is when running a systemd service on system level, even when a serviceConfig.User is set. The variable is usually provided by pam_systemd which does not apply here.

XDG_RUNTIME_DIR usually evaluates to /run/user/$(id -u), but we cannot simply set it in the Environment section of the service config, since that does not evaluate commands.

We cannot run it with DynamicUser=yes either, since it prevents uid mappings, which causes chown to fail inside the container, see containers/podman#12424

We could run it as a proper user service, with lingering, but that falls outside of the domain of nixos and should be done by home manager.

I am currently digging through this follow-up issue containers/podman#12778 to see if I can get it to run, but it will require some more drastic changes to the nixos module than just adding a script arg for %t.

Would it be feasible to have a script that evaluates the XDG_RUNTIME_DIR for preStart, script, preStop, etc.?

For my in-progress pod module, I have something like this:

runDirSetup = ''
    # Fall back to /run/ when XDG_RUNTIME_DIR is unset or empty.
    # The expansion is quoted so the test stays well-formed in both cases.
    if [ -z "''${XDG_RUNTIME_DIR:-}" ]; then
        export XDG_RUNTIME_DIR="/run/"
    fi
  '';
...
# Prepend the setup snippet, then join the podman arguments with line continuations.
script = runDirSetup + (concatStringsSep " \\\n  " ([
      "exec ${cfg.backend} pod create"
      "--name=${escapedName}"
      "--pod-id-file=${podPID}"
      "--replace"
    ] ++ map escapeShellArg pod.extraOptions
    ++ (if pod.enableInfra then ([
      # $XDG_RUNTIME_DIR is expanded by the shell at run time, not by Nix.
      "--infra-conmon-pidfile=$XDG_RUNTIME_DIR/${escapedName}-infra.pid"
      (if (pod.infraImage == "") then "" else "--infra-image=${pod.infraImage}")
      "--infra-name=${escapedName}-infra"
    ] ++ (map (p: "-p ${p}") pod.ports)
    ++ (map (v: "-v ${v}") pod.volumes)) else [
      ""
    ])
    ++ ["--infra=${lib.trivial.boolToString pod.enableInfra}"]
    ));

I use this in tandem with RuntimeDirectory so that the end path ends up as: /var/run/$RuntimeDirectory/

I will test this soon, just a matter of finding time. Does this seem reasonable?

Related issue was useful https://github.com/containers/podman/issues/12778
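The RuntimeDirectory idea mentioned above could be sketched roughly as follows. This is an illustration under assumptions, not the module's implementation: the unit name, directory name, container name, and image are hypothetical. It relies on systemd creating the directory under /run with the ownership of User= and exporting its path as $RUNTIME_DIRECTORY. It only addresses where the cidfile is written, not the uidmap problems discussed earlier in the thread.

```nix
{ pkgs, ... }:
{
  # Hypothetical unit; kept separate from the module-generated podman-foobar service.
  systemd.services.foobar-runtimedir-sketch = {
    wantedBy = [ "multi-user.target" ];
    serviceConfig = {
      User = "foobar";
      # systemd creates /run/foobar-sketch owned by User= and removes it on stop.
      RuntimeDirectory = "foobar-sketch";
    };
    # $RUNTIME_DIRECTORY is set by systemd and expanded by the shell at run time.
    script = ''
      exec ${pkgs.podman}/bin/podman run --rm --name=foobar \
        --cidfile="$RUNTIME_DIRECTORY/foobar.ctr-id" \
        docker.io/library/alpine:latest sleep infinity
    '';
  };
}
```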

ppenguin commented 1 year ago

We could run it as a proper user service, with lingering, but that falls outside of the domain of nixos and should be done by home manager.

@busti

I'm not sure about that. I'm basically following the premise that "HM is mainly for managing interactive user accounts", so it could be considered bloat/overkill for e.g. system users on (non-interactive) servers. Because of that, I'm successfully using this "extension" to the users module to run user services on nixos servers.

In this context, I am revisiting rootless podman containers again, after initially settling for a somewhat hackish rootful method mostly similar to how it's done here. This is obviously somewhat unsatisfactory since it's both rootful and basically hard-coding "pod functionality" to allow inter-container networking without (having to use) host networking.

I'm not sure how nixos modules have evolved since then on this front, but from @DavidCromp's statement I guess we still don't have a functional, convenient module to replace the (rootful) virtualisation.oci-containers module. With this being the case, it might be worthwhile to settle for non-dynamic, lingering system users running rootless podman containers from systemd --user services.

busti commented 1 year ago

@ppenguin unfortunately, proper user services in nixos have been an open issue since 2017 https://github.com/NixOS/nixpkgs/issues/26128

Having proper user services without home manager would be nice though.
Also, since you mention cross-container networking, I personally believe that it would be much easier to simply start a podman compose script from a proper user service than trying to do everything from within nixos. Lots of times I find that compose configs are provided by the maintainers of an open source project themselves, and I personally do not want to have to convert that to some .nix expression.

I have personally switched to doing that from within a home-manager config which I now have as a part of my server config.

There also is https://github.com/hercules-ci/arion but I don't believe it is worth it to convert all the random docker-compose.yml files I come across to .nix only to have them change in the future.

ppenguin commented 1 year ago

I have personally switched to doing that from within a home-manager config which I now have as a part of my server config.

Better late than never :wink:

That is actually not such a bad idea I guess, since if HM is used as a nixos module it could be considered little more (in terms of bloat) than an extra wrapper on top of users, though it's still a parallel universe to "pure" nixos

As for using a service that just wraps podman-compose, that's actually also what I ended up doing for e.g. a VPS install of jitsi-meet, since a painless production install through the nixos module is still quite far away (it's currently too rudimentary for that). Maybe even a very thin generic nix wrapper around the compose.yaml (optionally) and the (user or rootful) service could go a long way for good UX (meaning minimal boilerplate)? For good measure, one could throw in some envsubst or convenient handling of env files, possibly from /run/secrets (sops-nix) etc., to make life easier.

aksiksi commented 1 year ago

@ppenguin @busti I am working on a tool to automatically convert Compose to OCI containers in Nix. Still early but the basics work. Let me know if you have any thoughts on what features you think would be useful.

https://github.com/aksiksi/compose2nix

Pheoxy commented 1 month ago

Couldn't we just do something simple: if running rootless, use /run/user/$UID/podman-<service>.ctr-id instead of /run/podman-<service>.ctr-id?
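As a rough sketch of that idea, a path helper could switch on a flag. Both the cidFile helper and the rootless argument are hypothetical names, not existing module options. It also assumes the service script runs under bash (where $UID is defined) and that /run/user/$UID exists for the service's user.

```nix
let
  # Hypothetical helper: picks the cidfile location based on a rootless flag.
  cidFile = { name, rootless ? false }:
    if rootless
    # "$UID" is left for the shell to expand at run time; Nix only
    # interpolates the ${name} part.
    then "/run/user/$UID/podman-${name}.ctr-id"
    else "/run/podman-${name}.ctr-id";
in
{
  rootful = cidFile { name = "foobar"; };
  rootless = cidFile { name = "foobar"; rootless = true; };
}
```

The expression can be evaluated on its own with nix-instantiate --eval --strict to inspect the two resulting paths.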