containers / podman

Podman: A tool for managing OCI containers and pods.
https://podman.io
Apache License 2.0

Quadlet and "podman.container" file will break podman service #18275

Closed Nitrousoxide closed 1 year ago

Nitrousoxide commented 1 year ago

Issue Description

Creating a podman.container file in one of the directories Quadlet watches causes Quadlet to generate its own "podman.service" systemd unit, which breaks Podman and prevents it from starting up.

Steps to reproduce the issue

  1. Create a properly formatted Quadlet entry at (for instance) ~/.config/containers/systemd/podman.container
  2. Restart the system
  3. Podman is broken: podman.service fails to start

Describe the results you received

This causes Quadlet to generate a service entry that replaces the correct podman.service systemd entry with its own, breaking Podman entirely. I only tested this with rootless Podman, but I imagine it would also break rootful Podman.
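A rough sketch of why the file name matters, assuming Quadlet derives the unit name by swapping the .container suffix for .service (which matches the "Loaded:" line in the status output in this report). The helper `generated_service_name` is hypothetical and is not part of Podman:

```python
from pathlib import PurePath


def generated_service_name(quadlet_file: str) -> str:
    """Return the service unit name assumed to be derived from a .container file.

    Assumption: only the file name matters, not the directory it lives in.
    """
    path = PurePath(quadlet_file)
    if path.suffix != ".container":
        raise ValueError(f"not a .container file: {quadlet_file}")
    return path.stem + ".service"


# The file from this report maps onto the name of Podman's own API service.
print(generated_service_name("~/.config/containers/systemd/podman.container"))
# prints: podman.service
```

Under that assumption, any .container file whose stem matches an existing unit (podman, sshd, and so on) ends up competing with that unit for the same name.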

This is the systemd status after creating the podman.container entry and restarting.

[core@localhost ~]$ systemctl --user status podman.service
× podman.service
     Loaded: loaded (/var/home/core/.config/containers/systemd/podman.container; generated)
    Drop-In: /usr/lib/systemd/user/service.d
             └─10-timeout-abort.conf
     Active: failed (Result: exit-code) since Wed 2023-04-19 20:15:51 EDT; 55s ago
TriggeredBy: × podman.socket
    Process: 1911 ExecStart=/usr/bin/podman run --name=systemd-podman --cidfile=/run/user/501/podman.cid --replace --rm --log-driver passthrough --cgroups=spli>
    Process: 1998 ExecStopPost=/usr/bin/podman rm -f -i --cidfile=/run/user/501/podman.cid (code=exited, status=0/SUCCESS)
    Process: 2003 ExecStopPost=rm -f /run/user/501/podman.cid (code=exited, status=0/SUCCESS)
   Main PID: 1911 (code=exited, status=127)
        CPU: 212ms

Apr 19 20:15:51 localhost.localdomain podman[1911]: Error: crun: executable file `run` not found in $PATH: No such file or directory: OCI runtime attempted to >
Apr 19 20:15:51 localhost.localdomain systemd[749]: podman.service: Main process exited, code=exited, status=127/n/a
Apr 19 20:15:51 localhost.localdomain systemd[749]: podman.service: Killing process 1970 (conmon) with signal SIGKILL.
Apr 19 20:15:51 localhost.localdomain systemd[749]: podman.service: Killing process 1974 (podman) with signal SIGKILL.
Apr 19 20:15:51 localhost.localdomain systemd[749]: podman.service: Killing process 1997 (podman) with signal SIGKILL.
Apr 19 20:15:51 localhost.localdomain systemd[749]: podman.service: Failed with result 'exit-code'.
Apr 19 20:15:51 localhost.localdomain systemd[749]: Failed to start podman.service.
Apr 19 20:15:51 localhost.localdomain systemd[749]: podman.service: Start request repeated too quickly.
Apr 19 20:15:51 localhost.localdomain systemd[749]: podman.service: Failed with result 'exit-code'.
Apr 19 20:15:51 localhost.localdomain systemd[749]: Failed to start podman.service.

Describe the results you expected

I would expect quadlet to ignore .container entries for known system services. It should at least ignore podman.container, but probably other extant service entries if possible.

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

Yes

Nitrousoxide commented 1 year ago

Also, this could presumably be used to break a system boot, and not just Podman, if a user naively names a .container file after a needed system service.

Luap99 commented 1 year ago

@ygalblum @vrothberg PTAL

It should be possible to check for existing services with that name but I guess it is not that easy as there are many directories where system services could be defined.
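A minimal sketch of such a check, to show where the difficulty lies. The directory list is an assumed, non-exhaustive subset of the user unit search path (systemd.unit(5) has the full list), and `find_conflicts` is a hypothetical helper, not an existing Podman or Quadlet function:

```python
from pathlib import Path


def default_user_unit_dirs():
    """Assumed, incomplete subset of the directories searched for user units."""
    return [
        Path.home() / ".config/systemd/user",
        Path("/etc/systemd/user"),
        Path("/usr/lib/systemd/user"),
    ]


def find_conflicts(unit_name, unit_dirs=None):
    """Return every path under unit_dirs that already provides unit_name."""
    if unit_dirs is None:
        unit_dirs = default_user_unit_dirs()
    return [d / unit_name for d in unit_dirs if (d / unit_name).exists()]


if __name__ == "__main__":
    # On a typical rootless setup this would be expected to list the
    # vendor-shipped unit, but the result depends on the machine.
    print(find_conflicts("podman.service"))
```

A static scan like this only sees files that exist when it runs. It misses units written by other generators and units installed afterwards, which is part of why the check is harder than it looks.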

Also one could argue that this is a user error. You could have created a podman.service without quadlet and face the same issue.

vrothberg commented 1 year ago

Overriding existing services was already possible before when using podman generate systemd and moving the files to ~/.config/systemd/user. I just tried with a podman.service.

I see use cases on both ends: erroring out on name conflict & allowing overriding. FWIW, I do not think that's a bug.

What seems worrying is the "breaking podman entirely" part. @Nitrousoxide can you elaborate on that?

ygalblum commented 1 year ago

First, this issue is not specific to breaking podman.service. One can create a sshd.container file and break sshd.service. Now, as for looking for all services, what about generated service files whose generator runs after quadlet?

Nitrousoxide commented 1 year ago

> Overriding existing services was already possible before when using podman generate systemd and moving the files to ~/.config/systemd/user. I just tried with a podman.service.
>
> I see use cases on both ends: erroring out on name conflict & allowing overriding. FWIW, I do not think that's a bug.
>
> What seems worrying is the "breaking podman entirely" part. @Nitrousoxide can you elaborate on that?

I guess I should clarify that it breaks the Podman socket entirely since that is what is reliant on the podman.service.

k9withabone commented 1 year ago

> It should be possible to check for existing services with that name but I guess it is not that easy as there are many directories where system services could be defined.

I am planning on using systemd's dbus method ListUnitFiles() for k9withabone/podlet#14.

vrothberg commented 1 year ago

> I guess I should clarify that it breaks the Podman socket entirely since that is what is reliant on the podman.service.

What do you mean by "breaks the Podman socket entirely"? If you remove the podman.container and reload the daemon, things should go back to normal, shouldn't they?

Nitrousoxide commented 1 year ago

> What do you mean by "breaks the Podman socket entirely"? If you remove the podman.container and reload the daemon, things should go back to normal, shouldn't they?

Correct. However, as long as the conflicting .container file exists, it will keep breaking the socket, or whatever other system resource depends on the .service file that the .container file shadows.

Perhaps a solution might be to poll the list of service files periodically with ListUnitFiles(), as suggested by @k9withabone, and also before boot/shutdown, write that list to a file, and refuse to create services that are known to conflict unless a flag in the .container file indicates that the conflict is intentional. That would prevent accidental breakage of vital system services, but still give the user the ability to replace them with podman services if so desired.

You may need to think of some way to not have it refuse to create a new service file for an existing podman container service file though. Otherwise, I imagine that after the first reboot or systemctl daemon-reload it would see the services it generated earlier as existing units and refuse to regenerate them for the next boot, when it should.
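The proposal above could be sketched roughly as follows. Everything here is hypothetical: `should_generate`, the `allow_override` flag, and the idea of tracking which units Quadlet itself generated are illustrations of the suggestion, not existing Quadlet behavior or options:

```python
def should_generate(service_name, existing_units, quadlet_owned, allow_override=False):
    """Decide whether a unit should be generated under the proposed policy.

    existing_units: names recorded from an earlier unit listing (assumed input).
    quadlet_owned:  names that earlier Quadlet runs generated themselves.
    allow_override: hypothetical opt-in flag set in the .container file.
    """
    if service_name in quadlet_owned:
        # Our own earlier output is not a conflict; regenerate it.
        return True
    if service_name in existing_units:
        # Name already belongs to another unit: only proceed on explicit opt-in.
        return allow_override
    return True


existing = {"podman.service", "sshd.service", "web.service"}
owned = {"web.service"}

print(should_generate("podman.service", existing, owned))                       # False
print(should_generate("podman.service", existing, owned, allow_override=True))  # True
print(should_generate("web.service", existing, owned))                          # True
print(should_generate("new.service", existing, owned))                          # True
```

The `quadlet_owned` set is what addresses the regeneration concern: without it, a unit generated on a previous boot would look like a conflict on the next one.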

vrothberg commented 1 year ago

Thanks, @Nitrousoxide !

At the moment, I gravitate more toward highlighting in the docs that existing services can be overridden. It seems more idiomatic to how service management works today.

> You may need to think of some way to not have it refuse to create a new service file for an existing podman container service file though.

I am not yet sure how. I am hesitant to change the default behavior of Quadlet, and if there is an opt-in way to refuse creating services with a name conflict, users who opt in are already aware of the problem and can list services themselves.

Curious what @Luap99 and @ygalblum think.

ygalblum commented 1 year ago

I also think that documentation is the way to go.

Quadlet is not a special case. Users creating service files should be aware of the possibility of naming conflicts. The same applies when they write their own .service files: a naming conflict can override another service or cause their own service to be overridden.

Regarding https://github.com/containers/podman/issues/18275#issuecomment-1517280208, this will work only at the time podlet is executed. What's to say a new service with the same name won't be created later?

Luap99 commented 1 year ago

In general this doesn't seem to be a problem specific to Quadlet. Every generator would face it, so I took a look at the man page to see how this issue can be solved.

systemd.generator(7) describes exactly what directories the generator can use to change the behaviour.

Quadlet currently uses the normal-dir, which means not all unit files are overridden, only the ones shipped by the "vendor". I assume that means files under /usr.

We could switch this to late-dir to fix the issue here, but I agree with @vrothberg that we should not change the default unless there is a strong reason for it. Also, changing it to late-dir would not really solve the user's problem: in this case the user would keep the original unit but then be confused why Quadlet did "nothing".
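To make the trade-off concrete, here is a simplified model of the precedence described in systemd.generator(7) as I read it: early-dir outranks everything, normal-dir outranks /usr but not /etc, and late-dir overrides nothing. The layer labels and `effective_source` are invented for this illustration, and the real unit search path has more entries:

```python
# Simplified search order, highest priority first (an assumption, see lead-in).
SEARCH_ORDER = ["generator.early", "etc", "run", "generator", "usr", "generator.late"]


def effective_source(unit, layers):
    """Return the label of the first layer that provides unit, or None."""
    for layer in SEARCH_ORDER:
        if unit in layers.get(layer, ()):
            return layer
    return None


vendor = {"usr": {"podman.service"}}

# Today: Quadlet writes into normal-dir, so its unit shadows the vendor one.
with_normal_dir = {**vendor, "generator": {"podman.service"}}
print(effective_source("podman.service", with_normal_dir))  # generator

# With late-dir: the vendor unit wins and the generated unit is silently unused.
with_late_dir = {**vendor, "generator.late": {"podman.service"}}
print(effective_source("podman.service", with_late_dir))    # usr
```

In this model neither choice warns the user: normal-dir replaces the vendor unit, while late-dir leaves it in place and ignores the .container file.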

rhatdan commented 1 year ago

I agree this is a docs only issue.