systemd kills dma after forking to background and unit terminates

corecode / dma

The DragonFly Mail Agent, a small Mail Transport Agent (MTA), designed for home and office use.

systemd kills dma after forking to background and unit terminates #128

Open corecode opened 1 year ago

corecode commented 1 year ago

I am using systemd timers to schedule backup runs. Mails sent after backup never arrive, presumably because:

1. backup software invokes mail
2. mail invokes dma (as sendmail)
3. dma daemon()izes after writing the queue file, effectively exiting dma from the view of mail
4. mail and subsequently backup software exit
5. systemd sees the backup process exit and then kills the cgroup
6. dma is killed, the queue file stays in place and never gets delivered
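
The sequence above can be sketched roughly as follows. This is a simplified Python illustration, not dma's actual C code: queue_then_background and its parameters are hypothetical names, and a real daemon() call also detaches from the session. The point it shows is that the caller gets control back as soon as the queue file is on disk, while the forked child keeps running in the caller's cgroup, where systemd can still find and kill it.

```python
import os
import tempfile


def queue_then_background(spool_dir, message, deliver):
    """Write message to a queue file, then deliver it from a forked child.

    Hypothetical helper: returns (queue_path, child_pid) in the parent.
    """
    # Make the mail "safe" first: write it to the spool and flush to disk.
    fd, path = tempfile.mkstemp(dir=spool_dir, prefix="q.")
    with os.fdopen(fd, "wb") as f:
        f.write(message)
        f.flush()
        os.fsync(f.fileno())

    pid = os.fork()
    if pid == 0:
        # Child: still in the same cgroup as the parent.
        try:
            deliver(path)
            os.unlink(path)  # remove the queue file only after delivery
        finally:
            os._exit(0)      # never return into the caller's code

    # Parent: returns immediately, so the caller (mail) sees dma "exit".
    return path, pid
```

If the child is killed before deliver() finishes, the queue file is left behind, which matches the last step of the list.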

I am not sure how to fix this generically. One option would be a systemd unit that gets triggered on a socket and runs dma -q. If this socket is active, dma would terminate instead of trying to deliver the mail itself. Of course this requires installing and activating a systemd unit, but I don't see how we could do it without one.
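
A minimal sketch of the "is the socket active?" check, in Python for illustration only. The function name and the socket path are assumptions; nothing here reflects an existing dma interface. The idea is that the submitting process tries to connect to a Unix socket and only falls back to delivering in the background itself if nobody is listening.

```python
import socket


def queue_runner_active(sock_path, timeout=1.0):
    """Return True if something accepts connections on the Unix socket.

    sock_path is a hypothetical location such as a socket under the spool
    directory; a successful connect would be the trigger for the queue runner.
    """
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect(sock_path)
        return True
    except OSError:
        # Missing socket file, connection refused, or timeout: no runner.
        return False
    finally:
        s.close()
```

With a check like this, the submitting dma could write the queue file, poke the socket, and exit without forking when the call returns True.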

Seeking comments.

tedkotz commented 1 year ago

1. Maybe set KillMode to process or none in the activated unit: https://manpages.debian.org/testing/systemd/systemd.kill.5.en.html
2. When dma daemonizes the delivery, have it support a cgroup change.
3. Add a configuration option for dma's delivery daemon to ignore or hold off on SIGTERM.
4. Use ExecStartPost=, which doesn't say anything about killing anything.
5. Set Type= to oneshot or exec and/or ExitType=cgroup to tell systemd it should wait for the cgroup to finish. There is an argument that ExitType=cgroup should probably have been made the default when systemd started to manage everything by cgroup.
6. It looks like the systemd way is that any script that calls mail should request non-forking behavior: https://wiki.archlinux.org/title/systemd/Timers#MAILTO Does dma know that mail got -Ssendwait? If it does, it could just not daemonize when that option is set.

corecode commented 1 year ago

1. For this specific backup program, setting KillMode would be enough. However, this doesn't automatically generalize, and you'd have to set KillMode for every service that might want to send mails and quit. Not a good user experience (POLA).
2. Will this create a sub-cgroup or a separate one? I had assumed it would be a sub-cgroup and subject to the group kill.
3. We can ignore SIGTERM, but I think eventually we'll get a SIGKILL from systemd.
6. It seems mail needs -Ssendwait in order not to run into the same problem. However, any sendmail usually exits the main process when the mail is "safe", i.e. flushed to a queue file (or delivered off-site). That's what dma does. Keeping the process open until the mail is delivered negates having an MTA with a queue.
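
To illustrate the SIGTERM versus SIGKILL point, here is a small Python sketch (can_ignore is a hypothetical helper, and it must run in the main thread). It asks the kernel to ignore a signal and reports whether that was allowed. SIGTERM can be ignored; SIGKILL cannot, so ignoring SIGTERM would at best delay the kill until systemd escalates.

```python
import signal


def can_ignore(signum):
    """Try to set the disposition of signum to 'ignore'.

    Returns True if that was allowed, False otherwise. The previous
    handler is restored, so calling this has no lasting effect.
    """
    try:
        previous = signal.signal(signum, signal.SIG_IGN)
    except (OSError, ValueError):
        # The kernel rejects changing the disposition of SIGKILL/SIGSTOP.
        return False
    if previous is not None:
        signal.signal(signum, previous)
    return True
```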

Unfortunately, none of the systemd service-specific modifications generalize (1. & 5.). I would like dma to work like postfix would work on a system, without having to edit all kinds of service files (and likely forgetting one).

tedkotz commented 1 year ago

I believe a process can be moved into any cgroup you have permission to access. If need be, a dma subgroup with group mail access could be created at the system level under /sys/fs/cgroup/dma/, similar to /var/spool/dma. I have not actually played with cgroups much. This looks promising; for example, as of v2 a process can no longer be in more than one cgroup. https://docs.kernel.org/admin-guide/cgroup-v2.html
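
A rough Python sketch of what the cgroup idea involves, under the assumption of a cgroup v2 system. Both function names are hypothetical, and /sys/fs/cgroup/dma is only the example path from the comment above. The first function finds which cgroup a process is in from the text of /proc/<pid>/cgroup; the second moves the calling process by writing its PID to the target's cgroup.procs file, which only works if the caller has the necessary write permissions.

```python
import os


def parse_cgroup_v2(proc_cgroup_text):
    """Return the cgroup v2 path from /proc/<pid>/cgroup contents, or None.

    The unified (v2) hierarchy is the entry of the form "0::<path>".
    """
    for line in proc_cgroup_text.splitlines():
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        hierarchy, controllers, path = parts
        if hierarchy == "0" and controllers == "":
            return path
    return None


def move_self_to_cgroup(cgroup_dir):
    """Move the calling process into cgroup_dir (assumed to exist already)."""
    with open(os.path.join(cgroup_dir, "cgroup.procs"), "w") as f:
        f.write(str(os.getpid()))
```

For example, parse_cgroup_v2(open("/proc/self/cgroup").read()) would show whether the delivery process is still inside the backup unit's cgroup.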

On the -Ssendwait workaround: what counts as "safe" depends on the MTA design. I think the user expectation is that mail will be delivered. Waiting has the cost of negating the benefits of an MTA queue, but that is a cost the systemd.timer team chose, either through their tightly coupled process/cgroup lifecycle model or by conflating system services and timer events as "things that run". This is kind of acknowledged in that Arch Linux link about sending mail. Even if they had gone with the more sensible default of ExitType=cgroup, they still couldn't shut the unit down until the mail was actually delivered and the daemon had exited, so they are waiting either way.

This is all just talking through the design options. Socket activation might be the right move. I guess dma could create a FIFO in $SPOOLDIR, write to it, and see if dma.service responds before daemonizing? Or maybe just read from the socket and make sure you get a good version response.

corecode commented 1 year ago

I was thinking of using a dma.socket and activating the queue runner via systemd.