PowerDNS / pdns

PowerDNS Authoritative, PowerDNS Recursor, dnsdist
https://www.powerdns.com/
GNU General Public License v2.0

pdns-3.4.0: Unit pdns.service entered failed state. #1640

Closed mortenstevens closed 10 years ago

mortenstevens commented 10 years ago

pdns-3.4.0 causes the systemd unit to fail after stopping pdns.

The reason for this is probably: https://github.com/PowerDNS/pdns/commit/5c663d99afcdd3c26d5744a47911840e16265b6d

[root@fc20 ~]# systemctl stop pdns
[root@fc20 ~]# systemctl status pdns
pdns.service - PowerDNS Authoritative Server
   Loaded: loaded (/usr/lib/systemd/system/pdns.service; enabled)
   Active: failed (Result: signal) since Mon 2014-07-28 15:47:28 CEST; 4s ago
  Process: 2725 ExecStop=/usr/bin/pdns_control quit (code=exited, status=0/SUCCESS)
  Process: 2705 ExecStart=/usr/sbin/pdns_server --daemon (code=exited, status=0/SUCCESS)
 Main PID: 2706 (code=killed, signal=KILL)

Jul 28 15:47:23 fc20.de.imt-systems.com pdns[2706]: Using 64-bits mode. Built on 20140726003537 by mockbuild@x86-0610.de.imt-sys....3-1).
Jul 28 15:47:23 fc20.de.imt-systems.com pdns[2706]: PowerDNS comes with ABSOLUTELY NO WARRANTY. This is free software, and you a...ion 2.
Jul 28 15:47:23 fc20.de.imt-systems.com pdns[2706]: Creating backend connection for TCP
Jul 28 15:47:23 fc20.de.imt-systems.com pdns[2706]: About to create 3 backend threads for UDP
Jul 28 15:47:23 fc20.de.imt-systems.com pdns[2706]: Done launching threads, ready to distribute questions
Jul 28 15:47:28 fc20.de.imt-systems.com systemd[1]: Stopping PowerDNS Authoritative Server...
Jul 28 15:47:28 fc20.de.imt-systems.com pdns_control[2725]: Exiting
Jul 28 15:47:28 fc20.de.imt-systems.com systemd[1]: pdns.service: main process exited, code=killed, status=9/KILL
Jul 28 15:47:28 fc20.de.imt-systems.com systemd[1]: Stopped PowerDNS Authoritative Server.
Jul 28 15:47:28 fc20.de.imt-systems.com systemd[1]: Unit pdns.service entered failed state.
Hint: Some lines were ellipsized, use -l to show in full.

Habbie commented 10 years ago

@i-maravic Any thoughts on this? It looks like pdns is -9ing itself (sending itself SIGKILL), causing a non-zero exit.

i-maravic commented 10 years ago

Of course!

What OS is this? What's the config you're using?

mortenstevens commented 10 years ago

@i-maravic This issue affects all systems running with systemd. For example, you can reproduce it with RHEL7, Fedora 19 or Fedora 20. We need a zero daemon exit code for systemd.

mind04 commented 10 years ago

This is systemd killing pdns_server directly after the 'pdns_control quit' command. pdns_server needs a little more time to shut down.

If I add KillMode=none to the systemd service file, the issue is gone.

Jul 29 15:45:42 localhost.localdomain systemd[1]: Stopping PowerDNS Authoritative Server...
Jul 29 15:45:42 localhost.localdomain pdns_control[27046]: Exiting
Jul 29 15:45:42 localhost.localdomain systemd[1]: Stopped PowerDNS Authoritative Server.
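
For reference, the KillMode=none workaround described in this comment would amount to a unit-file fragment roughly like the one below. This is only a sketch: the ExecStart and ExecStop lines are copied from the systemctl status output earlier in this thread, and the rest of the shipped unit file is not shown here and is left out.

```ini
[Service]
ExecStart=/usr/sbin/pdns_server --daemon
ExecStop=/usr/bin/pdns_control quit
# Workaround from this thread: systemd runs ExecStop only and does not
# signal any remaining processes of the unit itself.
KillMode=none
```

With this setting, stopping the unit relies entirely on 'pdns_control quit' to shut pdns_server down.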

mortenstevens commented 10 years ago

@mind04 That's interesting! But it was working without adding KillMode=none to the systemd unit file before https://github.com/PowerDNS/pdns/commit/5c663d99afcdd3c26d5744a47911840e16265b6d

Does this patch delay the time to shutdown?

mind04 commented 10 years ago

This patch does not slow down the shutdown. It only prevents systemd from sending a SIGTERM to pdns_server.

Why did it work before 5c663d9? It must be timing related, or an extra SIGTERM was harmless before 5c663d9.

Habbie commented 10 years ago

Morten Stevens requested we dig in deeper before 3.4.0. Reopening for that.

mortenstevens commented 10 years ago

Maybe this has something to do with: http://mailman.powerdns.com/pipermail/pdns-users/2014-August/010769.html

i-maravic commented 10 years ago

@mind04 The patch doesn't prevent the SIGTERM from reaching pdns_server. It actually catches it and sends a SIGKILL to the GPID (the process group).

It does this to avoid leaving zombie children after pdns_server dies.
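
To illustrate why this ends in "code=killed, status=9/KILL", here is a minimal Python sketch of the behaviour described above. It is not the PowerDNS code: `stop_child` and `describe` are hypothetical names, and the forked child is only a stand-in for pdns_server. The child becomes a process group leader, catches SIGTERM, and sends SIGKILL to its own group, so the parent (playing the role of systemd) sees a signal death rather than exit code 0.

```python
import os
import signal
import time


def stop_child(kill_group_on_term: bool) -> int:
    """Fork a stand-in daemon, send it SIGTERM, return its raw wait status."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(read_fd)
            os.setpgrp()  # become leader of a new process group

            def on_term(signum, frame):
                if kill_group_on_term:
                    # SIGKILL the whole group, which includes this process.
                    os.killpg(os.getpgrp(), signal.SIGKILL)
                os._exit(0)  # reached only when the group was not killed

            signal.signal(signal.SIGTERM, on_term)
            os.write(write_fd, b"r")  # handler installed: tell the parent
            while True:
                time.sleep(0.05)
        finally:
            os._exit(1)  # never fall back into the caller's code
    # Parent: plays the role of the service manager.
    os.close(write_fd)
    os.read(read_fd, 1)  # wait until the child is ready
    os.close(read_fd)
    os.kill(pid, signal.SIGTERM)
    _, status = os.waitpid(pid, 0)
    return status


def describe(status: int) -> str:
    """Render a wait status the way a supervisor would report it."""
    if os.WIFSIGNALED(status):
        return f"killed by signal {os.WTERMSIG(status)}"
    return f"exited with status {os.WEXITSTATUS(status)}"


if __name__ == "__main__":
    print("group kill on SIGTERM:", describe(stop_child(True)))
    print("plain exit on SIGTERM:", describe(stop_child(False)))
```

In the first case the child dies from its own SIGKILL, which a supervisor reports as a failure; in the second it exits with 0.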

i-maravic commented 10 years ago

I think that this is the issue in this case.

i-maravic commented 10 years ago

There are two possible solutions for this:

  1. Run a guarded PDNS process from systemctl. When the guardian receives SIGTERM, it cleans up all its child processes and exits with 0
  2. Don't clean up the children for the standalone pdns_server on SIGTERM (pull request #1664)
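
Option 1 can be sketched in Python as follows. This is an assumption-laden illustration, not the PowerDNS guardian: `guardian` and `stop_guardian` are hypothetical names, and the workers are idle stand-ins. On SIGTERM the guardian terminates its workers, reaps them so no zombies remain, and exits with 0, which is the exit code systemd expects.

```python
import os
import signal
import time


def guardian(ready_fd: int, num_workers: int = 2) -> None:
    """Hypothetical guardian: never returns; exits 0 after reaping workers."""
    workers = []
    for _ in range(num_workers):
        pid = os.fork()
        if pid == 0:
            # Worker stand-in: default SIGTERM action, then idle.
            os.close(ready_fd)
            signal.signal(signal.SIGTERM, signal.SIG_DFL)
            while True:
                time.sleep(0.05)
        workers.append(pid)

    def on_term(signum, frame):
        for pid in workers:
            os.kill(pid, signal.SIGTERM)  # ask each worker to stop
        for pid in workers:
            os.waitpid(pid, 0)  # reap it so no zombie is left behind
        os._exit(0)  # clean exit code for the supervisor

    signal.signal(signal.SIGTERM, on_term)
    # Report the worker PIDs; this also signals that the handler is installed.
    os.write(ready_fd, " ".join(map(str, workers)).encode())
    os.close(ready_fd)
    while True:
        time.sleep(0.05)


def stop_guardian() -> tuple:
    """Start a guardian, send it SIGTERM, return (wait status, worker PIDs)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(read_fd)
        guardian(write_fd)
        os._exit(1)  # not reached: guardian() never returns
    os.close(write_fd)
    worker_pids = [int(p) for p in os.read(read_fd, 1024).decode().split()]
    os.close(read_fd)
    os.kill(pid, signal.SIGTERM)
    _, status = os.waitpid(pid, 0)
    return status, worker_pids


if __name__ == "__main__":
    status, worker_pids = stop_guardian()
    print("guardian exit code:", os.WEXITSTATUS(status), "workers:", worker_pids)
```

The design choice here is that only the guardian decides the unit's exit code, so cleaning up children never turns a normal stop into a signal death.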

mortenstevens commented 10 years ago

@i-maravic systemd doesn't need the guardian. See: https://bugzilla.redhat.com/show_bug.cgi?id=883852

i-maravic commented 10 years ago

@mortenstevens I know that systemd doesn't need the guardian, but I don't see a reason why we shouldn't use the guardian.

I think there are benefits to running PDNS in guardian mode, since in that case all the spawned children will be cleaned up on any error.

We're using similar software to systemd to control PDNS in production. We're running PDNS in guardian mode.

mortenstevens commented 10 years ago

@i-maravic This patch fixes it for me: https://github.com/PowerDNS/pdns/pull/1664

@Habbie If you merge this patch, you can remove the systemd workaround KillMode=none from the systemd unit file.

Habbie commented 10 years ago

I am seriously considering undoing the setpgrp feature. It is causing too much trouble (this ticket, #1671), and one of the major motivations for it (dynlistener dying) was fixed.