GoogleCloudPlatform / compute-image-packages

Packages for Google Compute Engine Linux images.
https://cloud.google.com/compute/docs/images
Apache License 2.0
372 stars 213 forks source link

systemctl try-reload-or-restart instead of restart in google_set_hostname #741

Closed derobert closed 7 months ago

derobert commented 5 years ago

Currently, google_set_hostname (when running under systemd only; it sends a SIGHUP when not under systemd) restarts rsyslog every DHCP renew. I suspect, but have not tested, that could cause log messages to be lost. Also, since it's "restart" instead of "try-restart", it will unexpectedly start the daemon if the sysadmin or package management system has stopped it. It also annoying fills the log with a lot of messages that rsyslog has been stopped, which would normally be an important event.

I suggest using try-reload-or-restart instead. That's the current name, though it used to be known as reload-or-try-restart, which still works (see sytemd pull request 2464).

Under the default Debian 9 setup, at least, that will still restart rsyslog. But at least that's easily fixable by the admin adding ExecReload=/bin/kill -HUP $MAINPID in an override file for rsyslog.service. Also, since it's a "try" variant, it won't unexpectedly start the daemon if the admin, package management system, etc. has stopped it.

That'd be something like this:

+++ b/dhcp/dhclient-exit-hooks.d/google_set_hostname
@@ -52,7 +52,7 @@ if [ -n "$new_host_name" ]; then
   if [ -x "$systemctl" ]; then
     hasrsyslog=$($systemctl | grep rsyslog | cut -f1 -d' ')
     if [ ! -z "$hasrsyslog" ]; then
-      $systemctl -q --no-block restart "$hasrsyslog"
+      $systemctl -q --no-block try-reload-or-restart "$hasrsyslog"
     fi
   else
     pkill -HUP syslogd

Or, alternatively, the SIGHUP path would work find under systemd too.

kkm000 commented 5 years ago

I thought along the same vein, but this won't solve the extra logging problem, because rsyslog.service, at least one that comes with Debian 9 and Ubuntu 18.04 is not written to handle a reload:

kkm@yupana:~$ sudo systemctl reload rsyslog
Failed to reload rsyslog.service: Job type reload is not applicable for unit rsyslog.service.
See system logs and 'systemctl status rsyslog.service' for details.
kkm@yupana:~$ sudo systemctl cat rsyslog
# /lib/systemd/system/rsyslog.service
[Unit]
Description=System Logging Service
Requires=syslog.socket
Documentation=man:rsyslogd(8)
Documentation=http://www.rsyslog.com/doc/

[Service]
Type=notify
ExecStart=/usr/sbin/rsyslogd -n
StandardOutput=null
Restart=on-failure

[Install]
WantedBy=multi-user.target
Alias=syslog.service

So maybe just pkill -HUP is really a better way to go, systemd or not.

What I do not understand is why syslog needs to be reloaded even when the host name has not in fact changed.

derobert commented 5 years ago

You can also use systemctl kill --kill-who main -s SIGHUP rsyslog.service under systemd to just send the HUP directly. Which is slightly better as pkill could conceivably match a process that happens to have the same name (I'm not sure if, e.g., pkill might hit syslogd inside a container).

jeremyvisser commented 5 years ago

@kkm000 I think your reply doesn’t take into account that systemctl try-reload-or-restart behaves as expected if the unit file isn’t built to handle a reload. From the systemctl(1) man page:

try-reload-or-restart PATTERN... Reload one or more units if they support it. If not, stop and then start them instead. This does nothing if the units are not running.

This means the original suggestion should work. (Sending SIGHUP is valid, too, but a better fix is to ask upstreams to add the appropriate ExecReload= line rather than working around on our end.)

kkm000 commented 4 years ago

@jeremyvisser Sorry, looks like I missed your comment back then. try-reload-or-restart will do exactly same thing as restart when the service does not support reload, and rsyslog in fact does not. And the whole point is to avoid a restart.

jeremyvisser commented 4 years ago

@kkm000 My apologies; I misunderstood your comment. Re-reading your comment, essentially you’re saying that because rsyslog.service does not contain a ExecReload= line, my suggestion would cause it to restart anyway.

Therefore, it sounds like the options are:

  1. Modify the Debian–provided rsyslog.service (i.e. drop-in unit) to provide a ExecReload=/bin/kill -HUP $MAINPID function.
  2. Do a pkill -HUP rsyslogd as you suggested before in the dhclient hook.

Given tampering with distro–managed units may lead to unexpected sysadmin surprises, the second option sounds less risky and therefore better.

kkm000 commented 4 years ago

Modify the Debian–provided rsyslog.service (i.e. drop-in unit) to provide a ExecReload=/bin/kill -HUP $MAINPID function.

That's not a bad option too; the problem as I see it is Google will have to support that separately for this distro. A distro-independent solution is always less taxing, even if not super-clean. Yeah, Google is big, but if RedHats's almost only product is RedHat, compute-image-packages is not Google's main product by any metric. When I was an SWE at Google, economy improved after the housing bubble, and our team slowly trickled down from 7 to 3--it became harder and harder to hire. So instead of pager weeks (one primary on-call, next secondary), we each had one pager-free week of three : ) And you can do only so much in a day.


Personally, I uninstall rsyslog in my custom image prep script, and using journald. It's beneficial, because it allows keeping a single journal for system and all users in a single log database file. It also supports log forwarding natively, which is a boon. And it logs to console, starting very early if enabled on kernel's command line, which avoids the problem that rsyslog has due to default permissions on /dev/console (#889). I rid of logrotate and even cron, as of Buster there are no essential cron jobs; the only active ones are a BSD calendar program and man-db maintenance script, which is a vestige of those real 80286 days: it leisurely pre-renders man pages to fixed 80 column .cat files so they can be displayed quickly on your latest and greatest VT340, but they never match the screen width any more anyway. Everything critical has been ported to systemd timers already as of Deb10. The only thing I keep is crhony, as it's less jumpy during a VM migration than systemd's NTP service (and that's only because a group of machines in my task needs to do tight enough timekeeping; I'd ignore that otherwise). But as for DHCP, DNS and interface management, systemd cuts it pretty well. Systemd is deliberately simplistic in its approach, with the idea that it should work for mostly everyone. And it's DNS resolver is a jewel: it keeps a global DNS cache per-machine, via both a libc NSS plugin and pointing resolv.conf to a proxy at 127.0.0.53:53. It's configuration is also unusually powerful, with the DNS server address per interface. I never seen other systems doing that.

RH is systemd-ized now completely or nearly so, AFAIK, and Arch started with systemd from day one. Debian is kinda at the end of the herd, and need most tweaks, but when I chose Debian 10 as my base system, Centos 7 was too aged, and Centos 8 was promised "some time next year", so I had to roll my own. But once you start messing with the distro, you'll never stop, so the image packages team is doing the right thing, IMO, by keeping the packages as distro-agnostic as possible. As long as it's within design reliability, it better be simple than pretty. It works for the absolute majority of people transparently, and for someone like me (I run computing clusters, where the startup time of a new node from an image is in the critical path), custom image tweaks are inevitable anyway.