NLnetLabs / nsd

The NLnet Labs Name Server Daemon (NSD) is an authoritative, RFC compliant DNS nameserver.
https://nlnetlabs.nl/nsd
BSD 3-Clause "New" or "Revised" License
444 stars 101 forks source link

Consider making reloading of configuration the default #374

Open k0ekk0ek opened 4 weeks ago

k0ekk0ek commented 4 weeks ago

What it says on the tin. See @anandb-ripencc's comment on #323.

anandb-ripencc commented 4 weeks ago

Thanks for opening this issue. Let's review what NSD currently does (version 4.10.1). When it receives a HUP signal, it only reopens the log file, and reloads existing zones (primary zones' disk files are checked for updates and reloaded, while secondary zones are refreshed by querying their primaries). Note that the "nsd" man page does not document the refreshing of secondary zones, even though this is an effect of HUP. The config file, nsd.conf is not parsed, and so it's not possible to add/remove zones, patterns, TSIG keys, or change verbosity, identity, nsid, version, tcp-count (and related options), xfrd-tcp-, zonefiles-write, rrl-, etc. Therefore, the HUP signal is rather limited.

If we use "nsd-control", then we have more options. We have "log_reopen", which is a very lightweight way to assist with log file rotation. This is not synchronous, and some nsd processes may continue to write to the old log file for some time after this command. This is not documented, and will come as a surprise to many. It's especially annoying if you use compression. Tools like logrotate will compress the old log file after rotation, and will emit an error when the original file changes while it's doing its compression. The consequence of this is that some lines at the end of the file may not make it into the compressed file, resulting in data loss. This behaviour should be documented, so that operators can make appropriate choices about how to handle rotated log file (such as delaying the compression).

Next, we have "reload". Without additional arguments, it behaves exactly like the HUP signal (reopens log file, and reloads or refreshes zones). Note that the "nsd-control" man page only talks about reloading zone files, but does not document the behaviour of refreshing secondary zones. This should be documented.

If "reload" is given an argument, then it makes nsd reload a specific zone. This reload will involve reading the zone file from disk for a primary zone, as documented. But for a secondary, it will mean attempting XFR from the primary, and this is not documented. It's also not clear whether reloading a single zone will also reopen the log file. I could test this, but in any case, this could be documented better.

Now, interestingly nsd-control also provides commands to deal exclusively with secondary zones. These are "transfer" and "force_transfer". They do what they say. They also take arguments to operate on specific zones.

I can appreciate having separate commands, but this is currently anomalous. An operator can cause refresh of all secondary zones with "transfer", or a refresh/reload of all zones with "reload". What is missing is a way to reload just the primary zones, without causing refresh of all the secondaries. As I wrote before, the documentation suggests that "reload" operates only on primary zones, but the behaviour doesn't match the documentation. My suggestion would be to change the code to limit "reload" to just primary zones. Then its behaviour will match the documentation. An operator who wants to refresh/reload all zones on a mixed server will have to issue 2 commands, and I think this is fine.

But then we have another anomaly. The "reload" command does something else that the "transfer" command doesn't do. And this is to reopen the log file. If all I want to do is reopen the log file for rotation, then "reload" is heavy. There is already the "log_reopen" command, which just reopens the log file. Since the facility is already there, it should be decoupled from the "reload" command, which should be limited to reloading only primary zones.

Next, we have the "reconfig" command. This is designed for a lightweight reconfiguration of some things in NSD. It explicitly does not reload/refresh any existing zones. But it does read nsd.conf, and picks up changes to TSIG keys, patterns and zone definitions. Newly added primary zones will be read from disk and secondary zones will be transferred from primaries. Zones removed from nsd.conf will be dropped from memory. The "reconfig" command also updates RRL related options, but this is not documented in the man page of "nsd-control". Instead, strangely, it is documented in "nsd.conf". An operator has to read the man pages of all the components, to get a clear picture of what is affected by which command. I do not think this is good.

Let's talk more about "reconfig". So it reconfigures some things about NSD, but not all. I understand that some things are not possible to reconfigure on the fly, such as listening addresses. But there are several other things in NSD which I believe can be reconfigured. Examples of such options are: identity, version, nsid, tcp-count, tcp-query-count, tcp-timeout, xfrd-tcp-max, xfrd-tcp-pipeline, verbosity, hide-version, hide-identity, refuse-any. I may be wrong about some of them, and the developers will know better what can and cannot be changed at run-time. What I would love to see is each option in "nsd.conf" tagged with a note showing whether it can be changed on the fly with a "reconfig" or requires a restart. This information helps an operator. When they are changing any config option, they are aware whether to issue a reconfig or restart the server.

Now, let's come to the recent change you've made, for versions greater than 4.10.1. You've modified the behaviour of SIGHUP to do what the "reload" and "reconfig" commands of nsd-control do, ie. parse nsd.conf, and pick up changes to keys and patterns, add/remove zones, reopen the log file, reload primary zones from disk, and refresh secondary zones from primaries. This is certainly useful for a operator with a small setup, to avoid setting up a socket for nsd-control, and instead use the traditional SIGHUP to just update all the things on a running NSD server. I think this behaviour should have been in NSD much earlier, but it is better late than never. However, I am disappointed that you've also added yet another option to control this behaviour, and defaulted it to off, whereas this is the kind of option that is very useful, and should default to on. Better yet, drop the option altogether.

Think a bit more about who would want to use SIGHUP and how. If NSD is packaged for general distribution, it should be possible to package it in a way that's consistent. If NSD has the ability to reconfigure itself on SIGHUP, then package maintainers can ship it with systemd unit files to send SIGHUP when a user types "systemctl reload nsd". The presence of an option that changes the behaviour of SIGHUP means that "systemctl reload nsd" is no longer behaving consistently. Imagine the questions on the mailing list:

User: "I reloaded NSD, but it has not picked up my configuration changes." NSD developer: "Did you enable reload-config and restart the server?"

I would be very sad if you were to maintain this option, because it is counter-intuitive to unix system operators who are used to SIGHUP generally meaning "reload configuration".

You also mentioned that maybe this option is necessary for now, and removed in a later version. Oh my god. Even more confusion. Now there are multiple versions of NSD packaged out there, and there is total confusion about SIGHUP behaviour.

This also brings me to version numbering. The world understands the MAJOR.MINOR.PATCH system of version numbering. PATCH changes should only be used for bug fixes. MINOR changes can be used for new features, but that don't break compatibility. And MAJOR should be used for breaking changes or major changes in behaviour. When the decision was taken to drop the database completely, and later introduce SIMDZONE, I had hoped to see a major new version of NSD, where deprecated options are even removed, and cause an ERROR if found in nsd.conf. Compile time options that were hacks are just replaced or removed. Alas, it is too late to release a new version now, but I hope you will consider this version numbering going forward. I have mentioned this to the developers before.

And finally, a humble request. When you're considering making important changes to NSD, such as this SIGHUP change, you should consider asking for operator feedback via the mailing list. Not all operators follow the commits to the NSD repository, and things come as a surprise. Operators can offer you real world experience and feedback in running your software, and your software will evolve better.

Please share all this feedback with all the other NSD developers, and discuss it carefully. Please also ask me and other operators for feedback before making major changes.

k0ekk0ek commented 3 weeks ago

@anandb-ripencc thank you for your response. It's a bit long for my taste, but I'll do my best to provide the proper feedback on the subject of the issue (reload-config). (I'll get to the rest eventually. Please phrase your comments concise).

You're first assumption is not entirely correct. NSD checks the zone files only if zonefiles-check (nsd.conf) is set to yes (default) on SIGHUP. Otherwise, it only rotates the log file. We want to keep the option for users to only rotate the log file, because for some operators doing the additional work is too much. Rereading will become the default, but we require a way for users to disable it (like zonefiles-check), they have specifically requested this functionality in the past.

In your opinion, what would be the proper setting(s)?

anandb-ripencc commented 3 weeks ago
k0ekk0ek commented 3 weeks ago

@anandb-ripencc, I think we're mostly in agreement(?) Actually, we (all NSD developers) largely agree with you. However, we need to have an option to allow users to use SIGHUP only for log file rotation. In our opinion, the reload-config is similar to the zonefiles-check option, but perhaps the name is poorly chosen(?) My proposal would be to have a single option to control what SIGHUP does. Something like skip-on-sighup: (not the actual name) where the user can specify either zones or config or both. The default behavior being to not skip anything. That way, most users can completely ignore the option and it's only a single entry in the example configuration/manual page.

We can choose to make more options configurable, but it requires changing large parts of NSD's internals. So while it seems simple, it's actually a lot of work. I can see users wanting to update e.g. tcp-count. Let's open a separate issue to register any options that users absolutely need to change.

I've spend quite some yesterday figuring out what nsd-control reload (or checking zone files in general) does. It does exactly as advertised, check zone files. However, it also checks zone files for secondary zones. If the file does not exist, it communicates the in-memory SOA to xfrd. At which point general logic is applied that may trigger a zone transfer. I cannot currently change the behavior as it would require quite the update to NSD's internal IPC mechanism. We can register it in a separate issue which we get to eventually, but not in the near future.

anandb-ripencc commented 3 weeks ago

Thanks for the feedback @k0ekk0ek. I think that an option like skip-on-sighup would be fine. The default should be to do all the things, and let the operator choose whether configuration and zone reload happens or not.

It's fine to open another issue to discuss and define what other NSD options should be subject to on-the-fly reconfiguration. You can invite other operators to that discussion and gather real-world experience to see what options operators usually want to change.