Open lingfish opened 7 months ago
It's not as simple as that. In some cases rec is used as the system resolver by the machine it is running on, in other cases just a service by other machines. Both use-cases are valid and need different unit files.
Sorry, I don't see the difference. By using Before, it makes rec a prerequisite that must be up before the (system standard) nss-lookup.target is finally reached. If rec is a local resolver, or one for a network (such as in my case), either way this will ensure it starts and is up after the network, and before anything else depending on nss-related stuff.
Your log lines do suggest your rec is (also?) used as a local resolver, so I'm officially confused now. I'll let somebody who has more knowledge wrt systemd answer this.
Indeed I do, and so again, super important for rec to start before that target is reached.
Here's a little more from systemd.special(7):
nss-lookup.target
A target that should be used as synchronization point for all
host/network name service lookups. Note that this is
independent of UNIX user/group name lookups for which
nss-user-lookup.target should be used. All services for which
the availability of full host/network name resolution is
essential should be ordered after this target, but not pull
it in.
As @omoerbeek mentioned, it's not possible to provide a single service file that meets everyone's needs. At present, pdns-recursor has no mechanism to ignore DNSSEC time validity checks, and so if your clock is too far off, DNSSEC fails to validate for basic things like the root zone or the TLD zones, and you can't resolve any names. To avoid this, an After=time-sync.target was added in #12248 so that users could set up something like systemd-time-wait-sync.service or similar to ensure time is synced before recursor starts. However, this created the following ordering loop (#13115):
pdns-recursor.service -> time-sync.target -> systemd-time-wait-sync.service (or similar) -> ntp.service (or similar) -> nss-lookup.target -> pdns-recursor.service
As time sync is critically important to DNSSEC, and it varies whether pdns-recursor on the system is used as the system's recursor, it was decided to remove the Wants=nss-lookup.target and Before=nss-lookup.target to break the loop (#13210).
If your system has a reliable RTC, or another mechanism to set a reasonably close to accurate time (within an hour, preferably better) during startup that doesn't rely on DNS, then you can utilize systemd's drop-in mechanism to change the dependencies of pdns-recursor.service to remove the After=time-sync.target and add back the Before=nss-lookup.target and Wants=nss-lookup.target items. If you do not have a way to get reasonably close to accurate time during startup that doesn't rely on DNS, you could still make this change, but then you may run into the issue where DNSSEC fails to validate due to time being too far off, which may make it impossible for your NTP client to start until you've manually corrected the time; you'll have to decide if you're willing to accept that risk. Only you know your system, so only you can make this decision.
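The ordering loop and the two ways of breaking it described above can be checked with a toy model. This is only a sketch: it assumes each arrow in the chain means "is ordered after", it contains just the five units named in the loop (real unit files carry many more dependencies), and the names with_loop, without_before, local_resolver and start_order are made up for illustration. It uses graphlib from the Python standard library (3.9+), not anything systemd itself provides.

```python
from graphlib import CycleError, TopologicalSorter

# unit -> set of units it is ordered after (toy model of the quoted chain)
with_loop = {
    "pdns-recursor.service": {"time-sync.target"},        # After=time-sync.target
    "time-sync.target": {"systemd-time-wait-sync.service"},
    "systemd-time-wait-sync.service": {"ntp.service"},
    "ntp.service": {"nss-lookup.target"},
    "nss-lookup.target": {"pdns-recursor.service"},       # Before=nss-lookup.target
}


def start_order(after):
    """Return one valid start order, or None if the ordering contains a loop."""
    try:
        return list(TopologicalSorter(after).static_order())
    except CycleError:
        return None


# Variant after #13210: the recursor no longer orders itself
# before nss-lookup.target.
without_before = dict(with_loop)
without_before["nss-lookup.target"] = set()

# Variant for a machine with a trusted clock: keep Before=nss-lookup.target,
# drop After=time-sync.target.
local_resolver = dict(with_loop)
local_resolver["pdns-recursor.service"] = set()

print(start_order(with_loop))        # None: the loop cannot be ordered
print(start_order(without_before))   # recursor comes after nss-lookup.target
print(start_order(local_resolver))   # recursor comes before nss-lookup.target
```

In the second variant the recursor is the last of the named units to start, which matches the breakage reported in this issue; in the third it starts first, at the cost of not waiting for time sync.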
Thanks for this breakdown, and I see your points.
Some interesting observations from me:
ntpsec.service doesn't pin itself to time-sync.target, nor time-set.target, yet one might think it would. Reading systemd.special(7) again, it is specific that a service should only reach this target if the time is set, which of course, NTP for example may not have immediately after start, so I suspect that's why. A oneshot sync by ntpsec, and then going into polling mode would satisfy... but, on my system that has rec and ntpsec installed, I don't see either target:
hostname [11:43 AM] [j:0] /etc/ntpsec # systemctl -a | grep -E 'time |ntp'
ntpsec-systemd-netif.path loaded active waiting ntpsec-systemd-netif.path
initrd-parse-etc.service loaded inactive dead Mountpoints Configured in the Real Root
ntpsec-rotate-stats.service loaded inactive dead Rotate ntpd stats
ntpsec-systemd-netif.service loaded inactive dead ntpsec-systemd-netif.service
ntpsec.service loaded active running Network Time Service
user-runtime-dir@1000.service loaded active exited User Runtime Directory /run/user/1000
ntpsec-rotate-stats.timer loaded active waiting Rotate ntpd stats daily
Services where accurate time is essential should be ordered after this unit, but not pull it in.
You'd assume that would mean "well, NTP will sort that", but it won't.
This target provides stricter clock accuracy guarantees than time-set.target (see above), but likely requires network communication and thus introduces unpredictable delays. Services that require clock accuracy and where network communication delays are acceptable should use this target. Services that require a less accurate clock, and only approximate and roughly monotonic clock behaviour should use time-set.target instead.
Based on the above, and your statement "within an hour, preferably better", perhaps rec could use time-set.target instead?
Either way, I couldn't find a discussion around this in the doco. Considering the impact it just had on my boot (tunnels not coming up, time not coming up etc), perhaps it needs to be documented?
Short description
Due to #13210, recursor starts after the nss-lookup.target, and this then breaks other things like ntpd and Wireguard, as is shown here:
Environment
Steps to reproduce
Reproducible by having the above version installed.
Expected behaviour
Recursor should be Before nss-lookup.target so that other units waiting on that target work.
Actual behaviour
See above.
Other information
I believe the unit needs After, Wants, and Before, as per Debian's unit file for ISC bind.
I'm no expert on systemd unit dependency stuff, but I'm inclined to trust them.
See also this discussion that makes things clearer.