Open ttyS4 opened 5 months ago
Hi Tamas,
I don't think upgrading to 4.10 would make a difference in this case, but perhaps the 20-minute timeout (during which NSD stays in reload mode) could be reduced by setting the verifier-timeout: value to something reasonable, such as 200% of the time it takes the script to verify the zone.
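As a rough illustration of the "200% of the verification time" rule of thumb, here is a minimal Python sketch. It times one run of a command and prints a candidate verifier-timeout: value in whole seconds. The helper names (suggested_timeout, time_command) and the factor of 2.0 are assumptions made for this example, not part of NSD. The command passed on the command line stands in for the real verifier script, which NSD normally starts itself with its own environment, so a manual run may not take exactly as long as a run under NSD.

```python
import math
import subprocess
import sys
import time


def suggested_timeout(elapsed_seconds, factor=2.0):
    """Return a whole-second timeout of `factor` times the measured run time.

    The result is rounded up and never lower than 1 second.
    """
    return max(1, math.ceil(elapsed_seconds * factor))


def time_command(argv):
    """Run a command once and return its wall-clock duration in seconds."""
    start = time.monotonic()
    subprocess.run(argv, check=False)
    return time.monotonic() - start


if __name__ == "__main__":
    # Hypothetical usage: pass the verifier command line as arguments.
    # With no arguments, a no-op Python process is timed instead.
    command = sys.argv[1:] or [sys.executable, "-c", "pass"]
    elapsed = time_command(command)
    print(f"measured: {elapsed:.1f}s")
    print(f"verifier-timeout: {suggested_timeout(elapsed)}")
```

Timing several runs against a typical zone and taking the slowest one would give a safer value than a single measurement.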
But I still want to look into the specific case (by manual code inspection) in which the process has already exited but NSD is still reading what the verifier is writing to stdout and stderr.
If you need any info from us, just let us know. (I can also try to collect data for you as long as it is considered safe.)
This issue happened again.
# grep -E 'handle_child_command|Broken' /var/log/daemon.log
Oct 22 04:10:31 myhost nsd[27260]: handle_child_command: read: Connection reset by peer
Oct 22 05:48:55 myhost nsd[16206]: svrmain: problems sending quit to child 8223 command: Broken pipe
Oct 22 05:48:55 myhost nsd[16206]: handle_child_command: read: Connection reset by peer
Oct 22 05:48:55 myhost nsd[16206]: svrmain: problems sending quit to child 8223 command: Broken pipe
Oct 22 06:11:01 myhost nsd[24647]: handle_child_command: read: Connection reset by peer
hi nsd folks,
There is a place where NSD is used for verification. (Because of IXFR-related issues it is on 4.9.1-1 now, running on Debian 12; the package was compiled in a Debian 12 chroot using official Debian packages, so basically a backport.)
A new zone is generated every 10 minutes; Knot signs the zone, then NSD verifies it and distributes it (notify-out + xfr).
However, today we saw no follow-up after the verifier exited with 0. We see
nsd[4819]: handle_child_command: read: Connection reset by peer
about 20 minutes after the verification finished. Then normal activity is resumed and:message follows.
Notify messages were received (and logged) while in this state, but no progress.
Do you think an upgrade to 4.10 could help? Is this a known issue or something that needs further investigation?
Regards, Tamás