regbinkk closed this issue 1 month ago.
Hi @regbinkk. Thanks for reporting!
I suspect the CPUID instruction returns the wrong values, causing the parser to wrongly conclude that, e.g., AVX2 is supported, which in turn would explain the "Illegal instruction" error. Whether that is actually the case is what I would like to find out.
The case Jaap described was running FreeBSD in a RHEL6 hypervisor that returned the "wrong" values. Anything special about your setup?
Can you run cpuid (sudo pkg install cpuid) and send me the output? If you don't want to post it publicly, you can also email it to me (Jeroen Koekkoek) at the address listed on https://nlnetlabs.nl/people/.
Lastly, to work around the issue (until I add a configuration option), you can set the ZONE_KERNEL environment variable to fallback in the environment from which NSD is started, so that the non-optimized version is used. That should always work. Please test with nsd-checkzone.
Thanks again for reporting!
Hi and thanks for answering.
> The case Jaap described was running FreeBSD in a RHEL6 hypervisor that returned the "wrong" values. Anything special about your setup?
Nothing really special: FreeBSD on a physical machine. The nsd service runs in a jail, but to rule that out as a cause I mirrored the setup outside the jail, with the very same results.
> Can you run cpuid (sudo pkg install cpuid) and send me the output? (You can use my (Jeroen Koekkoek) address listed on https://nlnetlabs.nl/people/ if you don't want to post publicly.)
Attaching the cpuid output as a file to avoid a wall of text: cpuid.txt.
> Lastly, to work around the issue (until I add a configuration option), you can set the ZONE_KERNEL environment variable to fallback in the environment from which NSD is started, so that the non-optimized version is used. That should always work. Please test with nsd-checkzone.
That worked just fine. The manual check with nsd-checkzone returns 'ok', and the service starts and seemingly runs normally again.
> Thanks again for reporting!
You're most welcome. Thanks again for your time!
@regbinkk, can you try to compile https://github.com/NLnetLabs/simdzone standalone using CMake and run the zone-bench binary on an example zone file? None of the AVX2 bits are set, so I suspect the "Westmere" kernel is selected even though the pclmulqdq instruction is not available, as ECX bit 1 for EAX=0x1 is not set (see this). Judging by Intel's page on your CPU and the Wikipedia pages on Lynnfield and Westmere, the illegal instruction then occurs in the scanner. You're in the sweet spot of having SSE4.2, but not CLMUL.
The first time around, the binary should crash. Changing this line to read SSE42|PCLMULQDQ rather than just SSE42 and recompiling should then make the code select the correct parser kernel.
@k0ekk0ek Hi, following the instructions for compiling in release mode ('cd zone-parser'??!?) and parsing the example zone, these are the test results I get:
$ ./zone-bench parse ../../example.com.zone
Selected target westmere
Illegal instruction (core dumped)
After the code change and rebuild:
$ ./zone-bench parse ../../example.com.zone
Selected target fallback
Parsed 6 records
Please let me know if you need detailed info (build log, etc.).
Thanks for testing @regbinkk! Glad the assumption was correct. I don't need the log. I'll push the fix asap and everything should work correctly with 4.10.2. Thanks again for reporting and testing :+1:
Thanks a lot @k0ekk0ek for your time and effort investigating and fixing an issue that might only be relevant to a negligibly small number of people :).
The fix is merged and will ship in the upcoming release.
Reposting my original bug report from the FreeBSD bug tracker because I was politely asked to.
General problem description
Versions 4.10.0 and subsequently 4.10.1 hang indefinitely at startup, causing high CPU load. Apparently this happens while parsing the zone files. Without any zone files configured, the service starts normally. There are only a couple of zone files with fewer than 100 entries each. Even the minimal zone from the official NSD documentation triggers the same behaviour on its own. The same configuration works fine with versions < 4.10. There is no log or console output, even with maximum verbosity and debugging enabled.
FreeBSD 13.3-RELEASE/amd64
Extra info
Tested with both nsd 4.9.1 and 4.10.1.
nsd-checkconf returns no errors with either version. nsd-checkzone returns no error with version 4.9.1; however, the same zone files (including the example.com zone) result in the following with version 4.10.1:
Starting the service (version 4.10.1) with 'verbosity: 3' produces only:
Manually killing the nsd sub-process (since it's not responding to any commands) produces:
This is what nsd looks like in the process list:
I cannot reproduce the problem with the same configuration files on any other machine. Is it possible that I've hit a simdzone zone-parsing bug in combination with old hardware? The problematic nsd instance is running on a rather old machine:
CPU: Intel(R) Core(TM) i5 CPU 760 @ 2.80GHz (2809.95-MHz K8-class CPU)
SSE4.2 instructions seem to be available; AVX2, on the other hand, is not. I'm not familiar with how nsd/simdzone behaves internally in such a situation. On the FreeBSD bug tracker, Jaap Akkerhuis pointed out that something similar had already been reported: