NLnetLabs / nsd

The NLnet Labs Name Server Daemon (NSD) is an authoritative, RFC compliant DNS nameserver.
https://nlnetlabs.nl/nsd
BSD 3-Clause "New" or "Revised" License

FreeBSD on version 4.10.0 #361

Closed wekers closed 1 month ago

wekers commented 2 months ago

Hi all, after updating NSD to version 4.10.0, it crashes right after start with no logs, and nsd-checkzone says "Illegal instruction (core dumped)". PS: without a zone: entry in nsd.conf, it starts fine.

FreeBSD 13.3-RELEASE-p4

$ truss nsd-control status
connect(3,{ AF_INET 127.0.0.1:8952 },16) ERR#36 'Operation now in progress'
select(4,{ 3 },{ 3 },{ 3 },{ 5.000000 }) = 1 (0x1)
getsockopt(3,SOL_SOCKET,SO_ERROR,0x8209bbedc,0x8209bbed8) = 0 (0x0)
fcntl(3,F_SETFL,O_RDONLY) = 0 (0x0)
clock_gettime(13,{ 1722101925.000000000 }) = 0 (0x0)
getpid() = 69086 (0x10dde)
getpid() = 69086 (0x10dde)
clock_gettime(13,{ 1722101925.000000000 }) = 0 (0x0)
getpid() = 69086 (0x10dde)
getpid() = 69086 (0x10dde)
clock_gettime(13,{ 1722101925.000000000 }) = 0 (0x0)
getpid() = 69086 (0x10dde)
getpid() = 69086 (0x10dde)
clock_gettime(13,{ 1722101925.000000000 }) = 0 (0x0)
write(3,"\^V\^C\^A\^A \^A\0\^A\^\^C\^C"...,293) = 293 (0x125)

Going back to version 4.3.3 works fine.

k0ekk0ek commented 2 months ago

Hi @wekers. Thanks for reporting. With 4.10.0, we started using simdzone for zone parsing, which uses SIMD instructions found in modern CPUs to speed up the process. Selection of which instruction set to use should be handled automatically using the CPUID instruction. You can force use of the fallback parser by setting the environment variable ZONE_KERNEL=fallback to work around the issue. You might also want to try 4.10.1rc2 rather than 4.10.0. What's really interesting about this scenario is why a kernel is being selected that is not supported by the hardware.

Can you tell me a bit more about your setup? It would really help in trying to locate the problem. I'm assuming you're on x86_64? Which type of CPU are you using? Is the machine you're running the same machine you compiled the software on? Perhaps some kind of hypervisor is being used?

wekers commented 2 months ago

Hi @k0ekk0ek,

It's a VPS. NSD was compiled from ports with make install, in a jail.

root@ns3/usr/ports/dns/nsd # cat distinfo
TIMESTAMP = 1718281790
SHA256 (nsd-4.10.0.tar.gz) = 6317d7f5e3f01c33912f313d66a33dd1ace1cdf7f19d5c590b2e430d8ca4605f
SIZE (nsd-4.10.0.tar.gz) = 1388963


QEMU Virtual CPU version (cpu64-rhel6)
hw.clockrate: 2600
hw.ncpu: 4
kern.smp.cpus: 4
CPU: QEMU Virtual CPU version (cpu64-rhel6) (2600.19-MHz K8-class CPU)
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
hw.machine_arch: amd64
cpu0: on acpi0
kern.ccpu: 0

0, 1, 2, 3

kern.sched.cpusetsizemin: 1
kern.sched.cpusetsize: 32
kern.pin_pcpu_swi: 0

$ dmesg | grep -i cpu
CPU: QEMU Virtual CPU version (cpu64-rhel6) (2600.19-MHz K8-class CPU)
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
cpu0: on acpi0


Architecture: amd64
Byte Order: Little Endian
Total CPU(s): 4
Vendor: GenuineIntel
CPU family: 6
Model: 13
Model name: QEMU Virtual CPU version (cpu64-rhel6)
Stepping: 3
L1d cache: 32K
L1i cache: 32K
L2 cache: 2M
Flags: fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 cflsh mmx fxsr sse sse2 sse3 cx16 hypervisor syscall nx lm lahf_lm

k0ekk0ek commented 2 months ago

It shouldn't select any SIMD kernel, since the required flags are not present. Of course, some code might have been optimized for a newer architecture by accident; this CPU seems to top out at SSE3, which is quite old (westmere, SSE4.2, was released in 2008). Or the automatic kernel selection picks the wrong one for some reason. Does the same error occur when you set the environment variable ZONE_KERNEL=fallback?
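
To make the expected behavior concrete, here is a minimal sketch in Python of flag-based kernel selection, applied to the flags reported above. It is an illustration only and not simdzone's code, which does this in C via CPUID. The kernel table, the "haswell" entry with its AVX2 requirement, the flag spellings sse4_2 and avx2, and the rule that an override is honored only if the CPU supports it are all assumptions.

```python
# Illustrative sketch only; simdzone performs this check in C using CPUID.
# Kernel names and requirements below are assumptions, ordered best-first.
KERNELS = [
    ("haswell", {"avx2"}),     # assumed kernel name and requirement
    ("westmere", {"sse4_2"}),  # SSE4.2, as mentioned in this thread
    ("fallback", set()),       # no SIMD requirement, always usable
]

# Flags exactly as reported for the QEMU cpu64-rhel6 virtual CPU above.
REPORTED_FLAGS = (
    "fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 "
    "cflsh mmx fxsr sse sse2 sse3 cx16 hypervisor syscall nx lm lahf_lm"
).split()


def select_kernel(cpu_flags, override=None):
    """Return the first kernel whose required flags are all present.

    `override` mimics ZONE_KERNEL; in this sketch it is honored only
    when the named kernel is supported by the given flags (assumption).
    """
    flags = {flag.lower() for flag in cpu_flags}
    supported = [name for name, required in KERNELS if required <= flags]
    if override in supported:
        return override
    return supported[0]  # "fallback" is always in the list


if __name__ == "__main__":
    print(select_kernel(REPORTED_FLAGS))
```

With the reported flags, neither sse4_2 nor avx2 is present, so the sketch can only pick the fallback kernel. That matches the expectation that a crash here points at the compiled code rather than at kernel selection.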

wekers commented 2 months ago

@k0ekk0ek, with setenv ZONE_KERNEL fallback, the same problem occurs.

I installed 4.3.5 to check, and it works fine; DNSSEC works too.

root@ns3/root # nsd-control status
version: 4.3.5
verbosity: 0
ratelimit: 5

k0ekk0ek commented 2 months ago

@wekers, I've set up a KVM instance with FreeBSD 13.3 and reduced CPU functionality on my system. Judging by the output above, is that what you're using?

From dmesg:

CPU: QEMU Virtual CPU version 2.5+ (1497.64-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x60fb1  Family=0xf  Model=0x6b  Stepping=1
  Features=0x783fbfd<FPU,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE,SSE2>
  Features2=0x80202001<SSE3,CX16,x2APIC,HV>
  AMD Features=0x20100800<SYSCALL,NX,LM>
  AMD Features2=0x1<LAHF>

For me it all worked just fine.

Any chance of getting remote access to a similar setup for reproducing? I'm fairly sure the compiler is emitting code that the machine just can't run, but if the programs are built on the machine they're running on, I'm not sure how that situation comes about.