Open holkmann opened 8 months ago
I am seeing the same issue:
# keydb-server
Illegal instruction
Downgrading to 6:6.3.3-1+deb12u1 fixed the problem.
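For anyone who wants to check for this crash from a script rather than by eye, here is a minimal sketch. It relies on the POSIX convention that Python's `subprocess` reports a child killed by signal N as return code -N, so an "Illegal instruction" death shows up as `-signal.SIGILL`. The helper names `died_from_sigill` and `check_binary` are made up for illustration. Running `keydb-server --version` as the probe is an assumption: it may or may not reach the faulting instruction on an affected machine.

```python
import shutil
import signal
import subprocess


def died_from_sigill(returncode: int) -> bool:
    """Return True if a subprocess return code means death by SIGILL."""
    # On POSIX, subprocess reports "killed by signal N" as return code -N.
    return returncode == -signal.SIGILL


def check_binary(argv: list[str]) -> str:
    """Run argv once and classify how the process ended."""
    if shutil.which(argv[0]) is None:
        return "not installed"
    result = subprocess.run(argv, capture_output=True, timeout=10)
    if died_from_sigill(result.returncode):
        return "illegal instruction (SIGILL)"
    return f"exited with code {result.returncode}"


if __name__ == "__main__":
    # Assumption: this probe exercises the same code path that crashes.
    # If it does not, start the server the way you normally do instead.
    print(check_binary(["keydb-server", "--version"]))
```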
Seeing the same on the official docker image 😕
Seeing the same. Could it be due to older hardware? In production (Ubuntu 22.04.3 LTS) it's working; however, on our acceptance environment, which runs the exact same software but is virtualised through Proxmox, I have this problem as well.
One of my clients runs Hyper-V, and I'm seeing the same issue with a Rocky Linux 8 VM running the Docker version.
Interestingly, that VM has Processor Compatibility Mode (PCM) enabled. If I reboot the VM with PCM disabled, 6.3.4 runs.
Unfortunately that VM requires PCM enabled so we are stuck on 6.3.3 for the time being.
Had an opportunity to test the Docker version of 6.3.4 on a Rocky Linux 9 VM under Proxmox 8.1.3, and it does work for me with the CPU type for the VM set to Haswell-noTSX-IBRS (the host has a better CPU than that, but another host in the datacenter has an older CPU, hence that setting).
Same problem here on a Debian VM with the default x86-64-v2-AES CPU type on an Intel Xeon E5-2630L v2 CPU.
6.3.4 fails with "illegal hardware instruction", but going back to 6.3.3 works fine.
lscpu from the VM
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 40 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Vendor ID: GenuineIntel
BIOS Vendor ID: QEMU
Model name: QEMU Virtual CPU version 2.5+
BIOS Model name: pc-i440fx-8.1 CPU @ 2.0GHz
BIOS CPU family: 1
CPU family: 15
Model: 107
Thread(s) per core: 1
Core(s) per socket: 8
Socket(s): 1
Stepping: 1
BogoMIPS: 4799,99
Flags: fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx lm constant_tsc nopl xtopology cpuid tsc_known_freq pni ssse3 cx16 sse4_1 sse4_2 x2apic popcnt aes hypervisor lahf_lm cpuid_fault pti
Virtualization features:
Hypervisor vendor: KVM
Virtualization type: full
Caches (sum of all):
L1d: 256 KiB (8 instances)
L1i: 256 KiB (8 instances)
L2: 32 MiB (8 instances)
L3: 16 MiB (1 instance)
NUMA:
NUMA node(s): 2
NUMA node0 CPU(s): 0-7
NUMA node1 CPU(s):
Vulnerabilities:
Gather data sampling: Not affected
Itlb multihit: KVM: Mitigation: VMX unsupported
L1tf: Mitigation; PTE Inversion
Mds: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Meltdown: Mitigation; PTI
Mmio stale data: Unknown: No mitigations
Retbleed: Not affected
Spec rstack overflow: Not affected
Spec store bypass: Vulnerable
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2: Mitigation; Retpolines, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected
Srbds: Not affected
Tsx async abort: Not affected
lscpu from the Host
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 24
On-line CPU(s) list: 0-23
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) CPU E5-2630L v2 @ 2.40GHz
CPU family: 6
Model: 62
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 2
Stepping: 4
CPU(s) scaling MHz: 46%
CPU max MHz: 2800.0000
CPU min MHz: 1200.0000
BogoMIPS: 4800.14
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm cpuid_fault pti ssbd ibrs ibpb stibp tpr_shadow flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts vnmi md_clear flush_l1d
Virtualization features:
Virtualization: VT-x
Caches (sum of all):
L1d: 384 KiB (12 instances)
L1i: 384 KiB (12 instances)
L2: 3 MiB (12 instances)
L3: 30 MiB (2 instances)
NUMA:
NUMA node(s): 2
NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22
NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23
Vulnerabilities:
Gather data sampling: Not affected
Itlb multihit: KVM: Mitigation: Split huge pages
L1tf: Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
Mds: Mitigation; Clear CPU buffers; SMT vulnerable
Meltdown: Mitigation; PTI
Mmio stale data: Unknown: No mitigations
Retbleed: Not affected
Spec rstack overflow: Not affected
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2: Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling, PBRSB-eIBRS Not affected
Srbds: Not affected
Tsx async abort: Not affected
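To make the two flag lists above easier to compare, here is a small sketch that diffs them as sets. The flag strings are copied from the VM and host lscpu output in this comment; the helper names `parse_flags` and `flag_diff` are made up for illustration. It only shows which CPU features the guest does not see (for example, `avx` appears in the host list but not in the VM list). It does not prove which instruction the 6.3.4 binary actually faults on.

```python
VM_FLAGS = """
fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush
mmx fxsr sse sse2 ht syscall nx lm constant_tsc nopl xtopology cpuid
tsc_known_freq pni ssse3 cx16 sse4_1 sse4_2 x2apic popcnt aes hypervisor
lahf_lm cpuid_fault pti
"""

HOST_FLAGS = """
fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36
clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm
constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid
aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16
xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave
avx f16c rdrand lahf_lm cpuid_fault pti ssbd ibrs ibpb stibp tpr_shadow
flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts
vnmi md_clear flush_l1d
"""


def parse_flags(flags: str) -> set[str]:
    """Split a whitespace-separated lscpu 'Flags:' value into a set."""
    return set(flags.split())


def flag_diff(vm: str, host: str) -> tuple[list[str], list[str]]:
    """Return (flags only on the host, flags only in the VM), both sorted."""
    vm_set, host_set = parse_flags(vm), parse_flags(host)
    return sorted(host_set - vm_set), sorted(vm_set - host_set)


if __name__ == "__main__":
    host_only, vm_only = flag_diff(VM_FLAGS, HOST_FLAGS)
    print("only on host:", " ".join(host_only))
    print("only in VM:  ", " ".join(vm_only))
```

The same two functions work on any pair of machines: paste the `Flags:` value from `lscpu` on a working and a failing system into the two strings.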
Addendum: Changing the "cpu type" from "x86-64-v2-AES" (Proxmox 8 Standard) to "Host" resolves this problem. (VM Hardware Tab => Processors)
I mean, it's a workaround, not really a solution. ("Host" isn't recommended for a cluster whose nodes have different CPUs, as far as I know.)
Of course you're right, but we'll have to wait until one of the developers has a solution for this.
Still present as of today
True, but I think it has to do with this statement: https://github.com/Snapchat/KeyDB/issues/798#issuecomment-2013291234
We will probably have to wait a while for a bug fix.
Describe the bug
Debian 12 (Proxmox VM / Proxmox 8.1.3), freshly installed, only for KeyDB. KeyDB 6.3.3 works without errors. As soon as I update to 6.3.4, KeyDB or KeyDB server no longer works. If I switch back to version 6.3.3, KeyDB works again. I was able to reproduce with different Debian 12 VMs.
To reproduce
Note: I previously had version 6.3.3 on hold via apt-mark, so this only shows the upgrade process, which is sufficient to reproduce the issue.
Why does this happen? => "Could not execute systemctl: at /usr/bin/deb-systemd-invoke line 145."
The keydb-server.log only shows this:
Expected behavior
I would have expected that KeyDB or KeyDB server would continue to work after an upgrade. :-) Have I perhaps overlooked something or made a mistake somewhere?
Additional information
Thanks for the help! If you need more information, just let me know.