Snapchat / KeyDB

A Multithreaded Fork of Redis
https://keydb.dev
BSD 3-Clause "New" or "Revised" License

[CRASH] Starting KeyDB on ARM hardware causing serverAssert failure #856

Open ronnyek opened 3 months ago

ronnyek commented 3 months ago

I'm attempting to build a container image (unfortunately it has to be proprietary) that will run on ARM hardware. Initially I was getting a jemalloc error about an invalid page size, but adding --with-lg-page=16 got us past that problem.
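For anyone checking which value fits their host: --with-lg-page takes the base-2 log of the page size, and jemalloc's page size has to be at least as large as the kernel's. Below is a minimal sketch for reading the kernel page size at runtime. The function names `runtime_lg_page` and `lg_page_is_sufficient` are hypothetical helpers made up for illustration, not part of jemalloc or KeyDB.

```c
#include <unistd.h>

/* Return log2 of the kernel's page size, or -1 if it cannot be
 * determined or is not a power of two. */
int runtime_lg_page(void) {
    long size = sysconf(_SC_PAGESIZE);
    if (size <= 0 || (size & (size - 1)) != 0) return -1;
    int lg = 0;
    while ((1L << lg) < size) lg++;
    return lg;
}

/* Hypothetical helper: does a jemalloc configured with
 * --with-lg-page=<built_lg> cover the running kernel's page size? */
int lg_page_is_sufficient(int built_lg) {
    int lg = runtime_lg_page();
    return lg >= 0 && built_lg >= lg;
}
```

On a kernel with 64 KiB pages `runtime_lg_page()` would return 16, which is consistent with --with-lg-page=16 being the value that worked here.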

Now, on startup, I get: server.cpp:6531 '!ret' is not true

Crash report

=== KEYDB BUG REPORT START: Cut & paste starting from here ===
1:1:C 31 Jul 2024 16:05:19.226 # === ASSERTION FAILED ===
1:1:C 31 Jul 2024 16:05:19.226 # ==> server.cpp:6531 '!ret' is not true

------ STACK TRACE ------

Backtrace:
keydb-server(linuxMadvFreeForkBugCheck()+0x368) [0x45b828]
keydb-server(main+0x31c) [0x4432cc]
/lib64/libc.so.6(+0x27300) [0xffffaa607300]
/lib64/libc.so.6(__libc_start_main+0x98) [0xffffaa6073d8]
keydb-server(_start+0x30) [0x447670]

------ INFO OUTPUT ------
Keydb starting as active-replica and multi-master
1:1:C 31 Jul 2024 16:08:03.327 * Notice: "active-replica yes" implies "replica-read-only no"
1:1:C 31 Jul 2024 16:08:03.327 * Before turning into a replica, using my own master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
1:1:C 31 Jul 2024 16:08:03.327 # oO0OoO0OoO0Oo KeyDB is starting oO0OoO0OoO0Oo
1:1:C 31 Jul 2024 16:08:03.327 # KeyDB version=6.3.4, bits=64, commit=7e7e5e57, modified=1, pid=1, just started
1:1:C 31 Jul 2024 16:08:03.327 # Configuration loaded

Additional information

  1. Not sure if this matters, but this is being deployed on a Rocky Linux 8 based container image
  2. A permalink to the code in server.cpp around that line number

jcy1001 commented 3 months ago

Yes, I also have this problem on ARM.

ronnyek commented 3 months ago

After digging further, it seems this may be related to an arm64-specific Linux kernel bug in pgtable handling, and the KeyDB/Redis code apparently tries to check at startup whether that bug is present in the running kernel. The relevant kernel fix is "arm64: pgtable: Ensure dirty bit is preserved across pte_wrprotect()".
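To make it concrete what such a check involves, here is a rough sketch of a probe of this kind. It is not KeyDB's code. It is my reading of the general technique: dirty a private page, mark it MADV_FREE, write to it again, fork, and have the child look up the page's Shared_Dirty value in /proc/self/smaps. The names `shared_dirty_kb` and `madv_free_fork_probe` are made up, and using the runtime page size instead of a fixed constant is my own assumption.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS and madvise() in strict -std= modes */
#endif
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef MADV_FREE
#define MADV_FREE 8 /* Linux value, for libc headers older than the kernel */
#endif

/* Shared_Dirty (in kB) of the mapping that contains addr, read from
 * /proc/self/smaps. Returns -1 if it cannot be read or parsed. */
static int shared_dirty_kb(unsigned long addr) {
    FILE *f = fopen("/proc/self/smaps", "r");
    if (!f) return -1;
    char buf[256];
    unsigned long from, to;
    int in_mapping = 0, val = -1;
    while (fgets(buf, sizeof(buf), f)) {
        if (sscanf(buf, "%lx-%lx", &from, &to) == 2) {
            in_mapping = (from <= addr && addr < to); /* mapping header line */
        } else if (in_mapping && strncmp(buf, "Shared_Dirty:", 13) == 0) {
            if (sscanf(buf, "%*s %d", &val) != 1) val = -1;
            break;
        }
    }
    fclose(f);
    return val;
}

/* Returns 1 if the dirty bit appears to be lost across fork, 0 if it is
 * preserved, and -1 if the probe could not run on this system. */
int madv_free_fork_probe(void) {
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) return -1;
    size_t len = (size_t)page;
    int result = -1, pipefd[2];
    signed char verdict;
    pid_t pid;

    /* Three pages protected RO|RW|RO so the middle one keeps its own mapping. */
    char *p = mmap(NULL, 3 * len, PROT_READ, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    if (p == MAP_FAILED) return -1;
    char *q = p + len;

    if (mprotect(q, len, PROT_READ | PROT_WRITE) != 0) goto unmap;
    *(volatile char *)q = 0;                  /* make the page resident */
    if (madvise(q, len, MADV_FREE) != 0) goto unmap;
    *(volatile char *)q = 0;                  /* write again: page should be dirty */

    if (pipe(pipefd) != 0) goto unmap;
    pid = fork();
    if (pid < 0) { close(pipefd[0]); close(pipefd[1]); goto unmap; }
    if (pid == 0) {
        /* Child: 0 kB of shared dirty memory means the dirty bit was lost. */
        int kb = shared_dirty_kb((unsigned long)q);
        verdict = (signed char)((kb < 0) ? -1 : (kb == 0));
        ssize_t w = write(pipefd[1], &verdict, 1);
        _exit(w == 1 ? 0 : 1);
    }
    close(pipefd[1]);
    if (read(pipefd[0], &verdict, 1) == 1) result = verdict;
    close(pipefd[0]);
    waitpid(pid, NULL, 0);
unmap:
    munmap(p, 3 * len);
    return result;
}
```

The sketch asks for the page size at runtime because mprotect() requires a page-aligned address, so an offset of 4096 bytes into the mapping would not be aligned on a kernel with 64 KiB pages. It also reports failures as -1 instead of asserting, which makes "probe could not run" distinguishable from "bug detected".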

It seems that on a newer Linux kernel (one that includes the fix for the original issue), KeyDB would likely start right up and work.

I guess my question here is whether I'd be likely to run into that issue if I'm not doing any writes to storage (essentially memory-only caching).