Open mreynolds389 opened 2 years ago
The most likely cause of this is a bug in the calling automember code causing a double free in the CString code. This is because it is invalid for C to free memory owned by the Rust side (and vice versa).
We can see that this is likely because the fault occurs in "evict", where we are dropping (freeing) the CString because the item is being evicted. The drop is here https://github.com/kanidm/concread/blob/a51422b1cdfa09df2646d34c9a6ffcc56d72ca34/src/arcache/mod.rs#L488 where ci (Cache Item) is swapped for the new state, meaning that ci will drop here. Since the CString was already freed, that causes the panic.
Note that we are not hitting the impossible state (unreachable), because then the backtrace would be different.
Most likely what we are hitting is:
https://doc.rust-lang.org/src/alloc/ffi/c_str.rs.html#699
Which then calls
https://doc.rust-lang.org/src/core/slice/index.rs.html#232
finally triggering the unsafe precondition check
I think this means you probably need to track where automember is "freeing" things returned from the ndn cache instead, and they should be removed or checked.
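To make the ownership rule above concrete, here is a minimal sketch of how a Rust-allocated CString should cross the FFI boundary. The names `cache_value_to_c` and `cache_value_free` are hypothetical and are not the actual librslapd API; the point is only that a pointer produced by `CString::into_raw` must be given back to Rust exactly once and never passed to C's `free()`.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical helper: hand C a *copy* of a cached value.
/// Rust allocated the copy, so Rust must also be the one to free it.
fn cache_value_to_c(value: &CStr) -> *mut c_char {
    value.to_owned().into_raw()
}

/// Hypothetical helper: the only valid way to release a pointer
/// returned by `cache_value_to_c`.
///
/// Safety: `ptr` must come from `cache_value_to_c` and must not have
/// been released before. Calling this twice on the same pointer, or
/// calling C's free() on it, is undefined behaviour.
unsafe fn cache_value_free(ptr: *mut c_char) {
    if !ptr.is_null() {
        // Rebuild the CString so that Rust's allocator frees it.
        unsafe { drop(CString::from_raw(ptr)) };
    }
}

fn main() {
    let cached = CString::new("uid=test_entry09842,dc=example,dc=com").unwrap();

    // The caller receives its own copy of the cached value.
    let ptr = cache_value_to_c(&cached);
    let seen = unsafe { CStr::from_ptr(ptr) }.to_bytes().len();
    println!("caller sees {} bytes", seen);

    // Release the copy through Rust, exactly once.
    unsafe { cache_value_free(ptr) };

    // The value still held by the cache is untouched by the free above.
    assert_eq!(cached.as_bytes().len(), seen);
}
```

If the C caller only ever receives a copy like this and releases it through the matching Rust function, then freeing it cannot affect the CString still held inside the cache.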
PS: This is a really old version of concread. I seem to remember we couldn't upgrade because the RH/Fedora Rust toolchains were too old, so it could be worth trying to upgrade, as there have been significant performance and API improvements.
The only issue with your theory is that we always return a copy from the ndn cache, and never return a reference. So it's probably not automember doing something wrong. I will run an ASAN test tomorrow though...
Yeah, I think that ASAN would be the best option because that will report double frees properly.
ASAN shows no issues
Also seeing this in replication:
#7 0x00007f598cc6968b in std::panicking::begin_panic_handler::{closure#0} () at library/std/src/panicking.rs:656
#8 0x00007f598cc66ed9 in std::sys_common::backtrace::__rust_end_short_backtrace<std::panicking::begin_panic_handler::{closure_env#0}, !> (
f=<error reading variable: Cannot access memory at address 0x2c542b>) at library/std/src/sys_common/backtrace.rs:171
#9 0x00007f598cc69417 in std::panicking::begin_panic_handler (info=<optimized out>) at library/std/src/panicking.rs:652
#10 0x00007f598c9bc533 in core::panicking::panic_fmt (fmt=...) at library/core/src/panicking.rs:72
#11 0x00007f598c9bc5dc in core::panicking::panic (expr=...) at library/core/src/panicking.rs:146
#12 0x00007f598cac00f3 in concread::arcache::ARCache<alloc::ffi::c_str::CString, alloc::ffi::c_str::CString>::evict<alloc::ffi::c_str::CString, alloc::ffi::c_str::CString> (
self=0x7f598aab9380, cache=0x7f58f65fe670, inner=0x7f598aab9428, shared=0x7f598aab9388, stats=0x7f58f65fe288, commit_txid=2095678)
at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/concread-0.2.21/src/arcache/mod.rs:474
#13 0x00007f598cac1399 in concread::arcache::ARCache<alloc::ffi::c_str::CString, alloc::ffi::c_str::CString>::commit<alloc::ffi::c_str::CString, alloc::ffi::c_str::CString> (
self=0x7f598aab9380, cache=..., tlocal=..., hit=..., clear=false, init_above_watermark=true)
at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/concread-0.2.21/src/arcache/mod.rs:1457
#14 0x00007f598cac2ce5 in concread::arcache::ARCacheWriteTxn<alloc::ffi::c_str::CString, alloc::ffi::c_str::CString>::commit<alloc::ffi::c_str::CString, alloc::ffi::c_str::CString> (
self=...) at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/concread-0.2.21/src/arcache/mod.rs:1506
#15 0x00007f598cabc3e6 in concread::arcache::ARCache<alloc::ffi::c_str::CString, alloc::ffi::c_str::CString>::try_quiesce<alloc::ffi::c_str::CString, alloc::ffi::c_str::CString> (
self=0x7f598aab9380) at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/concread-0.2.21/src/arcache/mod.rs:796
#16 0x00007f598caa5a28 in concread::arcache::{impl#13}::drop<alloc::ffi::c_str::CString, alloc::ffi::c_str::CString> (self=0x7f58f5a15480)
at /root/.cargo/registry/src/github.com-1ecc6299db9ec823/concread-0.2.21/src/arcache/mod.rs:1929
#17 0x00007f598caa1dd7 in core::ptr::drop_in_place<concread::arcache::ARCacheReadTxn<alloc::ffi::c_str::CString, alloc::ffi::c_str::CString>> ()
at /builddir/build/BUILD/rustc-1.80.1-src/library/core/src/ptr/mod.rs:542
#18 0x00007f598caa2f7b in core::ptr::drop_in_place<rslapd::cache::ARCacheCharRead> () at /builddir/build/BUILD/rustc-1.80.1-src/library/core/src/ptr/mod.rs:542
#19 0x00007f598caa342a in core::ptr::drop_in_place<alloc::boxed::Box<rslapd::cache::ARCacheCharRead, alloc::alloc::Global>> ()
at /builddir/build/BUILD/rustc-1.80.1-src/library/core/src/ptr/mod.rs:542
#20 0x00007f598caa0695 in rslapd::cache::cache_char_read_complete (read_txn=0x7f58f5a15480) at librslapd/src/cache.rs:92
#21 0x00007f598c9daffc in ndn_cache_add (dn=0x7f58c2468aa0 "uid=test_entry09842,dc=example,dc=com", dn_len=37, ndn=0x7f58d08a69f0 "uid=test_entry09842,dc=example,dc=com", ndn_len=37)
at ../389-ds-base/ldap/servers/slapd/dn.c:2982
#22 0x00007f598c9d822f in slapi_dn_normalize_ext (src=0x7f58d08a69f0 "uid=test_entry09842,dc=example,dc=com", src_len=37, dest=0x7f58f65fecd8, dest_len=0x7f58f65fecd0)
at ../389-ds-base/ldap/servers/slapd/dn.c:1143
#23 0x00007f598c9da1ef in slapi_sdn_get_dn (sdn=0x7f58c507d000) at ../389-ds-base/ldap/servers/slapd/dn.c:2335
#24 0x00007f598ad0ba3d in replay_update (prp=0x7f598762b100, op=0x7f58f65fee50, message_id=0x7f58f65fedc8) at ../389-ds-base/ldap/servers/plugins/replication/repl5_inc_protocol.c:1377
#25 0x00007f598ad0c90f in send_updates (prp=0x7f598762b100, remote_update_vector=0x7f58b0344900, num_changes_sent=0x7f58f65fef00)
at ../389-ds-base/ldap/servers/plugins/replication/repl5_inc_protocol.c:1708
#26 0x00007f598ad0aeab in repl5_inc_run (prp=0x7f598762b100) at ../389-ds-base/ldap/servers/plugins/replication/repl5_inc_protocol.c:1045
#27 0x00007f598ad13618 in prot_thread_main (arg=0x7f59876035e0) at ../389-ds-base/ldap/servers/plugins/replication/repl5_protocol.c:252
#28 0x00007f598c243749 in _pt_root (arg=0x7f59875fd140) at ../../../../nspr/pr/src/pthreads/ptthread.c:201
#29 0x00007f598c6ac897 in start_thread (arg=<optimized out>) at pthread_create.c:444
#30 0x00007f598c733a5c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
Issue Description
Server crashes during automember rebuild task using debug build: