Simon-Laux closed this 2 years ago
Looks like something that should be reported upstream to https://github.com/smol-rs/concurrent-queue
@Simon-Laux Is it an amd64 system?
@Simon-Laux Could you try with deltachat/deltachat-core-rust#2444? I'm pretty sure it is a bug in the bit twiddling in concurrent-queue at https://github.com/smol-rs/concurrent-queue/blob/master/src/bounded.rs; I think you can report the bug upstream in their repo.
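The "bit twiddling" here is the lap/index encoding of the bounded queue's head and tail positions. As a simplified sketch, assuming concurrent-queue follows the crossbeam `ArrayQueue` design (lap size `(cap + 1).next_power_of_two()`, low bits as the slot index, high bits as a lap counter), positions advance roughly like this. The helpers `one_lap`, `split` and `advance` are hypothetical names; this is not the crate's actual code.

```rust
/// Lap size: the smallest power of two strictly greater than `cap`,
/// so the low bits can hold any slot index in 0..cap (assumed rule).
fn one_lap(cap: usize) -> usize {
    (cap + 1).next_power_of_two()
}

/// Split a head/tail position into (slot index, lap bits).
fn split(pos: usize, lap_size: usize) -> (usize, usize) {
    (pos & (lap_size - 1), pos & !(lap_size - 1))
}

/// Advance a position by one slot; after the last slot (cap - 1) it
/// wraps to index 0 of the next lap instead of counting past `cap`.
fn advance(pos: usize, cap: usize, lap_size: usize) -> usize {
    let (index, lap) = split(pos, lap_size);
    if index + 1 < cap {
        pos + 1
    } else {
        lap.wrapping_add(lap_size)
    }
}

fn main() {
    let cap = 1024;
    let lap_size = one_lap(cap);
    let mut pos = 0;
    // Walk through five full laps; the slot index must always stay below cap.
    for _ in 0..(cap * 5) {
        let (index, _) = split(pos, lap_size);
        assert!(index < cap, "index {index} escaped the buffer");
        pos = advance(pos, cap, lap_size);
    }
    let (index, lap_bits) = split(pos, lap_size);
    // prints "lap size = 2048, index = 0, lap bits = 10240"
    println!("lap size = {lap_size}, index = {index}, lap bits = {lap_bits}");
}
```

Under this model, a correctly maintained position never yields an index of `cap` or more, because the index resets to 0 at the start of each lap.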
Upstream issue: https://github.com/smol-rs/concurrent-queue/issues/11
Here are a few backtraces from segfaults. I got the first one multiple times among the 6 crash dumps I looked at:
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007fd99375fb90 in event_listener::Event::listen () from ./deltachat-node/build/Release/deltachat.node
[Current thread is 1 (Thread 0x7fd97f65a640 (LWP 33629))]
(gdb) bt
#0 0x00007fd99375fb90 in event_listener::Event::listen () at ./deltachat-node/build/Release/deltachat.node
#1 0x00007fd9933ad6c4 in <async_std::task::builder::SupportTaskLocals<F> as core::future::future::Future>::poll ()
at ./deltachat-node/build/Release/deltachat.node
#2 0x00007fd9935fbdf3 in dc_get_next_event () at ./deltachat-node/build/Release/deltachat.node
#3 0x00007fd9930a5d96 in event_handler_thread_func () at ./deltachat-node/build/Release/deltachat.node
#4 0x00007fd99f2f9299 in start_thread () at /usr/lib/libpthread.so.0
#5 0x00007fd99dc6b053 in clone () at /usr/lib/libc.so.6
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007f14ccdbf14d in dc_get_event_emitter () from ./deltachat-node/build/Release/deltachat.node
[Current thread is 1 (Thread 0x7f14a33bf640 (LWP 38774))]
(gdb) bt
#0 0x00007f14ccdbf14d in dc_get_event_emitter () at ./deltachat-node/build/Release/deltachat.node
#1 0x00007f14cc869d62 in event_handler_thread_func () at ./deltachat-node/build/Release/deltachat.node
#2 0x00007f14d82bc299 in start_thread () at /usr/lib/libpthread.so.0
#3 0x00007f14d6c2e053 in clone () at /usr/lib/libc.so.6
Some of these could also be bugs in the node bindings :shrug: I was also surprised to see this crash:
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007fd90e4fb44d in dc_get_chat_id_by_contact_id ()
from ./deltachat-desktop/node_modules/deltachat-node/build/Release/deltachat.node
[Current thread is 1 (Thread 0x7fd8cf4e6640 (LWP 12048))]
(gdb) bt
#0 0x00007fd90e4fb44d in dc_get_chat_id_by_contact_id ()
at ./deltachat-desktop/node_modules/deltachat-node/build/Release/deltachat.node
#1 0x0000000000000000 in ()
No simple function like that should be able to crash the program.
These functions are fine: they are written in safe Rust and don't crash on Android. Most likely the node bindings call them with an already freed dc_context_t.
Strange, the macro NAPI_DCN_CONTEXT checks for an empty context:
https://github.com/deltachat/deltachat-node/blob/6c941d2e73632abaf2eddc6b2c29b253357b3327/src/napi-macros-extensions.h#L17
For the event emitter issue, I made a PR fixing the garbage collection / closed context check: https://github.com/deltachat/deltachat-node/pull/502
> strange the macro NAPI_DCN_CONTEXT checks for empty context
Maybe someone called dc_context_unref and then kept using the pointer? A NULL check is not enough to catch this.
Edit: I see, the pointer is set to NULL after dc_context_unref. Not sure how thread-safe this is: what if one thread frees the context while another thread has already loaded the pointer from memory?
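Setting the pointer to NULL after `dc_context_unref` only protects callers that read it afterwards; a thread that loaded the pointer just before the free still touches freed memory. A common remedy is shared ownership plus a lock around lookup and release. The minimal Rust sketch below illustrates that idea only: `Context` and `ContextHandle` are hypothetical stand-ins, not deltachat-core-rust or deltachat-node types, and the actual bindings are written in C.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical stand-in for the core object behind `dc_context_t`.
struct Context {
    addr: String,
}

impl Context {
    fn addr(&self) -> &str {
        &self.addr
    }
}

// Hypothetical handle owned by a binding layer. Instead of a raw pointer that
// is freed and then set to NULL (a check-then-use race), it keeps an Arc behind
// a lock, so fetching and releasing the context can never interleave unsafely.
struct ContextHandle {
    inner: Mutex<Option<Arc<Context>>>,
}

impl ContextHandle {
    fn new(ctx: Context) -> Self {
        ContextHandle { inner: Mutex::new(Some(Arc::new(ctx))) }
    }

    /// Returns a strong reference, or None if the context was already released.
    fn get(&self) -> Option<Arc<Context>> {
        self.inner.lock().unwrap().clone()
    }

    /// Drops the handle's reference. The Context itself is freed only once the
    /// last outstanding Arc, possibly held by another thread, is dropped.
    fn unref(&self) {
        *self.inner.lock().unwrap() = None;
    }
}

fn main() {
    let handle = ContextHandle::new(Context { addr: "alice@example.org".to_string() });

    // A worker fetches the context and keeps using it...
    let ctx = handle.get().expect("context is alive");
    let worker = thread::spawn(move || ctx.addr().to_string());

    // ...while the main thread releases the handle concurrently.
    handle.unref();

    // The worker's Arc kept the Context alive, so there is no use-after-free.
    assert_eq!(worker.join().unwrap(), "alice@example.org");
    // Later lookups see "closed" instead of a dangling pointer.
    assert!(handle.get().is_none());
}
```

In C, the same idea would need a reference count on the context plus a mutex around fetching and releasing it; whether that fits the node bindings' design is an open question.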
Related core issue: https://github.com/deltachat/deltachat-core-rust/issues/2280
Got the concurrent-queue crash again:
name = 'deltachat_ffi'
operating_system = 'unix:Manjaro'
crate_version = '1.55.0'
explanation = '''
Panic occurred in file '~/.cargo/registry/src/github.com-1ecc6299db9ec823/concurrent-queue-1.2.2/src/bounded.rs' at line 160
'''
cause = 'index out of bounds: the len is 1024 but the index is 1806'
method = 'Panic'
backtrace = '''
0: 0x7f7c1e27e353 - async_channel::Receiver<T>::try_recv::hf80e615fbe9ca3bb
1: 0x7f7c1e1eb654 - <async_std::task::builder::SupportTaskLocals<F> as core::future::future::Future>::poll::h04d16837fdd0a69e
2: 0x7f7c1e439e03 - dc_get_next_event
3: 0x7f7c1dee3d96 - event_handler_thread_func
4: 0x7f7c2a137299 - start_thread
5: 0x7f7c28aa9053 - clone
6: 0x0 - <unresolved>'''
You can revert https://github.com/deltachat/deltachat-core-rust/pull/2444. It should have used 1023 instead of 1024 anyway (because of this off-by-one error, the cap is currently 2048), and the real problem looks like a use-after-free, not a bug in concurrent-queue.
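Assuming the "2048" refers to the queue's internal lap size rather than its number of slots, the off-by-one arithmetic and the panic message fit together as sketched below. The `one_lap` helper is hypothetical and assumes the `(cap + 1).next_power_of_two()` rule of crossbeam-style bounded queues; the corrupted position value is made up for illustration.

```rust
/// Lap size as assumed for crossbeam-style bounded queues.
fn one_lap(cap: usize) -> usize {
    (cap + 1).next_power_of_two()
}

fn main() {
    // The off-by-one: a capacity of 1024 rounds the lap size up to 2048,
    // whereas 1023 would keep it at 1024.
    assert_eq!(one_lap(1024), 2048);
    assert_eq!(one_lap(1023), 1024);

    // With cap = 1024 the index mask is 2047, so a stray position value whose
    // low 11 bits are >= 1024 yields an index past the 1024-slot buffer.
    let cap = 1024;
    let mask = one_lap(cap) - 1;
    let garbage_pos: usize = 3 * one_lap(cap) + 1806; // hypothetical corrupted tail
    let index = garbage_pos & mask;
    assert_eq!(index, 1806);
    assert!(index >= cap); // indexing a 1024-slot buffer here would panic
    println!("index {index} vs len {cap}");
}
```

Under this assumption, an index of 1806 points to a corrupted head or tail value rather than to the queue's own arithmetic, which is consistent with the use-after-free explanation.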
Let's reopen if it happens again.
Desktop crashes too often for me (more than ~60% of first startups after a system restart and 10-20% of other startups). Most of the time I don't get a crash report, but this time I got one.