antirez / disque

Disque is a distributed message broker
BSD 3-Clause "New" or "Revised" License

Crash with dump on a minipc (liva X with 2g ram on ubuntu 14.04.3) #185

Open dyu opened 8 years ago

dyu commented 8 years ago

I don't get this crash on my other minipc (kangaroo) with the exact same configuration. I'm wondering if this is related to the kernel (the kangaroo has kernel 4.x on it, while this machine runs 3.16).

4514:P 16 Apr 09:36:47.572 #     Disque 1.0-rc1 crashed by signal: 11
4514:P 16 Apr 09:36:47.572 #     Failed assertion: <no assertion failed> (<no file>:0)
4514:P 16 Apr 09:36:47.572 # --- STACK TRACE
/opt/disque/bin/disque-server(logStackTrace+0x33)[0x42bd13]
/lib/x86_64-linux-gnu/libc.so.6(+0x154844)[0x7fe3de3af844]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)[0x7fe3de630340]
/lib/x86_64-linux-gnu/libc.so.6(+0x154844)[0x7fe3de3af844]
/opt/disque/bin/disque-server(clusterLoadConfig+0xb2)[0x42d0b2]
/opt/disque/bin/disque-server(clusterInit+0xbb)[0x43021b]
/opt/disque/bin/disque-server(initServer+0x2ba)[0x419a0a]
/opt/disque/bin/disque-server(main+0x2e6)[0x4114c6]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fe3de27cec5]
/opt/disque/bin/disque-server[0x411732]
4514:P 16 Apr 09:36:47.573 # --- INFO OUTPUT
4514:P 16 Apr 09:36:47.573 # # Server
disque_version:1.0-rc1
disque_git_sha1:f4520baf
disque_git_dirty:0
disque_build_id:18b296112fffd94e
os:Linux 3.16.0-59-generic x86_64
arch_bits:64
multiplexing_api:epoll
gcc_version:4.8.4
process_id:4514
run_id:d287b2c7e9549e8e0f3f590d8fb20e1f9356fbc9
tcp_port:7100
uptime_in_seconds:0
uptime_in_days:0
hz:10
executable:/opt/disque/bin/disque-server
config_file:/home/deploy/timesheet/disque.conf

# Clients
connected_clients:0
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0

# Memory
used_memory:434744
used_memory_human:424.55K
used_memory_rss:0
used_memory_peak:434744
used_memory_peak_human:424.55K
mem_fragmentation_ratio:0.00
mem_allocator:jemalloc-4.0.3

# Jobs
registered_jobs:0

# Queues
registered_queues:0

# Persistence
loading:0
aof_enabled:1
aof_state:on
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
aof_current_size:0
aof_base_size:0
aof_pending_rewrite:0
aof_buffer_length:0
aof_rewrite_buffer_length:0
aof_pending_bio_fsync:0
aof_delayed_fsync:0

# Stats
total_connections_received:0
total_commands_processed:0
instantaneous_ops_per_sec:0
total_net_input_bytes:0
total_net_output_bytes:0
instantaneous_input_kbps:0.00
instantaneous_output_kbps:0.00
rejected_connections:0
latest_fork_usec:0

# CPU
used_cpu_sys:0.00
used_cpu_user:0.00
used_cpu_sys_children:0.00
used_cpu_user_children:0.00

# Commandstats
hash_init_value: 1461263019

4514:P 16 Apr 09:36:47.573 # --- CLIENT LIST OUTPUT
4514:P 16 Apr 09:36:47.573 # 
4514:P 16 Apr 09:36:47.573 # --- REGISTERS
4514:P 16 Apr 09:36:47.573 # 
RAX:0000000000000001 RBX:0000000030010201
RCX:0000000000000003 RDX:000000000000ffff
RDI:0000000030010200 RSI:0000000000476f30
RBP:00007fe3dde16943 RSP:00007ffd22118ce8
R8 :0000000000000000 R9 :fffffffffffdc160
R10:00007fe3de3af840 R11:000000000000008d
R12:00007fe3dde26140 R13:00007fe3dde151d8
R14:00007fe3dde39000 R15:00007fe3dde26003
RIP:00007fe3de3af844 EFL:0000000000010212
CSGSFS:6570000000000033
4514:P 16 Apr 09:36:47.573 # (00007ffd22118cf7) -> 000000002bc4b0a5
4514:P 16 Apr 09:36:47.573 # (00007ffd22118cf6) -> 0000000057110aa3
4514:P 16 Apr 09:36:47.573 # (00007ffd22118cf5) -> 0000000000000008
4514:P 16 Apr 09:36:47.573 # (00007ffd22118cf4) -> 0000000000001000
4514:P 16 Apr 09:36:47.573 # (00007ffd22118cf3) -> 0000000000000113
4514:P 16 Apr 09:36:47.573 # (00007ffd22118cf2) -> 0000000000000000
4514:P 16 Apr 09:36:47.573 # (00007ffd22118cf1) -> 0000000000000384
4514:P 16 Apr 09:36:47.573 # (00007ffd22118cf0) -> 000003e7000081a4
4514:P 16 Apr 09:36:47.573 # (00007ffd22118cef) -> 0000000000000001
4514:P 16 Apr 09:36:47.573 # (00007ffd22118cee) -> 00000000000662ac
4514:P 16 Apr 09:36:47.574 # (00007ffd22118ced) -> 000000000000b302
4514:P 16 Apr 09:36:47.574 # (00007ffd22118cec) -> 0000000001d46410
4514:P 16 Apr 09:36:47.574 # (00007ffd22118ceb) -> 0000000000000000
4514:P 16 Apr 09:36:47.574 # (00007ffd22118cea) -> 0000000001d464f0
4514:P 16 Apr 09:36:47.574 # (00007ffd22118ce9) -> 0000000000000000
4514:P 16 Apr 09:36:47.574 # (00007ffd22118ce8) -> 000000000042d0b2
4514:P 16 Apr 09:36:47.574 # --- FAST MEMORY TEST
4514:P 16 Apr 09:36:47.637 # Fast memory test PASSED, however your memory can still be broken. Please run a memory test for several hours if possible.
4514:P 16 Apr 09:36:47.637 # 
dyu commented 8 years ago

I figured out what caused the crash. The nodes.conf file seems to be corrupted. Here are its contents:

fe1f8af12acb582b0680185bf5047721804f849a 192.168.1.20:7100 noflags 0 1460717777762 connected
e0696155971d348a26923f2ef3cadcf453e43cbf 192.168.1.21:7100 myself 0 0 connected
6a2d9536f7fab19a7eed3ff261aa9469ae2511d9 192.168.1.19:7100 noflags 0 1460717777524 connected
\00\00\00\00\00\00\00\00\00

Notice the last line. If I manually remove it, disque starts up fine. Only disque writes to nodes.conf. Could this be caused by disque mmapping the file and failing to close it properly?
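
For anyone hitting the same startup crash, here is a minimal sketch of the manual cleanup described above. It assumes the `\00` sequences shown in the listing are literal NUL bytes in the file. The function names and the command-line handling are hypothetical and are not part of disque. The script drops every line that contains a NUL byte and leaves the other lines untouched. Stop disque and back up nodes.conf before trying it.

```python
import sys


def has_nul_bytes(data: bytes) -> bool:
    """Return True if the raw file contents contain any NUL byte."""
    return b"\x00" in data


def clean_nodes_conf(data: bytes) -> bytes:
    """Return the contents with every line containing a NUL byte removed.

    Lines without NUL bytes are kept byte-for-byte, including line endings.
    """
    lines = data.splitlines(keepends=True)
    return b"".join(line for line in lines if b"\x00" not in line)


if __name__ == "__main__":
    # Hypothetical usage: python clean_nodes_conf.py /path/to/nodes.conf
    path = sys.argv[1]
    with open(path, "rb") as f:
        original = f.read()
    if has_nul_bytes(original):
        # Write the cleaned copy next to the original instead of overwriting it.
        with open(path + ".cleaned", "wb") as f:
            f.write(clean_nodes_conf(original))
        print("NUL bytes found; cleaned copy written to", path + ".cleaned")
    else:
        print("no NUL bytes found")
```

The cleaned copy is written to a separate file so the corrupted original stays available for debugging. This only works around the symptom and does not explain how the NUL bytes got into the file.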