Please answer these questions before submitting your issue. Thanks!
What did you do?
- Deployed a master cluster and a backup cluster for duplication.
- Started duplication.
- Let it run for about 2 to 3 days.
- Some nodes coredumped.
What did you expect to see?
Nodes keep running normally.
What did you see instead?
Some nodes coredumped. See the attached memory monitoring table, the coredump details, and the error log below.
coredump detail:
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/home/work/app/pegasus/c3srv-browser/replica/package/bin/pegasus_server config.'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007f01575401d7 in raise () from /lib64/libc.so.6
(gdb) bt
#0 0x00007f01575401d7 in raise () from /lib64/libc.so.6
#1 0x00007f01575418c8 in abort () from /lib64/libc.so.6
#2 0x00007f015c628f9e in dsn_coredump () at /home/work/temp/format_pegasus/pegasus/src/rdsn/src/runtime/service_api_c.cpp:93
#3 0x00007f015c422c83 in dsn::replication::log_file::log_file (this=0x73aa4a630, path=0x740561c98 "/home/work/ssd2/pegasus/c3srv-browser/replica/reps/72.173.pegasus/plog/log.92534.3105061495163",
handle=<optimized out>, index=<optimized out>, start_offset=3105061495163, is_read=<optimized out>) at /home/work/temp/format_pegasus/pegasus/src/rdsn/src/replica/log_file.cpp:166
#4 0x00007f015c4247ce in dsn::replication::log_file::open_read (path=0x740561c98 "/home/work/ssd2/pegasus/c3srv-browser/replica/reps/72.173.pegasus/plog/log.92534.3105061495163", err=...)
at /home/work/temp/format_pegasus/pegasus/src/rdsn/src/replica/log_file.cpp:92
#5 0x00007f015c43ccfa in dsn::replication::log_utils::open_read (path=..., file=...) at /home/work/temp/format_pegasus/pegasus/src/rdsn/src/replica/mutation_log_utils.cpp:43
#6 0x00007f015c4ff7fa in dsn::replication::load_from_private_log::find_log_file_to_start (this=this@entry=0x384c74640)
at /home/work/temp/format_pegasus/pegasus/src/rdsn/src/replica/duplication/load_from_private_log.cpp:123
#7 0x00007f015c500360 in dsn::replication::load_from_private_log::run (this=0x384c74640) at /home/work/temp/format_pegasus/pegasus/src/rdsn/src/replica/duplication/load_from_private_log.cpp:100
#8 0x00007f015c665f91 in dsn::task::exec_internal (this=this@entry=0x2b9bce1e0) at /home/work/temp/format_pegasus/pegasus/src/rdsn/src/runtime/task/task.cpp:176
#9 0x00007f015c67b642 in dsn::task_worker::loop (this=0x2a67c30) at /home/work/temp/format_pegasus/pegasus/src/rdsn/src/runtime/task/task_worker.cpp:224
#10 0x00007f015c67b7c0 in dsn::task_worker::run_internal (this=0x2a67c30) at /home/work/temp/format_pegasus/pegasus/src/rdsn/src/runtime/task/task_worker.cpp:204
#11 0x00007f015b2f8a3f in execute_native_thread_routine () from /home/work/app/pegasus/c3srv-browser/replica/package/bin/libdsn_utils.so
#12 0x00007f0159103dc5 in start_thread () from /lib64/libpthread.so.0
#13 0x00007f015760273d in clone () from /lib64/libc.so.6
(gdb)
stdout file (error log):
E2024-05-15 05:48:17.721 (1715723297721553663 62544) replica.rep_long9.040400031452989e: native_linux_aio_provider.cpp:49:open(): create file failed, err = No such file or directory
E2024-05-15 05:48:17.721 (1715723297721596680 62544) replica.rep_long9.040400031452989e: load_from_private_log.cpp:125:find_log_file_to_start(): [72.171@10.142.162.23:34801] ERR_FILE_OPERATION_FAILED: failed to open the log file (/home/work/ssd7/pegasus/c3srv-xxxxxx/replica/reps/72.171.pegasus/plog/log.91190.3060048707709)
F2024-05-15 06:03:20.656 (1715724200656901498 62545) replica.rep_long10.04040005181bcdaf: log_file.cpp:166:log_file(): assertion expression: false
F2024-05-15 06:03:20.656 (1715724200656954168 62545) replica.rep_long10.04040005181bcdaf: log_file.cpp:166:log_file(): fail to get file size of /home/work/ssd2/pegasus/c3srv-xxxxx/replica/reps/72.173.pegasus/plog/log.92534.3105061495163
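Both failures in the log above name a private log (plog) file that could not be opened or sized at the time of the call. As a triage aid, here is a minimal Python sketch that pulls those paths out of a log and reports whether each file still exists on the node. It is not part of Pegasus: the function names are hypothetical, and it assumes the file name layout `log.<index>.<start_offset>`, which is consistent with the `start_offset=3105061495163` argument in frame #3 of the backtrace.

```python
import os
import re

# Matches an absolute path ending in plog/log.<digits>.<digits>,
# the shape of the two paths in the error log above.
PLOG_PATH = re.compile(r"(/\S+/plog/log\.\d+\.\d+)")


def extract_plog_paths(log_text):
    """Return the distinct plog paths mentioned in log_text, in first-seen order."""
    seen = []
    for match in PLOG_PATH.finditer(log_text):
        path = match.group(1)
        if path not in seen:
            seen.append(path)
    return seen


def parse_plog_name(path):
    """Split 'log.<index>.<start_offset>' into two integers (assumed layout)."""
    _, index, start_offset = os.path.basename(path).split(".")
    return int(index), int(start_offset)


def report(log_text):
    """Print one line per plog path, noting whether the file exists right now."""
    for path in extract_plog_paths(log_text):
        index, start_offset = parse_plog_name(path)
        status = "present" if os.path.exists(path) else "missing"
        print(f"{path}: index={index} start_offset={start_offset} {status}")
```

Calling `report()` with the contents of the replica's stdout log on an affected node prints one line per distinct plog path, which may help confirm whether the files duplication tried to read were already gone from disk.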
What version of Pegasus are you using?
Pegasus v2.4
Attached memory monitoring table: ![image](https://github.com/apache/incubator-pegasus/assets/110282526/16c9406a-a96a-4d01-b980-a01a3ba0a166)