apache / incubator-pegasus

Apache Pegasus - A horizontally scalable, strongly consistent and high-performance key-value store
https://pegasus.apache.org/
Apache License 2.0
1.99k stars 312 forks source link

Bug(message_ex::copy): In some case, the message_ex::copy function will trigger an assertion #1937

Closed Sunflower876 closed 6 months ago

Sunflower876 commented 8 months ago

Bug Report

There are two formats for message_ex, one is that both the header and body are saved in the buffers field, and the header and buffers [0] are the same. the other format is that the header is not saved in the buffers field and exists separately. In the message_ex::copy function, total_length represents the sum of the lengths of the header and body. However, in the second case, i only counts the length of the body, so checking that i and total_length are equal will fail.

20240308-151800

Sunflower876 commented 8 months ago

@acelyc111

acelyc111 commented 8 months ago

@Sunflower876 Thanks!

Have you encountered such a check failure? Could you please leave some logs here, a backtrace would be better.

Sunflower876 commented 8 months ago

@Sunflower876 Thanks!

Have you encountered such a check failure? Could you please leave some logs here, a backtrace would be better.

#0  0x00007fe05b4a3694 in vfprintf () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.17-157.el7_3.1.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 libcom_err-1.42.9-9.el7.x86_64 libgcc-4.8.5-28.el7_5.1.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) #0  0x00007fe05b4a3694 in vfprintf () from /lib64/libc.so.6
#1  0x00007fe05f16ae00 in dsn::tools::simple_logger::dsn_logv (
this=0x2b25030, file=<optimized out>, function=<optimized out>,
line=<optimized out>, log_level=LOG_LEVEL_FATAL,
fmt=0x7fe06058ab7d "%d VS %d, rpc_name = %s", args=<optimized out>)
at /home/work/pegasus/rdsn/src/utils/simple_logger.cpp:206
#2  0x00007fe05f15e04c in dsn_logv (file=0x7fe0604f25b7 "compiler_depend.ts",
function=0x7fe06058ac56 <dsn::message_ex::copy(bool, bool)::__FUNCTION__> "copy", line=248, log_level=LOG_LEVEL_FATAL,
fmt=0x7fe06058ab7d "%d VS %d, rpc_name = %s",
args=args@entry=0x7fe01b3ce598)
at /home/work/pegasus/rdsn/src/utils/logging.cpp:129
#3  0x00007fe05f15e0cf in dsn_logf (
file=file@entry=0x7fe0604f25b7 "compiler_depend.ts",
function=function@entry=0x7fe06058ac56 <dsn::message_ex::copy(bool, bool)::__FUNCTION__> "copy", line=line@entry=248,
log_level=log_level@entry=LOG_LEVEL_FATAL,
fmt=fmt@entry=0x7fe06058ab7d "%d VS %d, rpc_name = %s")
at /home/work/pegasus/rdsn/src/utils/logging.cpp:141
#4  0x00007fe060452348 in dsn::message_ex::copy (this=<optimized out>,
clone_content=clone_content@entry=true,
copy_for_receive=copy_for_receive@entry=true)
at /home/work/pegasus/rdsn/src/runtime/rpc/rpc_message.cpp:247
#5  0x000000000058d6da in pegasus::server::pegasus_server_impl::on_refresh_ttl
(this=0x41a4100, rpc=...)
at /home/work/pegasus/src/server/pegasus_server_impl.cpp:972
#6  0x0000000000560a8b in pegasus::server::pegasus_read_service::on_refresh_ttl (svc=<optimized out>, rpc=...)
at /home/work/pegasus/src/server/pegasus_read_service.h:100
#7  0x0000000000568513 in operator() (request=<optimized out>, p=0x41a4100,
__closure=<optimized out>)
at /home/work/pegasus/DSN_ROOT/include/dsn/dist/replication/storage_serverlet.h:41
#8  std::_Function_handler<void (pegasus::server::pegasus_read_service*, dsn::message_ex*), bool dsn::replication::storage_serverlet<pegasus::server::pegasus_read_service>::register_rpc_handler_with_rpc_holder<dsn::rpc_holder<dsn::apps::refresh_ttl_request, dsn::apps::refresh_ttl_response> >(dsn::task_code, char const*, void (*)(pegasus::server::pegasus_read_service*, dsn::rpc_holder<dsn::apps::refresh_ttl_request, dsn::apps::refresh_ttl_response>))::{lambda(pegasus::server::pegasus_read_service*, dsn::message_ex*)#1}>::_M_invoke(std::_Any_data const&, pegasus::server::pegasus_read_service*&&, dsn::message_ex*&&) (
__functor=..., __args#0=<optimized out>, __args#1=<optimized out>)
at /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/std_function.h:316
#9  0x000000000059f71b in operator() (__args#1=<optimized out>,
__args#0=<optimized out>, this=<optimized out>)
at /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/std_function.h:706
#10 handle_request (request=0x218218c0, this=0x41a4100)
at /home/work/pegasus/DSN_ROOT/include/dsn/dist/replication/storage_serverlet.h:92
#11 pegasus::server::pegasus_read_service::on_request (this=0x41a4100,
request=0x218218c0)
at /home/work/pegasus/src/server/pegasus_read_service.h:48
#12 0x00007fe06024a2b7 in dsn::replication::replica::on_client_read (
this=0x3472800, request=request@entry=0x218218c0,
ignore_throttling=ignore_throttling@entry=false)
at /home/work/pegasus/rdsn/src/replica/replica.cpp:252
#13 0x00007fe0602adb01 in dsn::replication::replica_stub::on_client_read (
this=0x2b48300, id=..., request=0x218218c0)
at /home/work/pegasus/rdsn/src/replica/replica_stub.cpp:956
#14 0x00007fe060453c22 in operator() (__args#0=<optimized out>,
this=0x2d4bd0d0)
at /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/std_function.h:706
#15 dsn::rpc_request_task::exec (this=0x2d4bd000)
at /home/work/pegasus/rdsn/include/dsn/tool-api/task.h:435
#16 0x00007fe0604549d1 in dsn::task::exec_internal (
this=this@entry=0x2d4bd000)
at /home/work/pegasus/rdsn/src/runtime/task/task.cpp:176
#17 0x00007fe060469d9a in dsn::task_worker::loop (this=0x337f550)
at /home/work/pegasus/rdsn/src/runtime/task/task_worker.cpp:211
#18 0x00007fe060469fc0 in dsn::task_worker::run_internal (this=0x337f550)
at /home/work/pegasus/rdsn/src/runtime/task/task_worker.cpp:191
#19 0x00007fe0604f225f in execute_native_thread_routine ()
from /home/work/app/pegasus/c3srv-algosample-other/replica/package/bin/[libdsn_replica_server.so](http://libdsn_replica_server.so/)
#20 0x00007fe05d03fdc5 in start_thread () from /lib64/libpthread.so.0
#21 0x00007fe05b55273d in clone () from /lib64/libc.so.6
Sunflower876 commented 8 months ago

@Sunflower876 Thanks! Have you encountered such a check failure? Could you please leave some logs here, a backtrace would be better.

0 0x00007fe05b4a3694 in vfprintf () from /lib64/libc.so.6 Missing separate debuginfos, use: debuginfo-install glibc-2.17-157.el7_3.1.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 libcom_err-1.42.9-9.el7.x86_64 libgcc-4.8.5-28.el7_5.1.x86_64 zlib-1.2.7-17.el7.x86_64 (gdb) #0 0x00007fe05b4a3694 in vfprintf () from /lib64/libc.so.6 #1 0x00007fe05f16ae00 in dsn::tools::simple_logger::dsn_logv ( this=0x2b25030, file=, function=, line=, log_level=LOG_LEVEL_FATAL, fmt=0x7fe06058ab7d "%d VS %d, rpc_name = %s", args=) at /home/work/pegasus/rdsn/src/utils/simple_logger.cpp:206 #2 0x00007fe05f15e04c in dsn_logv (file=0x7fe0604f25b7 "compiler_depend.ts", function=0x7fe06058ac56 <dsn::message_ex::copy(bool, bool)::FUNCTION> "copy", line=248, log_level=LOG_LEVEL_FATAL, fmt=0x7fe06058ab7d "%d VS %d, rpc_name = %s", args=args@entry=0x7fe01b3ce598) at /home/work/pegasus/rdsn/src/utils/logging.cpp:129 #3 0x00007fe05f15e0cf in dsn_logf ( file=file@entry=0x7fe0604f25b7 "compiler_depend.ts", function=function@entry=0x7fe06058ac56 <dsn::message_ex::copy(bool, bool)::FUNCTION> "copy", line=line@entry=248, log_level=log_level@entry=LOG_LEVEL_FATAL, fmt=fmt@entry=0x7fe06058ab7d "%d VS %d, rpc_name = %s") at /home/work/pegasus/rdsn/src/utils/logging.cpp:141 #4 0x00007fe060452348 in dsn::message_ex::copy (this=, clone_content=clone_content@entry=true, copy_for_receive=copy_for_receive@entry=true) at /home/work/pegasus/rdsn/src/runtime/rpc/rpc_message.cpp:247 #5 0x000000000058d6da in pegasus::server::pegasus_server_impl::on_refresh_ttl (this=0x41a4100, rpc=...) at /home/work/pegasus/src/server/pegasus_server_impl.cpp:972 #6 0x0000000000560a8b in pegasus::server::pegasus_read_service::on_refresh_ttl (svc=, rpc=...) at /home/work/pegasus/src/server/pegasus_read_service.h:100 #7 0x0000000000568513 in operator() (request=, p=0x41a4100, closure=) at /home/work/pegasus/DSN_ROOT/include/dsn/dist/replication/storage_serverlet.h:41 #8 std::_Function_handler<void (pegasus::server::pegasus_read_service, dsn::message_ex), bool dsn::replication::storage_serverletpegasus::server::pegasus_read_service::register_rpc_handler_with_rpc_holder<dsn::rpc_holder<dsn::apps::refresh_ttl_request, dsn::apps::refresh_ttl_response> >(dsn::taskcode, char const*, void ()(pegasus::server::pegasus_readservice, dsn::rpc_holder<dsn::apps::refresh_ttl_request, dsn::apps::refresh_ttl_response>))::{lambda(pegasus::server::pegasus_read_service, dsn::message_ex)#1}>::_M_invoke(std::_Any_data const&, pegasus::server::pegasus_read_service&&, dsn::message_ex&&) ( functor=..., args#0=, args#1=) at /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/std_function.h:316 #9 0x000000000059f71b in operator() (args#1=, args#0=, this=) at /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/std_function.h:706 #10 handle_request (request=0x218218c0, this=0x41a4100) at /home/work/pegasus/DSN_ROOT/include/dsn/dist/replication/storage_serverlet.h:92 #11 pegasus::server::pegasus_read_service::on_request (this=0x41a4100, request=0x218218c0) at /home/work/pegasus/src/server/pegasus_read_service.h:48 #12 0x00007fe06024a2b7 in dsn::replication::replica::on_client_read ( this=0x3472800, request=request@entry=0x218218c0, ignore_throttling=ignore_throttling@entry=false) at /home/work/pegasus/rdsn/src/replica/replica.cpp:252 #13 0x00007fe0602adb01 in dsn::replication::replica_stub::on_client_read ( this=0x2b48300, id=..., request=0x218218c0) at /home/work/pegasus/rdsn/src/replica/replica_stub.cpp:956 #14 0x00007fe060453c22 in operator() (__args#0=, this=0x2d4bd0d0) at /opt/rh/devtoolset-7/root/usr/include/c++/7/bits/std_function.h:706 #15 dsn::rpc_request_task::exec (this=0x2d4bd000) at /home/work/pegasus/rdsn/include/dsn/tool-api/task.h:435 #16 0x00007fe0604549d1 in dsn::task::exec_internal ( this=this@entry=0x2d4bd000) at /home/work/pegasus/rdsn/src/runtime/task/task.cpp:176 #17 0x00007fe060469d9a in dsn::task_worker::loop (this=0x337f550) at /home/work/pegasus/rdsn/src/runtime/task/task_worker.cpp:211 #18 0x00007fe060469fc0 in dsn::task_worker::run_internal (this=0x337f550) at /home/work/pegasus/rdsn/src/runtime/task/task_worker.cpp:191 #19 0x00007fe0604f225f in execute_native_thread_routine () from /home/work/app/pegasus/c3srv-algosample-other/replica/package/bin/libdsn_replica_server.so #20 0x00007fe05d03fdc5 in start_thread () from /lib64/libpthread.so.0 #21 0x00007fe05b55273d in clone () from /lib64/libc.so.6

image

Sunflower876 commented 7 months ago

@Sunflower876 Thanks!

Have you encountered such a check failure? Could you please leave some logs here, a backtrace would be better.

The first format is the Thrift message format, used for communication between client and server. The second format is the RDSN message format, used for communication between servers. The second format calling message_ex:: copy will definitely crash.

7d85bfd6-62d8-4e40-b57d-c43a757c03b1

image

acelyc111 commented 7 months ago

I saw the crash is caused by request on_refresh_ttl, is it a internal API in your environment?

Sunflower876 commented 7 months ago

I saw the crash is caused by request on_refresh_ttl, is it a internal API in your environment?

Yes. But it is because there is a bug in the implementation of message_ex:: copy that any message call between servers will lead to crash. You can try the case below. This will definitely crash.

image