apache / brpc

brpc is an industrial-grade RPC framework written in C++, often used in high-performance systems such as search, storage, machine learning, advertising, and recommendation. "brpc" means "better RPC".
https://brpc.apache.org
Apache License 2.0
16.04k stars 3.92k forks

Requests from sofa do not reach the brpc server #2282

Open namelij opened 1 year ago

namelij commented 1 year ago

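Describe the bug

We are upgrading our RPC framework. Production currently runs Baidu's sofa-pbrpc, and we plan to move to brpc. To keep the upgrade stable, we are upgrading one service at a time. For example, traffic flows A -> B -> C, where A, B, and C are all microservices currently built on the sofa framework. During the upgrade, A and B stay unchanged on sofa, while C switches to brpc.

In our own testing, service C no longer receives RPC requests once it is upgraded to brpc. Requesting C directly over HTTP works, but requests routed through A and B never reach it.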

The status endpoint returns the following:

HTTP/1.1 200 OK
Content-Length: 520
Content-Type: text/plain

version:xxx
non_service_error: 1184
connection_count: 22
max_concurrency: unlimited

[xxx]

Load (LoadRequest) returns (LoadResponse)
count: 0
qps: 0
error: 0
eps: 0
latency: 0
latency_50: 0
latency_90: 0
latency_99: 0
latency_999: 0
latency_9999: 0
max_latency: 0
concurrency: 0

Update (UpdateRequest) returns (UpdateResponse)
count: 0
qps: 0
error: 0
eps: 0
latency: 0
latency_50: 0
latency_90: 0
latency_99: 0
latency_999: 0
latency_9999: 0
max_latency: 0
concurrency: 0


To Reproduce

Expected behavior

Requests should reach service C normally.

Versions

OS: CentOS 7.6
Compiler: GCC 11
brpc: 1.5
protobuf: 2.6

Additional context/screenshots

Some basic stack traces:

Thread 5 (Thread 0x7fab65c0f700 (LWP 5993)):

0 0x00007fab73f89c89 in syscall () from /lib64/libc.so.6

1 0x000000000069146a in futex_wait_private (timeout=0x0, expected=<optimized out>, addr1=<optimized out>) at ./src/bthread/sys_futex.h:40

2 wait (expected_state=..., this=<optimized out>) at ./src/bthread/parking_lot.h:60

3 bthread::TaskGroup::wait_task (this=this@entry=0x3227b00, tid=tid@entry=0x7fab65c0ea38) at src/bthread/task_group.cpp:124

4 0x0000000000693e2b in bthread::TaskGroup::run_main_task() () at src/bthread/task_group.cpp:152

5 0x0000000000685ddc in bthread::TaskControl::worker_thread(void*) () at src/bthread/task_control.cpp:81

6 0x00007fab73c7cea5 in start_thread () from /lib64/libpthread.so.0

7 0x00007fab73f8f96d in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7fab6540e700 (LWP 5994)):

0 0x00007fab73f8ff43 in epoll_wait () from /lib64/libc.so.6

1 0x000000000055f88a in brpc::EventDispatcher::Run() () at ./src/brpc/event_dispatcher_epoll.cpp:205

2 0x000000000055f9b9 in brpc::EventDispatcher::RunThis (arg=<optimized out>) at ./src/brpc/event_dispatcher_epoll.cpp:191

3 0x0000000000693b67 in bthread::TaskGroup::task_runner(long) () at src/bthread/task_group.cpp:298

4 0x00000000006971b1 in bthread_make_fcontext () at /opt/rh/devtoolset-11/root/usr/include/c++/11/ostream:611

5 0x0000000000000000 in ?? ()

Thread 3 (Thread 0x7fab64c0d700 (LWP 5995)):

0 0x00007fab73f89c89 in syscall () from /lib64/libc.so.6

1 0x000000000069146a in futex_wait_private (timeout=0x0, expected=<optimized out>, addr1=<optimized out>) at ./src/bthread/sys_futex.h:40

2 wait (expected_state=..., this=<optimized out>) at ./src/bthread/parking_lot.h:60

3 bthread::TaskGroup::wait_task (this=this@entry=0x3227e00, tid=tid@entry=0x7fab64c0ca38) at src/bthread/task_group.cpp:124

4 0x0000000000693e2b in bthread::TaskGroup::run_main_task() () at src/bthread/task_group.cpp:152

5 0x0000000000685ddc in bthread::TaskControl::worker_thread(void*) () at src/bthread/task_control.cpp:81

6 0x00007fab73c7cea5 in start_thread () from /lib64/libpthread.so.0

7 0x00007fab73f8f96d in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7fab6440c700 (LWP 5996)):

0 0x00007fab73f89c89 in syscall () from /lib64/libc.so.6

1 0x000000000069146a in futex_wait_private (timeout=0x0, expected=<optimized out>, addr1=<optimized out>) at ./src/bthread/sys_futex.h:40

2 wait (expected_state=..., this=<optimized out>) at ./src/bthread/parking_lot.h:60

3 bthread::TaskGroup::wait_task (this=this@entry=0x3b06000, tid=tid@entry=0x7fab6440ba38) at src/bthread/task_group.cpp:124

4 0x0000000000693e2b in bthread::TaskGroup::run_main_task() () at src/bthread/task_group.cpp:152

5 0x0000000000685ddc in bthread::TaskControl::worker_thread(void*) () at src/bthread/task_control.cpp:81

6 0x00007fab73c7cea5 in start_thread () from /lib64/libpthread.so.0

7 0x00007fab73f8f96d in clone () from /lib64/libc.so.6

PS: After adding logging to brpc, the ProcessSofaRequest function prints the following: Fail to find method= sofa.pbrpc.builtin.BuiltinService.Health

Question: with sofa as the client calling a brpc server, why does the server look up sofa's Health method?

lorinlee commented 1 year ago

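I can't see what the problem is from the information provided so far. Are there any other error logs? Ideally, please provide a reproducible demo (for example, a simple echo service), which would make the issue much easier to pin down.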

namelij commented 1 year ago

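> I can't see what the problem is from the information provided so far. Are there any other error logs? Ideally, please provide a reproducible demo (for example, a simple echo service), which would make the issue much easier to pin down.

Thanks for the reply. I added a LOG statement in brpc, and it printed: Fail to find method= sofa.pbrpc.builtin.BuiltinService.Health

So, based on the above, my rough guess is that the sofa framework's health check on the client side fails, and as a result the client does not send any traffic to the server.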
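To make that guess concrete, here is a minimal sketch of name-based method dispatch. It is a toy model, not brpc's actual implementation: `MethodRegistry`, `kSofaHealthMethod`, and the `"OK"` reply are hypothetical names chosen for illustration. It only shows why a server that registered `Load` and `Update` would reject a call to `sofa.pbrpc.builtin.BuiltinService.Health`, and how the lookup succeeds once a handler exists under that exact full name. Whether the sofa client really withholds traffic after a failed probe remains the assumption stated above.

```cpp
#include <functional>
#include <iostream>
#include <map>
#include <string>
#include <utility>

// Full method name taken from the log line in this issue.
const std::string kSofaHealthMethod = "sofa.pbrpc.builtin.BuiltinService.Health";

// Toy stand-in for a server-side method table (hypothetical, not brpc code).
class MethodRegistry {
public:
    using Handler = std::function<std::string()>;

    // Register a handler under its full name, e.g. "package.Service.Method".
    void Register(const std::string& full_name, Handler handler) {
        handlers_[full_name] = std::move(handler);
    }

    // Look up the method by full name and invoke it. Returns false and logs
    // when no handler is registered under that name.
    bool Dispatch(const std::string& full_name, std::string* response) const {
        auto it = handlers_.find(full_name);
        if (it == handlers_.end()) {
            std::cerr << "Fail to find method=" << full_name << '\n';
            return false;
        }
        *response = it->second();
        return true;
    }

private:
    std::map<std::string, Handler> handlers_;
};
```

With only the business methods registered, `Dispatch(kSofaHealthMethod, &response)` returns false and logs the lookup failure. After `Register(kSofaHealthMethod, ...)` is called with a handler, the same call succeeds. In a real brpc server the equivalent step would presumably be adding a protobuf service whose full name matches what the sofa client probes, but that is untested here.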