apache / brpc

brpc is an industrial-grade RPC framework written in C++, often used in high-performance systems such as search, storage, machine learning, advertisement, and recommendation. "brpc" means "better RPC".
https://brpc.apache.org
Apache License 2.0

brpc::StreamWait in the server is stuck in thread::TaskGroup::sched_to #2680

Open GreateCode opened 4 days ago

GreateCode commented 4 days ago

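Describe the bug: The server sends tens of gigabytes of data to the client, and the client crashes partway through. The due_time passed to StreamWait was set to 100 ms (this is wrong; it should be an absolute point in time), but even so the call should not hang. It is unclear whether the client crash or the StreamWait hang came first. The stack trace is in the screenshot; could someone help work out the cause?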

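while (true) {
     int ec = brpc::StreamWrite(xxxx);
     if (ec == EINVAL) { return; }  // stream is invalid or already closed

     if (ec == EAGAIN) {
          // The message was not accepted: wait until the stream is writable.
          // due_time was set to 100 ms here; it should be an absolute time point.
          auto ret = brpc::StreamWait(stream_id, &due_time);
          if (ret == EINVAL) { return; }
     }
}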
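Since the report notes that due_time should be a point in time rather than a duration, here is a minimal sketch of turning "100 ms from now" into an absolute timespec. The helper name deadline_from_now_ms is hypothetical and not part of brpc, and the use of CLOCK_REALTIME as the reference clock is an assumption. brpc's butil/time.h appears to offer a similar helper (butil::milliseconds_from_now), which would likely be the more natural choice in real code.

```cpp
#include <cstdint>
#include <ctime>

// Hypothetical helper (not part of brpc): returns the absolute time that lies
// `ms` milliseconds in the future. Assumes ms >= 0 and that the caller expects
// a CLOCK_REALTIME-based timespec.
timespec deadline_from_now_ms(int64_t ms) {
    timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    ts.tv_sec += ms / 1000;
    ts.tv_nsec += (ms % 1000) * 1000000L;
    if (ts.tv_nsec >= 1000000000L) {  // carry overflowing nanoseconds into seconds
        ts.tv_sec += 1;
        ts.tv_nsec -= 1000000000L;
    }
    return ts;
}

// Intended use inside the loop (assumes brpc headers and a valid stream_id):
//   timespec due_time = deadline_from_now_ms(100);  // recomputed before every wait
//   int ret = brpc::StreamWait(stream_id, &due_time);
```

The deadline has to be recomputed before each wait. A timespec built once outside the loop would already be in the past on later iterations.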

To Reproduce: Extremely hard to reproduce.

Expected behavior

Versions: OS: Ubuntu 20.04; Compiler: clang; brpc: 1.8.0; protobuf: 3.15

Additional context/screenshots

image

GreateCode commented 3 days ago

@chenBright @wwbmmm could you please take a look?

GreateCode commented 3 days ago

@jamesge

chenBright commented 1 day ago

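If StreamWrite never returns a network error and keeps returning EAGAIN, the client has most likely died without the server noticing that the TCP connection is gone. The data already sent is never ACKed by the client, so once nothing more can be written into the kernel buffer, StreamWrite keeps returning EAGAIN.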

GreateCode commented 1 day ago

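That makes sense. In that case StreamWrite will return EAGAIN for an extended period, so the while loop can treat that as a broken connection and return instead of hanging here. But how do the server's connection resources get released? If the server is configured with ServerOptions.idle_timeout_sec=100, will the server release the connection once 100 s have passed?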