apache / brpc

brpc is an industrial-grade RPC framework written in C++, often used in high-performance systems such as search, storage, machine learning, advertising, and recommendation. "brpc" means "better RPC".
https://brpc.apache.org
Apache License 2.0

Error "Fail to wait EPOLLOUT of fd=...: Connection timed out" when channel timeout is -1 #2816

Open vaavaav opened 2 weeks ago

vaavaav commented 2 weeks ago

Describe the bug

According to channel.h, setting timeout_ms to -1 makes the Channel wait indefinitely for a response. Instead, the request fails and the following warning is logged over and over again, even when no more requests are being made:

W1106 15:03:26.337735 3784702 4294969859 /.../brpc/src/brpc/socket.cpp:1361] Fail to wait EPOLLOUT of fd=3: Connection timed out

However, this does not happen when the channel connects to localhost, or more precisely 127.0.0.1.

To Reproduce

For example,

  1. Compile the echo_c++ example provided by brpc (but comment out all logging instructions for easier reading).
  2. Run it like so: ./echo_client --timeout_ms=-1 --server="128.0.0.1:50000". (I am aware that this IP is not localhost; the point is that the failure occurs for any IP that is not localhost.)

Expected behavior

The request should block until a response is received. For example, the steps above behave correctly when the client is run like so: ./echo_client --timeout_ms=-1 --server="127.0.0.1:50000".

Versions

  - OS: 5.4.0-187-generic
  - Compiler: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
  - brpc: 1.11.0
  - protobuf: 3.21.6.0

Additional context/screenshots

Resulting code portion after commenting out the logging calls in the echo_c++ example: [screenshot not included]

chenBright commented 2 weeks ago

You also need to set connect_timeout_ms to -1, but echo_client does not expose a gflag for it. You can modify the echo_client code instead, for example: options.connect_timeout_ms = -1.
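For readers wondering why a separate connect timeout matters, here is a minimal sketch of the general pattern: start a non-blocking connect, wait for the socket to become writable, then read the result. It is not brpc's implementation. It uses plain POSIX calls with poll() rather than epoll, the helper name connect_with_timeout is hypothetical, and SOCK_NONBLOCK assumes Linux. A timeout of -1 means the helper imposes no deadline of its own, while a finite timeout that expires is reported as ETIMEDOUT ("Connection timed out").

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper, not part of brpc. Starts a non-blocking TCP connect to
// ip:port and waits up to timeout_ms for it to complete.
// timeout_ms == -1 waits with no deadline of our own (the kernel may still
// give up on the connection attempt by itself).
// Returns the connected fd, or -1 with errno set.
int connect_with_timeout(const char* ip, int port, int timeout_ms) {
    assert(timeout_ms >= -1);

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(static_cast<uint16_t>(port));
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1) {
        errno = EINVAL;  // not a valid dotted-quad IPv4 address
        return -1;
    }

    int fd = socket(AF_INET, SOCK_STREAM | SOCK_NONBLOCK, 0);
    if (fd < 0) return -1;

    // A non-blocking connect usually returns -1 with EINPROGRESS.
    if (connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0) {
        return fd;  // connected immediately
    }
    if (errno != EINPROGRESS) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }

    // Wait until the socket is writable, which signals that the connect
    // attempt has finished (successfully or not).
    pollfd pfd{};
    pfd.fd = fd;
    pfd.events = POLLOUT;
    int rc;
    do {
        // Simplification: a retry after EINTR restarts the full timeout.
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    int err = 0;
    if (rc == 0) {
        err = ETIMEDOUT;  // our own deadline expired
    } else if (rc < 0) {
        err = errno;
    } else {
        // Writable: fetch the actual outcome of the connect attempt.
        socklen_t len = sizeof(err);
        if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0) err = errno;
    }
    if (err != 0) {
        close(fd);
        errno = err;
        return -1;
    }
    return fd;
}
```

In this sketch, connect_with_timeout("128.0.0.1", 50000, 200) would give up after 200 ms if the peer never answers, whereas passing -1 keeps waiting. In brpc itself, the equivalent change is the one described above: set options.connect_timeout_ms = -1 in addition to timeout_ms.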

vaavaav commented 2 weeks ago

Strangely enough, I was doing that on the project I was actually working on and had the exact same problem. However, it is true that in the example above, setting options.connect_timeout_ms to -1 avoids this problem. Thanks for the help; I'll see if this was, in fact, a problem on my side.