Closed roelandp closed 6 years ago
Listen backlog is only an issue when there's very high traffic to a single listening port, causing queuing of inbound connections in the kernel. A single connection is not queued. System administrators deploying `nodeos` should set listen queue depth to an appropriate value themselves. Applications should not attempt to override that value. We anticipate that very busy nodes will deploy traditional HTTP load balancers, obviating much of the need to set higher queue depths on machines running `nodeos` instances.
The comment in `http_plugin.cpp` you reference is currently spurious. The http_plugin is using an application-wide instance of `boost::io_context` (formerly known as `boost::io_service`), which does not stop running until the application exits, assuming at least one of `net_plugin` or `http_plugin` is configured.
I found that my HTTP service would not run on my node, which also used to have Timer.expired (Bitshares / Steem -> localhost wallet connection) errors in the past. With the code modified as described above, it does work.
This is only for Kernel > 4.4. Please see this comment in the websocketpp (dev branch) fix commit:
After a change in Linux Kernel 4.4 the value of 0 causes all connections to be rejected rather than the default value being used. The default is now the asio::socket_base::max_connections value instead (which is the default asio uses when no value is provided).
https://github.com/zaphoyd/websocketpp/commit/0bb33e4bca4ccc42a36aa2321e4fb97f2562e519
And yeah, I noticed that the comment `ilog("http io service exit");` is always shown :)
I've been running nodeos/eosiod/eosd on a post-4.4 kernel since August and it never times out either net_plugin or http_plugin. Currently running on 4.13 with no issues. I suggest investigating your hosting provider's settings. For instance, I have
$ cat /proc/sys/net/ipv4/tcp_max_syn_backlog
1024
Your provider is defaulting to needlessly aggressive (and arguably wrong) settings if you're experiencing spurious SYN flooding warnings in the log and dropped packets.
so you can reach the API endpoints when visiting?
These ones I mean: http://mowgli.jungle3.eos.roelandp.nl:8765/v1/chain/get_info
@jgiszczak it is not about SYN flooding warnings and dropped packets. I have the same output for tcp_max_syn_backlog
I don't want to go into endless discussion really. I think the commit in the official next version of Websocketpp is pretty self explanatory. https://github.com/zaphoyd/websocketpp/commit/0bb33e4bca4ccc42a36aa2321e4fb97f2562e519
Maybe I am mistakenly mixing up `http service` and the RPC API endpoint, and they are not the same thing?
In my case the HTTP API was not responding and idled out after the browser timeout. When I implemented the above fix, it worked instantly.
I am talking about the NODEIP:NODEPORT/v1/chain/get_info - apis 💤
Lastly here another discussion about it: https://github.com/zaphoyd/websocketpp/issues/623
As far as I understand from your comments, you are way deeper into this, but I hope you can help me understand how I should revert to `m_listen_backlog(0)` and what I should alternatively change on my box. I still feel it is the correct fix, as the maintainer of websocketpp admits the error and it is also updated in the websocketpp 0.8 dev branch (EOS uses the current latest release, 0.7 (2016)).
`m_listen_backlog(0)` apparently instructs it to use the default setting, but some kernels / boxes interpret it not as 'default' but as 'drop all'. As this is kind of unpredictable, they changed it in the 0.8 dev branch to a new default in the code: max_connections, at least that is what I understand.
I left my comment here because this sometimes pops up; I have seen people hit it in the past with other graphene chains, and it really is a pretty annoying bug which is unresolved in many cases because it is so deeply hidden in a submodule's library.
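To make the idea behind the websocketpp change concrete, here is a minimal sketch using plain POSIX sockets rather than websocketpp or asio. It assumes that `asio::socket_base::max_connections` corresponds to `SOMAXCONN` on POSIX systems, which is my understanding but is not confirmed in this thread. The helper names `choose_backlog` and `open_listener` are hypothetical and exist only for illustration.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: treat a non-positive configured backlog as "unset"
// and fall back to SOMAXCONN instead of passing 0 through to listen().
int choose_backlog(int configured) {
    return configured > 0 ? configured : SOMAXCONN;
}

// Hypothetical helper: open a TCP listener on 127.0.0.1 with an ephemeral
// port and the given backlog. Returns the socket fd, or -1 on failure.
int open_listener(int backlog) {
    int fd = ::socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        return -1;
    }

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;  // let the kernel pick a free port

    if (::bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0 ||
        ::listen(fd, backlog) != 0) {
        ::close(fd);
        return -1;
    }
    return fd;
}
```

Calling `open_listener(choose_backlog(0))` never hands a literal `0` to `listen()`, which is the effect the patched default is meant to achieve. Running such a program under `strace -e trace=listen` should show the non-zero backlog in the system call.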
Also check the allowed maximum connections:
$ cat /proc/sys/net/core/somaxconn
128
From reading the linked ticket, also disable ufw if your system is using it. It seems to be hyperaggressive about something it shouldn't be. Compose your own firewall rules with iptables if you need them.
I've spent a good deal of time with strace today, and I'm not entirely sure how websocketpp has been working on most everyone's system, including mine. I see the `0` argument in the system call, and my perusal of both the kernel source and glibc source leads me to believe it will be passed unmodified. I am reluctant to just arbitrarily patch websocketpp, but fortunately we're using an actual copy of it rather than a git submodule, so it can be done easily.
Given the still somewhat mysterious nature of the root cause, I've submitted a pull request with the recommended fix.
@jgiszczak Please note the commit should also have added asio!
(as far as I understand from the patch from websocketpp: https://github.com/zaphoyd/websocketpp/commit/0bb33e4bca4ccc42a36aa2321e4fb97f2562e519)
#include <websocketpp/common/asio.hpp>
ATC:
Run `nodeos` with strace from the build directory as follows:
strace -e trace=listen programs/nodeos/nodeos
Verify the following two lines appear:
listen(11, 128) = 0
listen(12, 128) = 0
@roelandp Adding the include was not necessary. The constant was already available and was being used by the copy constructor, line 134.
ATC passes
I fired up a dedibox, following all requirements and preferred systems, so I expected smooth sailing, seeing so many nodes already running.
However, I then discovered the web service / HTTP service was not able to respond and timed out after a while. I noticed that whenever I turned off nodeos, the HTTP service would immediately respond with unavailable, whilst when nodeos was running it was just 'connecting' for a while (30 secs or so) before idling out.
When I was browsing the source at https://github.com/EOSIO/eos/blob/d8db1d3a05e768f5459b46ace8d2bba92aab89d9/plugins/http_plugin/http_plugin.cpp#L218
it slowly started to remind me of a fix I found for the dreaded steem/graphene 'Timer Expired' error when trying to connect to the localhost RPC for the wallet. https://github.com/steemit/steem/issues/35#issuecomment-315463930
This appeared to only happen on kernels newer than 4.4.
So I manually adjusted the file
libraries/fc/vendor/websocketpp/websocketpp/transport/asio/endpoint.hpp
to reflect the fix in the websocketpp source (this is a fix taken from the 'dev' version of websocketpp, which is not the default included submodule version).
L37 ADD:
#include <websocketpp/common/asio.hpp>
L95 (OR SOMEWHERE) REPLACE:
, m_listen_backlog(0)
with
, m_listen_backlog(lib::asio::socket_base::max_connections)
This fixes the timeouts occurring with the HTTP service on kernels > 4.4.
I think you can also cherry-pick the fix as summarized by Abit for Bitshares: https://github.com/bitshares/bitshares-core/issues/701