Open · the-smooth-operator opened this issue 6 days ago
@the-smooth-operator, when was the first observed occurrence of this?
Can you share:
This started happening on Friday, the 27th of October. The memory limit is set at 32 GB, but the process rarely uses all of it. We have bumped it to 64 GB and saw this happening again.
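For reference, how the limit is enforced is not stated in this report; as one hedged illustration only, assuming nodeos runs under a systemd-managed service (the unit name below is an assumption), the bump from 32 GB to 64 GB could look like:

# Illustrative only: "nodeos.service" is an assumed unit name; adjust to the actual service.
sudo systemctl set-property nodeos.service MemoryMax=64G
sudo systemctl restart nodeos.service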
config.ini
p2p-peer-address = eos.seed.eosnation.io:9876
p2p-peer-address = peer1.eosphere.io:9876
p2p-peer-address = peer2.eosphere.io:9876
p2p-peer-address = p2p.genereos.io:9876
# Plugins
plugin = eosio::chain_plugin
plugin = eosio::net_plugin
plugin = eosio::http_plugin
plugin = eosio::chain_api_plugin
plugin = eosio::net_api_plugin
# Stability Configs
http-validate-host = false
chain-state-db-size-mb = 89200
# Default settings are way too low
net-threads = 6
http-threads = 6
chain-threads = 6
Most recent logs prior to the error:
debug 2024-10-04T14:52:27.164 net-1 net_plugin.cpp:3816 operator() ] validated block header, broadcasting immediately, connection - 1, blk num = 397510466, id = 17b18742b890a90bc75be74baacc27f278f03bdef6e083c5edc8e8627937149b
debug 2024-10-04T14:52:27.164 net-1 net_plugin.cpp:2694 bcast_block ] bcast block 397510466
debug 2024-10-04T14:52:27.164 net-1 net_plugin.cpp:3822 operator() ] posting block 397510466 to app thread
debug 2024-10-04T14:52:27.169 net-1 controller.cpp:4215 integrate_received_q ] set received qc: {"block_num":397510466,"is_strong_qc":true} into claimed block 397510466 17b18742b890a90bc75be74baacc27f278f03bdef6e083c5edc8e8627937149b
debug 2024-10-04T14:52:27.169 net-1 fork_database.cpp:353 set_pending_savanna_ ] set pending savanna lib 397510465: 17b18741d53e330193d5409116c4e5476e81b2c914893a1a701363970822fa54
debug 2024-10-04T14:52:27.169 net-1 net_plugin.cpp:3816 operator() ] validated block header, broadcasting immediately, connection - 1, blk num = 397510467, id = 17b187436f3e0319a0842aad6f182acc78f37a2f145fd1a1197857a8a25693c4
debug 2024-10-04T14:52:27.169 net-1 net_plugin.cpp:2694 bcast_block ] bcast block 397510467
debug 2024-10-04T14:52:27.169 net-1 net_plugin.cpp:3822 operator() ] posting block 397510467 to app thread
debug 2024-10-04T14:52:27.175 net-1 controller.cpp:4215 integrate_received_q ] set received qc: {"block_num":397510467,"is_strong_qc":true} into claimed block 397510467 17b187436f3e0319a0842aad6f182acc78f37a2f145fd1a1197857a8a25693c4
debug 2024-10-04T14:52:27.175 net-1 fork_database.cpp:353 set_pending_savanna_ ] set pending savanna lib 397510466: 17b18742b890a90bc75be74baacc27f278f03bdef6e083c5edc8e8627937149b
debug 2024-10-04T14:52:27.175 net-1 net_plugin.cpp:3816 operator() ] validated block header, broadcasting immediately, connection - 1, blk num = 397510468, id = 17b18744ca34b7a51ef26133f8f4fb7b8d4cc79f481b6d48a86ca043f024b608
debug 2024-10-04T14:52:27.175 net-1 net_plugin.cpp:2694 bcast_block ] bcast block 397510468
debug 2024-10-04T14:52:27.175 net-1 net_plugin.cpp:3822 operator() ] posting block 397510468 to app thread
debug 2024-10-04T14:52:27.179 net-1 controller.cpp:4215 integrate_received_q ] set received qc: {"block_num":397510468,"is_strong_qc":true} into claimed block 397510468 17b18744ca34b7a51ef26133f8f4fb7b8d4cc79f481b6d48a86ca043f024b608
debug 2024-10-04T14:52:27.179 net-1 fork_database.cpp:353 set_pending_savanna_ ] set pending savanna lib 397510467: 17b187436f3e0319a0842aad6f182acc78f37a2f145fd1a1197857a8a25693c4
debug 2024-10-04T14:52:27.180 net-1 net_plugin.cpp:3816 operator() ] validated block header, broadcasting immediately, connection - 1, blk num = 397510469, id = 17b1874532e7a2634c13952bf9be08bd4db05c44606e51b9c3cd3021d380be84
debug 2024-10-04T14:52:27.180 net-1 net_plugin.cpp:2694 bcast_block ] bcast block 397510469
debug 2024-10-04T14:52:27.180 net-1 net_plugin.cpp:3822 operator() ] posting block 397510469 to app thread
debug 2024-10-04T14:52:27.605 net-2 net_plugin.cpp:2244 sync_timeout ] ["peer1.eosphere.io:9876 - 922061b" - 3 198.50.156.24:9876] sync timeout
2024-10-04 17:10:49.718
terminate called after throwing an instance of 'Serialization::FatalSerializationException'
Since the Spring upgrade, we have observed that our nodes suddenly stop responding to RPC calls and stop emitting normal logs, only printing
terminate called after throwing an instance of 'Serialization::FatalSerializationException'
every ~30 minutes. Furthermore, when restarting the process (it restarts gracefully), it always comes up with a corrupted state, so we need to start it from a snapshot or perform a long replay operation.
While the process is "stuck" there is network activity on the host and the process is still consuming its memory and using CPU cycles; however, disk operations drop to 0.
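For reference, a minimal sketch of the two recovery paths mentioned above (snapshot restore and hard replay); the directory layout and snapshot file name are placeholders, not the actual values from our setup:

# Option 1: restore from a snapshot (paths are illustrative).
# The stale state directory typically has to be removed first, since nodeos only loads a snapshot into an empty state database.
rm -rf /data/nodeos/state
nodeos --config-dir /etc/nodeos --data-dir /data/nodeos --snapshot /data/nodeos/snapshots/snapshot-latest.bin

# Option 2: hard replay, rebuilding state from the local blocks.log (much slower).
nodeos --config-dir /etc/nodeos --data-dir /data/nodeos --hard-replay-blockchain

Either path rebuilds the chain-state database; the snapshot route is normally far faster than a full replay.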