bitshares / bitshares-core

BitShares Blockchain node and command-line wallet
https://bitshares.github.io/

Witness_node randomly stops syncing #2798

Open abitmore opened 9 months ago

abitmore commented 9 months ago

Bug Description

Start a new witness_node instance and wait. Sometimes it hangs during syncing and never reaches the latest block. Restarting sometimes helps.

See https://github.com/bitshares/bitshares-core/issues/2798#issuecomment-1855022095 for more info.

Unable to reproduce reliably so far. Haven't found anything useful in the log files yet. Probably caused by memory corruption.


abitmore commented 9 months ago

Found the reason.

In PR https://github.com/bitshares/bitshares-core/pull/2791 for the last releases (7.0.1 and test-7.0.3), we updated read_write_handler and read_write_handler_with_buffer to throw canceled_exception when a boost::asio::error::operation_aborted error occurs.

The canceled_exception might then be caught in message_oriented_connection_impl::read_loop(): https://github.com/bitshares/bitshares-core/blob/0aa383154ce191d708f7d143b27662bc171ed239/libraries/net/message_oriented_connection.cpp#L215-L230

As a result, call_on_connection_closed is no longer set to true in that case, so node::on_connection_closed() is not called as it was before, and certain cleanup steps are not performed.
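
To make the control flow easier to follow, here is a minimal stand-alone sketch of the pattern described above. It is not the code from message_oriented_connection.cpp: the exception types are hypothetical stand-ins, run_read_loop, simulate_aborted_read and abort_mapping are names invented for illustration, and the assumption that an aborted read previously surfaced as some other exception type is mine, not something stated in the PR.

```cpp
#include <functional>
#include <stdexcept>

// Hypothetical stand-ins for the exception types named in this issue.
struct canceled_exception : std::runtime_error {
    using std::runtime_error::runtime_error;
};
struct eof_exception : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Illustrative switch: how an aborted socket operation is reported.
enum class abort_mapping { generic_error, canceled };

// Simulates a read that fails because the operation was aborted.
void simulate_aborted_read(abort_mapping mapping) {
    if (mapping == abort_mapping::canceled)
        throw canceled_exception("operation aborted");
    // Assumption: before the change, the abort surfaced as another exception.
    throw std::runtime_error("operation aborted");
}

// Toy read loop. Returns whether the on_connection_closed cleanup would run.
bool run_read_loop(const std::function<void()>& read_once) {
    bool call_on_connection_closed = false;
    try {
        read_once();
    } catch (const canceled_exception&) {
        // Treated as "this object is already being destroyed":
        // the flag stays false, so no cleanup notification is sent.
    } catch (const eof_exception&) {
        call_on_connection_closed = true;
    } catch (const std::exception&) {
        call_on_connection_closed = true;
    }
    return call_on_connection_closed;
}
```

In this model, run_read_loop returns true when the aborted read is reported as a generic error and false when it is reported as canceled_exception, which mirrors the skipped cleanup described above.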