AntelopeIO / leap

C++ implementation of the Antelope protocol

5.0.0-rc3: review if exiting cleanly when running out of RAM is best course of action #2023

Open matthewdarwin opened 10 months ago

matthewdarwin commented 10 months ago

My nodeos ran out of RAM:

Dec 29 03:02:15 wax-snap71c nodeos[2594144]: std::bad_alloc: Error unpacking field transactions
Dec 29 03:02:15 wax-snap71c nodeos[2594144]:     {"field":"transactions","what":"std::bad_alloc"}
Dec 29 03:02:15 wax-snap71c nodeos[2594144]:     net-8  raw.hpp:363 operator()
Dec 29 03:02:15 wax-snap71c nodeos[2594144]: error unpacking eosio::chain::signed_block
Dec 29 03:02:15 wax-snap71c nodeos[2594144]:     {"type":"eosio::chain::signed_block"}
Dec 29 03:02:15 wax-snap71c nodeos[2594144]:     net-8  raw.hpp:668 unpack

and then exited cleanly:

Dec 29 03:05:12 wax-snap71c nodeos[2594144]: info  2023-12-29T03:05:12.914 nodeos    http_plugin.cpp:515           plugin_shutdown      ] exit shutdown
Dec 29 03:05:12 wax-snap71c nodeos[2594144]: info  2023-12-29T03:05:12.916 nodeos    main.cpp:155                  operator()           ] nodeos version v5.0.0wax01-rc3 v5.0.0wax01-rc3-10c04d06485c8a1fdb52f6036dfa90b93813cb3a-dirty
Dec 29 03:05:12 wax-snap71c nodeos[2594144]: info  2023-12-29T03:05:12.916 nodeos    main.cpp:62                   log_non_default_opti ] Non-default options: [.........]
Dec 29 03:05:52 wax-snap71c systemd[1]: nodeos.service: Main process exited, code=killed, status=11/SEGV

I am not convinced that the state of the world is 100% accurate in this situation. Would it be better not to exit cleanly, and so force the user to revert to a well-known state?

For my part, I have rolled back to an earlier snapshot to avoid any doubt.

bhazzard commented 10 months ago

Thank you for the suggestion, we will discuss. We won't consider this a blocker for 5.0.0 stable.

heifner commented 10 months ago

In general, the strategy is to just exit (non-clean shutdown): https://github.com/AntelopeIO/leap/blob/6d5248659b40a0ae9723c032cd4aec9181168318/plugins/chain_plugin/chain_plugin.cpp#L1273-L1277. This must be a case where we are not catching and handling std::bad_alloc appropriately.
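
As a rough illustration of the "just exit" strategy described above, here is a minimal C++ sketch. It is not the code behind the link and not taken from leap: the names `run_guarded`, `fatal_handler`, and `exit_without_cleanup` are hypothetical, and the only assumption is that an out-of-memory condition surfaces as `std::bad_alloc`. The handler is injectable so the behaviour can be exercised without actually terminating the process.

```cpp
#include <cstdlib>
#include <functional>
#include <new>

// Hypothetical hook (not a leap API): called when memory is exhausted.
using fatal_handler = std::function<void(const char*)>;

// Default policy: terminate immediately via std::_Exit, which skips
// destructors and atexit handlers, so no shutdown code runs against
// state that may be inconsistent.
inline void exit_without_cleanup(const char*) {
    std::_Exit(EXIT_FAILURE);
}

// Runs `work`; if it throws std::bad_alloc, invokes `on_oom` instead of
// letting the exception travel into generic error handling.
// Returns true when `work` finished without running out of memory.
template <typename F>
bool run_guarded(F&& work, const fatal_handler& on_oom = exit_without_cleanup) {
    try {
        work();
        return true;
    } catch (const std::bad_alloc& e) {
        on_oom(e.what());
        return false;  // reached only if the handler chose to return
    }
}
```

With the default handler, a call such as `run_guarded([] { /* unpack a block */ });` would end the process non-cleanly on `std::bad_alloc`, which would in turn force a restart from a known-good state. Whether the unpacking path in the log above can be wrapped this way is an open question here, not something the sketch establishes.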