Open ericpassmore opened 1 year ago
No luck with other options. Testing with smaller sync span, and tested on 4.0.4. Still crashing without logging ending integrity hash.
I tried this with release/5.0 using your config.ini and it didn't segfault.
Closing this issue. First, I am unable to get a core dump. After changing ulimit it's reporting a segmentation fault, but still no core dump. Second, due to lack of root privileges I have unpacked nodeos locally by expanding the deb. Therefore this isn't a standard install.
Reopening to continue investigation, will attach a debugger.
Reliably getting errors with 5.0.0-rc2. Attached gdb and included a stack trace from the threads at the time of the nodeos crash.
Leap 5.0.0-rc2 Stack-Trce-Stop-Hash-Failure.txt Leap 5.0.0-rc2 Nodes-Log-Stop-Hash-Failure.log
For comparison, here are the nodeos log lines without the --integrity-hash-on-stop option, and a successful exit.
Leap 5.0.0-rc2 OK-Log-Without-Stop-Hash-Option.log
Stack trace from Leap 5.0.0-rc3, slightly different: Leap 5.0.0-rc3 Stack-Trce-Stop-Hash-Failure.txt
Stack trace from Leap 4.0.5: Nodeos-4.0.5-Stack-Trce-Stop-Hash-Failure.txt
clear_expired_input_transactions is calling a method on controller, but controller is being destroyed at this point. Seems like we should move the calculate_integrity_hash() out of the ~controller_impl() destructor and into chain_plugin shutdown.
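A minimal sketch of that suggestion, assuming stand-in types: controller_sketch and chain_plugin_sketch are hypothetical names for illustration, not the Leap classes. The point is only the ordering: the hash is computed in an explicit shutdown step while the controller is still fully alive, and destruction happens afterwards.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the chain controller; not the Leap class.
class controller_sketch {
public:
    explicit controller_sketch(std::vector<std::string>& events) : events_(events) {}
    ~controller_sketch() { events_.push_back("controller destroyed"); }

    // Stand-in for calculate_integrity_hash(); the returned value is a placeholder.
    std::string calculate_integrity_hash() {
        events_.push_back("hash computed");
        return "placeholder-hash";
    }

private:
    std::vector<std::string>& events_;
};

// Hypothetical stand-in for chain_plugin; it owns the controller.
class chain_plugin_sketch {
public:
    chain_plugin_sketch(std::vector<std::string>& events, bool integrity_hash_on_stop)
        : integrity_hash_on_stop_(integrity_hash_on_stop),
          chain_(std::make_unique<controller_sketch>(events)) {}

    void shutdown() {
        if (chain_ && integrity_hash_on_stop_) {
            // The controller has not started destruction yet, so any
            // callbacks made while hashing see a valid object.
            last_hash_ = chain_->calculate_integrity_hash();
        }
        chain_.reset();  // destruction happens only after the hash is taken
    }

    const std::string& last_hash() const { return last_hash_; }

private:
    bool integrity_hash_on_stop_;
    std::unique_ptr<controller_sketch> chain_;
    std::string last_hash_;
};

// Records the order of events for one shutdown.
std::vector<std::string> run_shutdown_sketch(bool integrity_hash_on_stop) {
    std::vector<std::string> events;
    chain_plugin_sketch plugin(events, integrity_hash_on_stop);
    plugin.shutdown();
    return events;
}
```

With the option enabled, the recorded events are the hash computation followed by the controller's destruction; with it disabled, only the destruction is recorded.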
Or maybe we add a bool parameter to add_to_snapshot(.., at_shutdown = false) so that when called at shutdown we don't call clear_expired_input_transactions?
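A rough sketch of what that parameter could look like, assuming a simplified stand-in. controller_state_sketch and its members are hypothetical; the real add_to_snapshot takes other arguments, elided as ".." in the comment above and not reproduced here.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the controller state; not the Leap implementation.
struct controller_state_sketch {
    std::vector<int> input_transactions;  // pretend every entry here has expired
    int clear_calls = 0;

    void clear_expired_input_transactions() {
        input_transactions.clear();
        ++clear_calls;
    }

    // Returns the number of rows that would be written or hashed.
    // at_shutdown = true skips the cleanup step.
    std::size_t add_to_snapshot(bool at_shutdown = false) {
        if (!at_shutdown) {
            clear_expired_input_transactions();
        }
        return input_transactions.size();
    }
};
```

Note that in this sketch the two paths cover different rows: the default path sees the cleaned-up state, while the at_shutdown path still includes the expired entries. A hash taken at shutdown could therefore differ from one taken through the normal path.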
It is kinda weird that this log statement:
ilog( "chain database stopped with hash: ${hash}", ("hash", calculate_integrity_hash()) );
in ~controller_impl() has the side effect of writing to the snapshot and clearing expired transactions.
clear_expired_input_transactions is required.
Yeah, if you enable warn or error level logging then the integrity hash generation on start/stop doesn't work. We should fix that also.
Summary: The issue comes into play when outputting the integrity hash on start and stop...
--integrity-hash-on-start
--integrity-hash-on-stop
It needs to be fixed in affected versions, and the feature was added in 3.2.
Fix in 3.2.6, 4.0.6, and 5.0.x, and main/6
High priority for 5.0 because it would block chicken dance.
It should be fixed in 3.2.x and 4.0.x, but won't on its own necessitate a release.
Chicken dance is not blocked. There is a workaround already in place.
fwiw so far I have been unable to reproduce the crash nor can I get any tooling to report an invalid access. That's somewhat a problem in creating a test case to show the problem is fixed. Wonder what I'm missing.
clear_expired_input_transactions is calling a method on controller, but controller is being destroyed at this point.
controller or controller_impl? The only actions that have been taken on controller_impl at that time are
https://github.com/AntelopeIO/leap/blob/f7d7d2ff188cde76d13e1af2d02c7027706956cc/libraries/chain/controller.cpp#L767-L773
and pending.reset(); is pretty much a no-op since it was just done anyway:
https://github.com/AntelopeIO/leap/blob/f7d7d2ff188cde76d13e1af2d02c7027706956cc/libraries/chain/controller.cpp#L2769-L2770
I'm not immediately seeing anything in clear_expired_input_transactions() that accesses something now invalid. All the members of controller_impl are still intact other than pending, which is well handled as part of is_building_block().
Yeah, if you enable warn or error level logging then the integrity hash generation on start/stop doesn't work. We should fix that also.
What are you suggesting? There isn't really anything to do other than log it.
Maybe it gets its own non-default logger?
self in controller_impl::clear_expired_input_transactions is controller&; maybe that is not valid at this point.
I mean we need this:
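if(okay_to_print_integrity_hash_on_stop && conf.integrity_hash_on_stop) {
   // Compute the hash unconditionally, outside of any level-filtered log macro.
   auto hash = calculate_integrity_hash();
   // Send the message straight to the default logger instead of going through ilog.
   fc::logger::get(DEFAULT_LOGGER).log( FC_LOG_MESSAGE( info, "chain database stopped with hash: ${hash}", ("hash", hash) ) );
}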
Otherwise, if a user has the default logger configured for warn or error, they will not get output even if they ask for --integrity-hash-on-stop.
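To illustrate the difference, here is a small self-contained sketch. logger_sketch and level_sketch are hypothetical stand-ins, not the fc API. It assumes, as the snippet above implies, that a level-filtered macro only builds its message when the level is enabled, while a direct log call is not subject to that check.

```cpp
#include <string>
#include <vector>

// Hypothetical log levels, ordered from least to most severe.
enum class level_sketch { info = 0, warn = 1, error = 2 };

// Hypothetical logger; stores emitted lines so the behaviour can be inspected.
struct logger_sketch {
    level_sketch threshold;
    std::vector<std::string> lines;

    bool is_enabled(level_sketch l) const {
        return static_cast<int>(l) >= static_cast<int>(threshold);
    }

    // Filtered path: the message (and anything it computes) is only
    // evaluated when the level passes the threshold.
    template <typename MakeMessage>
    void log_if_enabled(level_sketch l, MakeMessage&& make_message) {
        if (is_enabled(l)) {
            lines.push_back(make_message());
        }
    }

    // Unfiltered path: the caller has already built the message.
    void log_always(const std::string& message) {
        lines.push_back(message);
    }
};
```

With the threshold set to warn, an info message passed through log_if_enabled is never built, so a hash computed inside it would never run. Computing the hash first and passing the finished string to log_always runs the computation and emits the line regardless of the threshold.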
Note:start
group: STABILITY
category: BUG
summary: Fixes bug and prints final hash when specifying the option --integrity-hash-on-stop. Previously a race condition in the shutdown logic prevented the hash from being printed.
Note:end
Nodeos terminates before the integrity hash is reported. The command line reports segmentation fault (core dump) but I'm unable to find the core dump on the host.
Running nodeos 5.0.0-rc2, the config will sync with peers and terminate at the end block. Snapshot starts at block num 323784127 and the end block is 323956925. sync-config.ini may be found here. Snapshot is v6 from EOS Nation.
Last lines of the log file
Next steps, going to run this with a much smaller sync span, and try with nodeos v4.0.4.