chfast opened this issue 7 years ago
This assertion is thrown because, during trie insertion, a hash reference in the trie points to a node that does not exist in the database. I'm still trying to figure out how it reaches this point.
Hey @dirtyfilthy, thanks for trying to help with this.
As these asserts do not fail again immediately after a restart, I suspect the bug is not in the trie logic itself but in data races around accessing the state database. That makes it hard to track down, and I'm not sure yet how to proceed.
MemoryDB and OverlayDB, the classes representing the state database, even have locks, which were #ifdef'ed out at some point. One experiment I have in mind is to enable them again and see if it helps with these asserts.
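To make "#ifdef'ed out" concrete, here is a minimal sketch of an in-memory key/value store whose locking is compiled in or out by a preprocessor switch. The class SketchMemoryDB and the macro SKETCH_DB_LOCKING are hypothetical stand-ins, not cpp-ethereum's actual MemoryDB or its macro; the point is only that with the switch set to 0, every access runs unsynchronized.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <optional>
#include <string>

// Hypothetical compile-time switch (not the real cpp-ethereum macro).
#ifndef SKETCH_DB_LOCKING
#define SKETCH_DB_LOCKING 1
#endif

// Toy stand-in for an in-memory state DB; keys and values are raw byte strings.
class SketchMemoryDB {
public:
    void insert(const std::string& key, const std::string& value) {
#if SKETCH_DB_LOCKING
        std::lock_guard<std::mutex> guard(m_lock);  // serialize access to m_main
#endif
        m_main[key] = value;
    }

    std::optional<std::string> lookup(const std::string& key) const {
#if SKETCH_DB_LOCKING
        std::lock_guard<std::mutex> guard(m_lock);
#endif
        auto it = m_main.find(key);
        if (it == m_main.end())
            return std::nullopt;  // key genuinely absent
        return it->second;
    }

private:
    std::map<std::string, std::string> m_main;
#if SKETCH_DB_LOCKING
    mutable std::mutex m_lock;  // mutable so const lookups can still lock
#endif
};
```

A per-container lock like this only makes each individual call atomic; a sequence of lookups and inserts performed by a layer above (such as a trie update) can still interleave with another thread's writes.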
I tried that; it didn't help. (Anyway, OverlayDB is definitely not thread-safe at the moment, and it is used from different threads.)
I've added a retry of GenericTrie::insert() for when this assert fails, and the second attempt fails too.
This means the underlying OverlayDB really does contain an invalid trie at that moment. That is some evidence against a data race during access to this particular node (though there may have been a race during a previous insert).
I want to try dumping the contents of MemoryDB at the moment of failure and turning that dump into a test.
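The dump-and-replay idea can be sketched as a text snapshot of the key/value contents, hex-encoded so binary trie keys and RLP values survive a round trip. The std::map representation and all function names here are assumptions for illustration; the real MemoryDB keys entries by hash and may carry extra per-entry state (such as reference counts) that a faithful dump would also need to capture.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Hex-encode arbitrary bytes so binary keys/values fit on one text line.
std::string toHex(const std::string& bytes) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(bytes.size() * 2);
    for (unsigned char c : bytes) {
        out.push_back(digits[c >> 4]);
        out.push_back(digits[c & 0x0f]);
    }
    return out;
}

std::string fromHex(const std::string& hex) {
    auto nibble = [](char c) -> int {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        throw std::invalid_argument("bad hex digit");
    };
    if (hex.size() % 2 != 0)
        throw std::invalid_argument("odd hex length");
    std::string out;
    for (std::size_t i = 0; i < hex.size(); i += 2)
        out.push_back(static_cast<char>((nibble(hex[i]) << 4) | nibble(hex[i + 1])));
    return out;
}

// One "<hex key> <hex value>" line per entry; paste the result into a test fixture.
std::string dumpSnapshot(const std::map<std::string, std::string>& db) {
    std::ostringstream os;
    for (const auto& [key, value] : db)
        os << toHex(key) << ' ' << toHex(value) << '\n';
    return os.str();
}

// Rebuild the map from a snapshot; empty keys or values are preserved.
std::map<std::string, std::string> loadSnapshot(const std::string& text) {
    std::map<std::string, std::string> db;
    std::istringstream is(text);
    std::string line;
    while (std::getline(is, line)) {
        auto space = line.find(' ');
        if (space == std::string::npos)
            throw std::invalid_argument("malformed snapshot line");
        db[fromHex(line.substr(0, space))] = fromHex(line.substr(space + 1));
    }
    return db;
}
```

A test would then load the snapshot into a fresh database, open the trie at the failing root, and replay the insert that asserted.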
OK, I now have enough evidence that this is the same "Too many open files" error as in https://github.com/ethereum/cpp-ethereum/issues/4493
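To see the underlying error condition in isolation, here is a POSIX-only sketch (the function name is hypothetical) that lowers the process's soft file-descriptor limit and opens files until open() fails. The resulting errno is EMFILE, which glibc renders as "Too many open files". A database backend that keeps many files open can hit the same limit; if the state database is LevelDB-backed, LevelDB's max_open_files option is the relevant knob.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <sys/resource.h>
#include <unistd.h>
#include <vector>

// Lower RLIMIT_NOFILE's soft limit, open files until open() fails,
// restore everything, and return the errno observed at the failure.
int exhaustFileDescriptors(rlim_t softLimit) {
    rlimit original{};
    if (getrlimit(RLIMIT_NOFILE, &original) != 0)
        return errno;

    rlimit lowered = original;
    lowered.rlim_cur = softLimit;  // lowering the soft limit needs no privileges
    if (setrlimit(RLIMIT_NOFILE, &lowered) != 0)
        return errno;

    std::vector<int> fds;
    int error = 0;
    for (;;) {
        int fd = open("/dev/null", O_RDONLY);
        if (fd < 0) {
            error = errno;  // capture before close()/setrlimit() can overwrite it
            break;
        }
        fds.push_back(fd);
    }

    for (int fd : fds)
        close(fd);
    setrlimit(RLIMIT_NOFILE, &original);  // restore the previous limit
    return error;
}
```

Calling exhaustFileDescriptors(32) should return EMFILE on Linux and macOS.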
What happens here is that a DB lookup fails because of this error, and OverlayDB::lookup() treats any DB failure as "key not found" (this should be fixed):
https://github.com/ethereum/cpp-ethereum/blob/976309b3f559fb2973e885dee65b39652e8a2a29/libdevcore/OverlayDB.cpp#L133
To the TrieDB layered on top of OverlayDB, this then looks like a corrupted trie (a record missing from the database), so the asserts fail.
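The difference between collapsing every read failure into "not found" and keeping the two apart can be sketched with a toy backend. The types and function names below are hypothetical, not the cpp-ethereum or LevelDB API (although leveldb::Status does distinguish IsNotFound() from other errors such as IsIOError()); throwing an exception is just one assumed way to surface the error.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <stdexcept>
#include <string>

// Outcome of a raw backend read: found, genuinely absent, or an I/O failure.
enum class ReadStatus { Ok, NotFound, IoError };

struct ReadResult {
    ReadStatus status;
    std::string value;
};

// Toy backend whose reads can be forced to fail, e.g. to mimic EMFILE.
struct FlakyBackend {
    std::map<std::string, std::string> data;
    bool failReads = false;

    ReadResult get(const std::string& key) const {
        if (failReads)
            return {ReadStatus::IoError, {}};
        auto it = data.find(key);
        if (it == data.end())
            return {ReadStatus::NotFound, {}};
        return {ReadStatus::Ok, it->second};
    }
};

// Conflating pattern: any failure looks like a missing key to the trie layer.
std::optional<std::string> lookupConflated(const FlakyBackend& db, const std::string& key) {
    ReadResult r = db.get(key);
    if (r.status != ReadStatus::Ok)
        return std::nullopt;  // an I/O error is indistinguishable from absence
    return r.value;
}

// Checked pattern: only NotFound means absent; other failures are raised.
std::optional<std::string> lookupChecked(const FlakyBackend& db, const std::string& key) {
    ReadResult r = db.get(key);
    if (r.status == ReadStatus::Ok)
        return r.value;
    if (r.status == ReadStatus::NotFound)
        return std::nullopt;
    throw std::runtime_error("database read failed for key: " + key);
}
```

With failReads set, lookupConflated reports an existing node as missing (the "corrupted trie" symptom), while lookupChecked raises an error that points at the real cause.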