ethereumjs / ethereumjs-monorepo

Monorepo for the Ethereum VM TypeScript Implementation

Trie: Critical DB-Consistency Bug #3264

Open TimDaub opened 7 months ago

TimDaub commented 7 months ago

here's what I just wrote down:

https://github.com/attestate/kiwistand/issues/131

I have reported this to Gabriel before, but I'm not sure whether y'all have looked into it. My suspicion is that the ethereumjs library somehow throws or otherwise doesn't finish a write. Or maybe I kill the ethereumjs execution and that leads to the inconsistency. In any case, IMO, I should never be able to get ethereumjs/trie into an irrecoverable state. Happy to answer any questions.

TimDaub commented 7 months ago

Btw, I made a backup of the corrupted database state: bootstrap.zip

you can basically run it with these options and it should help you reproduce it

gabrocheleau commented 7 months ago

Thanks for reporting this and coming up with a backup and reproduction scenario.

I'll see if I can reproduce this locally and get back to you. Workload is currently high with ongoing development, but this is something we need to get to the bottom of, as it could reveal more general issues that go beyond the scope of your implementation.

gabrocheleau commented 7 months ago

So to recap and confirm my understanding, here's what I'm getting from the issue:

Obviously, this should not be happening. However, it is hard for me to get a good grasp of the actual issue, since it happens in the context of external usage that I have limited familiarity with.

From the issue you've linked, I see these lines in the logs:

2024-02-01T00:00:58: 2024-02-01T00:00:58.967Z @attestate/kiwistand Stored message with index "65badc1d8b88b49217fda6abdb764eb69cf239d8d8e625eb0737f28762efa39bd9c8c727"
2024-02-01T00:00:58: 2024-02-01T00:00:58.967Z @attestate/kiwistand New root: "e49c93db1dd3c722

And before that, I see this error:

024-02-01T00:00:58: Message: put: Didn't add message to database
2024-02-01T00:00:58: Stack Trace: Error: add: Didn't find root message of comment
2024-02-01T00:00:58:     at Module.add (file:///root/kiwistand/src/store.mjs:361:13)
2024-02-01T00:00:58:     at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
2024-02-01T00:00:58:     at async put (file:///root/kiwistand/src/sync.mjs:332:7)
2024-02-01T00:00:58:     at async file:///root/kiwistand/src/sync.mjs:444:7

I'm not sure what exactly the expected order of operations is, but might this be a race condition where the root is set independently (e.g. through the trie.root() method) without the put operation actually updating the trie, resulting in a root that is inconsistent with the underlying trie? If the initial issue is server- or networking-related (this seems to be the case from the logs, but I might be wrong), would it work to reset the trie root to the previous one when the put operation fails, and then retry the full operation? Our StateManager package would likely provide some logic that can serve as inspiration for that.
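To make the "reset the root and retry" idea concrete, here is a rough sketch. It is not taken from kiwistand or from our packages: `RootedStore`, `putWithRollback` and `FlakyStore` are hypothetical names, and the interface only assumes a `root()` method that reads the root when called without arguments and sets it when given a value, plus an async `put`. The toy store exists only so the snippet runs on its own.

```typescript
// Hypothetical minimal surface of a trie-like store (an assumption for this
// sketch, not the actual @ethereumjs/trie type definitions).
interface RootedStore {
  // Returns the current root; when a value is passed, sets it first.
  root(value?: Uint8Array): Uint8Array;
  put(key: Uint8Array, value: Uint8Array): Promise<void>;
}

// Try a put; if it throws, restore the root captured before the attempt
// and retry, up to maxAttempts times. Rethrows the last error on failure.
async function putWithRollback(
  store: RootedStore,
  key: Uint8Array,
  value: Uint8Array,
  maxAttempts = 3,
): Promise<void> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const previousRoot = store.root();
    try {
      await store.put(key, value);
      return;
    } catch (err) {
      // Undo any root change made by the failed write before retrying.
      store.root(previousRoot);
      lastError = err;
    }
  }
  throw lastError;
}

// Toy in-memory store used only to exercise the helper: a failing put
// moves the root and then throws, imitating an interrupted write.
class FlakyStore implements RootedStore {
  private currentRoot: Uint8Array = new Uint8Array([0]);
  private failuresLeft: number;
  readonly entries = new Map<string, Uint8Array>();

  constructor(failures: number) {
    this.failuresLeft = failures;
  }

  root(value?: Uint8Array): Uint8Array {
    if (value !== undefined) this.currentRoot = value;
    return this.currentRoot;
  }

  async put(key: Uint8Array, value: Uint8Array): Promise<void> {
    // The root changes before the write is known to have succeeded.
    this.currentRoot = new Uint8Array([(this.currentRoot[0] ?? 0) + 1]);
    if (this.failuresLeft > 0) {
      this.failuresLeft--;
      throw new Error("simulated interrupted write");
    }
    this.entries.set(key.join(","), value);
  }
}
```

Calling `putWithRollback(new FlakyStore(1), key, value)` would fail once, restore the old root, and succeed on the second attempt. Restoring the root only protects against a root that points at a half-written state; it does not by itself explain or repair a database that is already corrupted on disk.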

If I'm misunderstanding the issue, would you be able to provide a relatively minimal reproducible example (one that can be run natively in the context of our repository)? Then I'd be able to troubleshoot further and generate test cases.