dan-da commented 5 months ago

#144 has demonstrated that it is possible for the blockchain to become invalid (with a different genesis block) and yet for neptune-core to restart without detecting any problem.

I was curious why this is, so I read through the initialize() fn carefully and found that it doesn't validate the on-disk blocks at all. It simply instantiates ArchivalState and trusts it.

So the question in my mind becomes: how should we go about validating blocks as quickly as possible? I found this description of how bitcoin-core does it:

I have benchmarked the client's startup process. Most of the time is spent validating the database to ensure the client has a sensible view of the hash chain.

It is "fast verified" up to the most recent checkpoint and then "slow verified" to the current block. Fast verification entails checking only that the header is valid. Slow verification is a much more complete verification that can only be done at about 20 blocks per second on hardware like yours. (Upgrading the client will help, since newer version of the client will have more recent checkpoints.)

If you passed the -rescan option, the client will check every transaction in every block to see if it relates to any account in your wallet. That will add some time to the client's startup time.

When you run the Bitcoin client for the very first time, it will take several hours to sync up to the network. The downside of a decentralized system is that you cannot trust anything and must check everything yourself. This literally requires you to fully verify every Bitcoin transaction that has ever taken place. I've seen that take 9 hours on Pentium 4 class computers with 100Mbps Internet connections. (Though I think it should be a bit faster now, thanks to bugfixes in the client.)

note: That comment is quite old, so some details may have changed, but I think it's still generally correct.
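For concreteness, here is a minimal Rust sketch of the fast/slow split described in that quote. It assumes a toy block model: `Header`, `Block`, `build_chain`, `verify_with_checkpoints` and the checkpoint map are hypothetical names, not neptune-core or Bitcoin Core APIs, and `DefaultHasher` is a non-cryptographic stand-in for a real block hash. Every header is checked for linkage and against any hard-coded checkpoint. Bodies are only re-checked above the highest checkpoint.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Toy block model (hypothetical). DefaultHasher is NOT cryptographic; it only
// stands in for a real block hash in this sketch.
#[derive(Clone, Debug, Hash)]
struct Header {
    height: u64,
    prev_hash: u64,   // hash of the previous header (0 for genesis)
    body_digest: u64, // commitment to the block body
}

#[derive(Clone, Debug)]
struct Block {
    header: Header,
    body: Vec<u8>,
}

fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Build a well-formed toy chain of `len` blocks starting at genesis.
fn build_chain(len: u64) -> Vec<Block> {
    let mut chain: Vec<Block> = Vec::new();
    for height in 0..len {
        let body = format!("transactions of block {height}").into_bytes();
        let prev_hash = chain.last().map_or(0, |b| hash_of(&b.header));
        let header = Header { height, prev_hash, body_digest: hash_of(&body) };
        chain.push(Block { header, body });
    }
    chain
}

/// Fast-verify every header (linkage + checkpoints); slow-verify bodies only
/// above the highest checkpoint. Bodies at or below it are trusted.
fn verify_with_checkpoints(chain: &[Block], checkpoints: &HashMap<u64, u64>) -> Result<(), String> {
    let last_checkpoint = checkpoints.keys().copied().max();
    for (i, block) in chain.iter().enumerate() {
        let header = &block.header;
        if header.height != i as u64 {
            return Err(format!("unexpected height {} at index {i}", header.height));
        }
        // Fast verification: the header must link to its predecessor.
        if i > 0 && header.prev_hash != hash_of(&chain[i - 1].header) {
            return Err(format!("broken header link at height {}", header.height));
        }
        // Fast verification: the header must match a checkpoint, if one exists.
        if let Some(&expected) = checkpoints.get(&header.height) {
            if hash_of(header) != expected {
                return Err(format!("checkpoint mismatch at height {}", header.height));
            }
        }
        // Slow verification (stand-in for full tx checks): body vs. commitment.
        let above_checkpoint = last_checkpoint.map_or(true, |cp| header.height > cp);
        if above_checkpoint && header.body_digest != hash_of(&block.body) {
            return Err(format!("invalid body at height {}", header.height));
        }
    }
    Ok(())
}
```

The trade-off shows up directly: a body tampered with below the checkpoint passes, while the same tampering above it is caught.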

Now, if I understand correctly, we may be able to verify that the tip is valid given only the genesis block (e.g., as a light-state node does). That's pretty good, and that validation would have caught the #144 issue. However, it would not detect an intermediate block with an invalid header; for that, I believe we must run through all the headers. It also seems there should be a mode/command to rescan and verify all block bodies as well.

So I can see three validation modes for an archival node:

- validate tip only
- validate headers of all blocks (possibly using checkpoints)
- validate header+body of all blocks
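A rough Rust sketch of how the three modes could nest, each doing strictly more work than the one before. Everything here is hypothetical: the toy block model, `validate_on_startup`, and the `verify_tip_proof` closure, which stands in for whatever tip check light-state validation would actually perform and, in this sketch, does not inspect history. `expected_genesis` is the hash a node would hard-code, so header validation catches a #144-style genesis mismatch.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ValidationMode {
    TipOnly,          // check the tip only (e.g. via a proof)
    AllHeaders,       // also walk every header back to the expected genesis
    HeadersAndBodies, // also re-check every block body
}

// Toy block model (hypothetical); DefaultHasher stands in for a real hash.
#[derive(Clone, Debug, Hash)]
struct Header { height: u64, prev_hash: u64, body_digest: u64 }

#[derive(Clone, Debug)]
struct Block { header: Header, body: Vec<u8> }

fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

fn build_chain(len: u64) -> Vec<Block> {
    let mut chain: Vec<Block> = Vec::new();
    for height in 0..len {
        let body = format!("transactions of block {height}").into_bytes();
        let prev_hash = chain.last().map_or(0, |b| hash_of(&b.header));
        let header = Header { height, prev_hash, body_digest: hash_of(&body) };
        chain.push(Block { header, body });
    }
    chain
}

fn validate_on_startup<F>(
    chain: &[Block],
    mode: ValidationMode,
    expected_genesis: u64,
    verify_tip_proof: F,
) -> Result<(), String>
where
    F: Fn(&Block) -> bool,
{
    let tip = chain.last().ok_or("empty chain")?;
    // Every mode checks the tip.
    if !verify_tip_proof(tip) {
        return Err(format!("tip proof rejected at height {}", tip.header.height));
    }
    if mode == ValidationMode::TipOnly {
        return Ok(());
    }
    // Header modes: pinned genesis plus an unbroken hash chain.
    if hash_of(&chain[0].header) != expected_genesis {
        return Err("genesis block does not match".to_string());
    }
    for pair in chain.windows(2) {
        if pair[1].header.prev_hash != hash_of(&pair[0].header) {
            return Err(format!("broken header link at height {}", pair[1].header.height));
        }
    }
    // Full mode: every body must match its header's commitment.
    if mode == ValidationMode::HeadersAndBodies {
        for block in chain {
            if block.header.body_digest != hash_of(&block.body) {
                return Err(format!("invalid body at height {}", block.header.height));
            }
        }
    }
    Ok(())
}
```

In this toy, a broken intermediate header slips past `TipOnly` but not `AllHeaders`, and a tampered body is only caught by `HeadersAndBodies`.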

So what have I missed or gotten wrong? What is the least validation we can perform to be certain we are operating with correct data (or at least as certain as bitcoin-core is)?

dan-da commented 5 months ago

Replying here to this comment in #144.

I disagree that we should be detecting the problem. The client assumes that the database and disk content is correct. The policy is that if it's not correct, it doesn't get stored.

Trust, but verify. Not: trust, and hope for the best.

Even if the data was stored correctly, it could have been modified in the meantime, perhaps by a malicious program. Startup validation also double-checks that it was stored correctly in the first place.

I'm not saying the validation must necessarily be implemented right now, but I do believe it will eventually be important, even necessary, for people (including us as developers) to trust the system's soundness.

Sword-Smith commented 4 months ago

On all three common operating systems (Windows, Linux, and OSX), the underlying files (blocks and databases) will be owned by the user running Neptune Core, which protects against other users on the same machine maliciously changing block data. The file system and operating system also provide some protection against data corruption caused by software or hardware problems.

Digging further into what Bitcoin Core does, it seems to check only the last six blocks and to assume that everything else on disk is intact. See the last comment in this thread.

What about the underlying database we are using, LevelDB? It stores checksums for all of its data, and we could use those checksums to verify its integrity.
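LevelDB's C++ API exposes this through `ReadOptions::verify_checksums` and `Options::paranoid_checks`. I have not checked whether the Rust binding neptune-core uses surfaces these options. The underlying idea is simple. The Rust sketch below uses a hypothetical `StoredRecord` type: it stores a CRC-32C (the checksum family LevelDB uses) next to each value and recomputes it on read. LevelDB's actual on-disk format (block trailers, masked CRCs) differs from this.

```rust
/// Bitwise (table-free) CRC-32C, i.e. the Castagnoli CRC, reflected
/// polynomial 0x82F63B78. Short but slow; real code would use a lookup table
/// or the CPU's CRC32 instruction.
fn crc32c(data: &[u8]) -> u32 {
    let mut crc: u32 = 0xFFFF_FFFF;
    for &byte in data {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            crc = if (crc & 1) == 1 { (crc >> 1) ^ 0x82F6_3B78 } else { crc >> 1 };
        }
    }
    !crc
}

/// Hypothetical record: a value stored with the checksum computed at write time.
struct StoredRecord {
    payload: Vec<u8>,
    checksum: u32,
}

fn write_record(payload: &[u8]) -> StoredRecord {
    StoredRecord { payload: payload.to_vec(), checksum: crc32c(payload) }
}

/// Recompute the checksum on read and refuse to hand out corrupted data.
fn read_record(record: &StoredRecord) -> Result<&[u8], String> {
    let actual = crc32c(&record.payload);
    if actual == record.checksum {
        Ok(record.payload.as_slice())
    } else {
        Err(format!(
            "checksum mismatch: stored {:#010x}, computed {actual:#010x}",
            record.checksum
        ))
    }
}
```

A CRC catches every single-bit flip in the payload, which is the "freak bit-flipping" case. It does not protect against deliberate tampering, since an attacker can simply recompute the checksum.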

Sword-Smith commented 4 months ago

Also, with recursion, verifying the last proof actually gives us a full check of the entire blockchain state. So if you want to protect your view of the blockchain against freak bit-flip events, it should be enough to verify the proof of the last block and then check that it matches your mutator-set accumulator and any other relevant block information. Some parts of the archival state (individual nodes in MMRs) might not be covered by these checks, but for their integrity a LevelDB checksum should suffice.

TL;DR: With recursion (checking the proof of the tip block on init) and LevelDB checksums, all relevant cryptographic data should be verified. A bit could still be flipped in a stored block, but that would only matter when the block is shared with other archival nodes, which would then reject it because its proof would be invalid.
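The init-time check described above could look roughly like the following Rust sketch. It rests on assumptions: `TipBlock`, `LocalState`, `TipProofVerifier`, `check_on_init` and the field names are hypothetical, and `MockVerifier` does no cryptography at all; a real node would call its actual proof verifier there. What the sketch shows is the order of checks: one proof verification, then a comparison of the state the proof vouches for against the locally stored accumulator.

```rust
// Hypothetical types for illustration; not neptune-core APIs.
#[derive(Clone, Debug)]
struct TipBlock {
    height: u64,
    mutator_set_digest: [u8; 32], // accumulator commitment carried by the block
    proof: Vec<u8>,               // recursive proof covering the whole history
}

struct LocalState {
    tip_height: u64,
    mutator_set_digest: [u8; 32], // digest of the locally stored accumulator
}

trait TipProofVerifier {
    fn verify(&self, tip: &TipBlock) -> bool;
}

#[derive(Debug, PartialEq, Eq)]
enum StartupError {
    InvalidTipProof,
    HeightMismatch { block: u64, local: u64 },
    AccumulatorMismatch,
}

fn check_on_init<V: TipProofVerifier>(
    tip: &TipBlock,
    local: &LocalState,
    verifier: &V,
) -> Result<(), StartupError> {
    // 1. One proof verification instead of re-validating every block.
    if !verifier.verify(tip) {
        return Err(StartupError::InvalidTipProof);
    }
    // 2. The locally stored state must be the state the proof vouches for.
    if tip.height != local.tip_height {
        return Err(StartupError::HeightMismatch { block: tip.height, local: local.tip_height });
    }
    if tip.mutator_set_digest != local.mutator_set_digest {
        return Err(StartupError::AccumulatorMismatch);
    }
    Ok(())
}

/// Mock verifier for exercising the control flow only: it accepts any
/// non-empty proof and performs no cryptographic check.
struct MockVerifier;

impl TipProofVerifier for MockVerifier {
    fn verify(&self, tip: &TipBlock) -> bool {
        !tip.proof.is_empty()
    }
}
```

Anything the proof does not commit to (e.g. individual MMR nodes in the archival state) is outside this check and would fall back on the database checksums.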

Neptune-Crypto / neptune-core

Discussion: (How) should we validate blockchain at startup? #146