btcsuite / btcd

An alternative full node bitcoin implementation written in Go (golang)
https://github.com/btcsuite/btcd/blob/master/README.md
ISC License

RFC: Block persistence changes to support headers-first sync #1048

Open jimpo opened 6 years ago

jimpo commented 6 years ago

This describes some options for implementing headers-first syncing and how it relates to block persistence. The new blockindex and chainview structures in the blockchain package go a long way towards enabling this, but there are some further changes that would make it more efficient.

First of all, the block index is currently only initialized with nodes from the main chain, partly due to how block data is structured in buckets. Instead, we should efficiently load all stored headers into the block index on init. If we ensure that we only store connected headers that satisfy PoW and checkpoint checks, there is very little risk of populating the in-memory index with a ton of invalid or orphaned blocks. As demonstrated in https://github.com/btcsuite/btcd/pull/1034, having a bucket that stores all block headers keyed by <uint32(height)><hash> allows the block index to be populated very quickly.
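
To make that concrete, here is a minimal sketch of such a key (the name blockIndexKey and the exact layout are illustrative, not lifted from #1034):

```go
package blockchain

import (
	"encoding/binary"

	"github.com/btcsuite/btcd/chaincfg/chainhash"
)

// blockIndexKey builds the proposed bucket key: a big-endian uint32
// height followed by the 32-byte block hash. Big-endian ordering means
// a forward cursor over the bucket visits headers strictly by height,
// so the in-memory index can be rebuilt in one sequential scan with
// every parent loaded before its children.
func blockIndexKey(blockHash *chainhash.Hash, height uint32) []byte {
	key := make([]byte, 4+chainhash.HashSize)
	binary.BigEndian.PutUint32(key[0:4], height)
	copy(key[4:], blockHash[:])
	return key
}
```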

Each blockNode in the index should store a bit field representing its validation state. To support headers-first syncing, we should store with each node: whether we have the block data in ffldb, whether the block was fully validated, whether the block failed validation, and whether the block is a descendant of another invalid block. More bits may be used in the future; for example, if pruning is implemented, we may want to store whether block info has been pruned from the spend journal. This bit field can be stored alongside the header.
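
A rough sketch of what that bit field could look like (all names are illustrative, not from the btcd source):

```go
package blockchain

// blockStatus is a bit field capturing a block node's validation state.
type blockStatus byte

const (
	// statusDataStored indicates the full block data is stored in ffldb.
	statusDataStored blockStatus = 1 << iota

	// statusValid indicates the block has been fully validated.
	statusValid

	// statusValidateFailed indicates the block failed validation.
	statusValidateFailed

	// statusInvalidAncestor indicates an ancestor of the block failed
	// validation, so the block itself can never become valid.
	statusInvalidAncestor
)

// KnownValid returns whether the block is known to have passed full
// validation.
func (s blockStatus) KnownValid() bool {
	return s&statusValid != 0
}

// KnownInvalid returns whether the block is known to be invalid, either
// directly or because it descends from an invalid block.
func (s blockStatus) KnownInvalid() bool {
	return s&(statusValidateFailed|statusInvalidAncestor) != 0
}
```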

While it is possible to implement headers-first sync with the headers only in memory, they would have to be resynced if the node restarts during IBD. Instead, it seems preferable to store headers independently of block data.

I propose using ffldb only to store/fetch full blocks and creating a bucket with headers & validation state managed in chainio.go. To save storage space, we would drop the block headers from the block index bucket in LevelDB, and each row in that bucket would just have the block location. A one-time migration is necessary to populate the new bucket and drop header data from the current block index bucket.
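
A sketch of what one row in the proposed header/state bucket could look like, assuming the blockStatus bit field sketched above (serializeBlockIndexEntry is a hypothetical helper):

```go
package blockchain

import (
	"bytes"

	"github.com/btcsuite/btcd/wire"
)

// serializeBlockIndexEntry produces one row of the proposed bucket:
// the 80-byte serialized block header followed by a single
// validation-status byte.
func serializeBlockIndexEntry(header *wire.BlockHeader, status blockStatus) ([]byte, error) {
	var buf bytes.Buffer
	if err := header.Serialize(&buf); err != nil {
		return nil, err
	}
	if err := buf.WriteByte(byte(status)); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}
```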

I'm sure there are some complications I am not aware of, so I appreciate any feedback.

davecgh commented 6 years ago

Thanks for taking the time to write all this up. I know we've pretty much discussed this already between various PRs and IRC, but it's good to have it all in one place.

I think moving the management of the block index into blockchain and adding a field of bit flags for tracking the validation state is definitely the way to go here. As mentioned, we are certainly going to need the ability to do a few things that really aren't very easy to do right now with the block index being managed by database. In addition to the aforementioned cases, I can foresee additional flags identifying which specific portions of the verification process have been completed, which would enable performing multi-block validation in parallel. Obviously there are a lot of considerations in something like that due to the ordering requirements, but nevertheless there are a lot of opportunities there.

One of the considerations of removing the headers from the block row is that the header fetching functions in database will also need to either be changed to load them off disk, or perhaps even just removed completely since FetchBlockRegion(s) could be used in that case. I'm leaning towards the latter since, short of the initial load, which will be done from a separate blockchain bucket with this change, there is no longer ever a need to actually hit the database to get a header. It could affect anyone interacting directly with the database via the database package though.
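
To illustrate the latter option: a header is just the first 80 bytes of a serialized block, so a hypothetical fetchHeader helper could be built directly on FetchBlockRegion, along these lines:

```go
package blockchain

import (
	"bytes"

	"github.com/btcsuite/btcd/chaincfg/chainhash"
	"github.com/btcsuite/btcd/database"
	"github.com/btcsuite/btcd/wire"
)

// fetchHeader reads a block header by fetching only the first 80 bytes
// of the serialized block from the database.
func fetchHeader(db database.DB, hash *chainhash.Hash) (*wire.BlockHeader, error) {
	var header wire.BlockHeader
	err := db.View(func(tx database.Tx) error {
		region := database.BlockRegion{
			Hash:   hash,
			Offset: 0,
			Len:    wire.MaxBlockHeaderPayload, // headers are 80 bytes
		}
		headerBytes, err := tx.FetchBlockRegion(&region)
		if err != nil {
			return err
		}
		return header.Deserialize(bytes.NewReader(headerBytes))
	})
	return &header, err
}
```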

With this change, it should also be possible to completely remove the current height<->hash mapping buckets which only contain entries for the main chain. The code is already almost able to do this thanks to the recent change of having all block nodes in memory and a chainview to track the best chain. This change would provide the final piece to be able to remove them.

This next bit is some of what I posted on the aforementioned PR, but I think it's instructive to include it in this issue for reference purposes as well.

One of the things I've been periodically working on is implementing a utxo cache, which will massively speed up IBD and block validation times at tip. However, one of the challenges there is that some parts of blockchain still believe the database always represents the best chain. My recent chainview work helped solve a lot of these issues (and was even a primary motivator for it), but the block index is an area where this is still problematic, in large part because it's managed by database and not blockchain.

This proposal would help simplify that work tremendously. However, there is another important consideration that I didn't see mentioned here, and that is making sure the block index is implemented such that it only needs to be periodically flushed to the database. Either that, or some type of reconciliation code that can detect differences and replay utxos to ensure the on-disk representation of the utxo set and the best chain are in agreement. Currently, that isn't a problem because it's all updated atomically as each block is connected and disconnected. However, that will no longer be the case with all of these changes + caches.

While it's probably a bit premature to discuss now, since I think it would need to be a multi-stage process if it's something we would even want to do, looking even further into the future, I highly suspect the model of the database package might need to be reconsidered. It is certainly nice to abstract away all of the atomicity concerns and provide the potential for swapping out the backend, but, in practice, my performance tests have shown that, because leveldb is really just a KV store, including everything in the same database (spend journal, utxo set, block index, address index, transaction index, etc.) suffers a non-trivial loss of performance, particularly due to the massive numbers of utxo entries added and removed constantly. A utxo cache will certainly help that somewhat by drastically reducing the number of entries that ever need to be written to the database, but I still think we could do a lot better.

jimpo commented 6 years ago

This all makes sense except the part about the block index: "but the block index is an area where this is still problematic, in large part because it's managed by database and not blockchain." What does this refer to? The blockIndex struct or something else? Can you elaborate on the issue?

davecgh commented 6 years ago

The issue is essentially that the utxoset and the block index (currently it's more specifically the best chain) have to be consistent with one another. When introducing a utxo cache, it necessarily means the state of the in-memory representation of the utxoset diverges until it eventually gets flushed. That also implies that the block index needs to be atomically updated during that same flush (or more specifically the best chain, but with this proposal, the connection status of each block will be a bit in the validation state, so it will apply to the modified block index nodes too).

Right now, because the block index is managed by database, it has to be updated every time a block is connected, and the current best chain is also updated at that same time. As mentioned, most of this is mitigated by the chainview since the actual block index now contains all nodes, while the view represents the best chain. However, due to those last remaining bits where the code still believes the database (ergo the on-disk block index) to be the source of truth in terms of the latest best chain, it leads to a situation where they are not in sync unless a caching block index that flushes simultaneously is also introduced. As mentioned above, introducing a validation state on the block index nodes (which I really do think needs to happen) also means that there will need to be some type of modified flag (probably a bit flag in the validation state) so that the state of each block in the block index stays in sync with the best chain and accompanying utxo set for that point, and can then be atomically flushed to disk.
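
For illustration, a minimal sketch of that modified-flag idea, assuming a blockNode carrying the blockStatus bit field from earlier and a hypothetical dbStoreBlockNode helper:

```go
package blockchain

import (
	"sync"

	"github.com/btcsuite/btcd/chaincfg/chainhash"
	"github.com/btcsuite/btcd/database"
)

// blockNode is assumed to carry the proposed validation-state bit field
// (blockStatus, as sketched earlier in the thread).
type blockNode struct {
	hash   chainhash.Hash
	status blockStatus
	// ... header fields, parent pointer, height, etc.
}

// blockIndex tracks all known block nodes plus the set of nodes whose
// status has changed since the last flush.
type blockIndex struct {
	sync.RWMutex
	index map[chainhash.Hash]*blockNode
	dirty map[*blockNode]struct{}
}

// SetStatusFlags sets the given flags on a node and marks it modified
// so the change is picked up by the next flush.
func (bi *blockIndex) SetStatusFlags(node *blockNode, flags blockStatus) {
	bi.Lock()
	node.status |= flags
	bi.dirty[node] = struct{}{}
	bi.Unlock()
}

// dbStoreBlockNode is a hypothetical helper that serializes one node
// into the header/state bucket within the given transaction.
func dbStoreBlockNode(tx database.Tx, node *blockNode) error {
	// Row layout per the serialization sketched earlier in the thread.
	return nil
}

// flushToDB writes every modified node in a single atomic transaction
// and clears the dirty set on success.
func (bi *blockIndex) flushToDB(db database.DB) error {
	bi.Lock()
	defer bi.Unlock()
	if len(bi.dirty) == 0 {
		return nil
	}
	err := db.Update(func(tx database.Tx) error {
		for node := range bi.dirty {
			if err := dbStoreBlockNode(tx, node); err != nil {
				return err
			}
		}
		return nil
	})
	if err == nil {
		bi.dirty = make(map[*blockNode]struct{})
	}
	return err
}
```

In practice that flush would want to happen inside the same database transaction as the utxo cache flush, so the on-disk block index, best chain, and utxo set can never diverge from one another.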

EDIT: Alternatively, it would be possible to save the hash of what the utxo set believes is the best chain and compare that to the best chain according to the block index, then "replay" everything to get them back in sync. However, that seems like it might be quite easy to get out of sync and certainly would increase startup times in the event of an unclean shutdown.