decred / dcrd

Decred daemon in Go (golang).
https://decred.org
ISC License

Add high performance blockchain streaming endpoint #2491

Open matheusd opened 3 years ago

matheusd commented 3 years ago

This is an idea for improving performance of services that process the historical blockchain.

Right now, for an external service connected to dcrd to process the entire historical chain, it needs to request every block via the JSON-RPC endpoint, using code somewhat similar to the following (very simplified):

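// tip is the height of the current best block; error handling is omitted.
for height := int64(0); height <= tip; height++ {
  blockHash := dcrdClient.GetBlockHash(height) // one JSON-RPC round trip
  block := dcrdClient.GetBlock(blockHash)      // a second JSON-RPC round trip
  // Process block...
}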

Some informal profiling shows that about 28% of the time of such a processing loop is spent encoding the request to, and decoding the response from, the JSON encoding used in the standard RPC client.

That's not terribly efficient, especially considering that a serialized block returned in a response is encoded twice: first the serialized block bytes are converted to a hex string, then that hex string is embedded into a JSON object.
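To make the cost of that double encoding concrete, here is a minimal sketch using only the Go standard library. The helper name encodedSizes is hypothetical and the input is a zero-filled buffer rather than a real block. It only measures size growth (hex doubles the byte count, and the JSON string adds two quote characters); it does not reproduce the 28% profiling figure.

```go
package main

import (
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// encodedSizes reports how large raw becomes after hex encoding, and after
// that hex string is in turn marshalled as a JSON string value.
// (Hypothetical helper for illustration; not part of dcrd.)
func encodedSizes(raw []byte) (hexLen, jsonLen int, err error) {
	hexStr := hex.EncodeToString(raw) // two hex characters per input byte
	js, err := json.Marshal(hexStr)   // wraps the hex string in quotes
	if err != nil {
		return 0, 0, err
	}
	return len(hexStr), len(js), nil
}

func main() {
	raw := make([]byte, 1000) // stand-in for a serialized block
	hexLen, jsonLen, err := encodedSizes(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("raw=%d hex=%d json=%d\n", len(raw), hexLen, jsonLen)
}
```

The client then has to undo both layers (parse the JSON, then hex-decode the string) before it can even begin deserializing the block.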

My proposal is to offer a new, high performance endpoint that can stream the entire historical blockchain to clients.

This would be offered as an optional HTTPS interface that could be enabled via a new config option in dcrd. This interface would respond to a single query, such as /streamBlocks?start=[start-hash]&end=[end-hash]. Crucially, instead of going through the JSON-RPC encoding, the blocks would simply be written as raw bytes to the client socket in a streaming fashion, following the ascending blockchain order.

Clients would be able to continuously decode blocks as fast as they can process them (and as fast as the network can stream them), without having to go through the repeated decoding and unmarshalling from JSON.

This would offer a significant performance improvement for the initial block processing of dcrros, possibly for dcrdata and any other future services that go through the blockchain indexing it in some way.

davecgh commented 3 years ago

We discussed this a bit offline before opening the issue and I have no objections to it. We might want to put it under /raw/streamBlocks, though, so that if we ultimately end up wanting any other raw data, it would fit nicely as /raw/foo.