Explore more efficient data loading.

What was wrong?

The ORM data model is set up with the following loose constraints.

A Header has a nullable foreign key to its parent.
A Block must point to a Header.
A Transaction optionally points to a Block.
A Receipt must point to a Transaction.
A Log must point to a Receipt.
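
As a rough illustration of these constraints, here is a minimal sketch using plain dataclasses rather than the project's actual ORM models. All field names (`parent_hash`, `header_hash`, `block_header_hash`, and so on) are hypothetical; only the required versus nullable relationships come from the list above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Header:
    hash: str
    # Nullable: a Header may be stored before its parent is known.
    parent_hash: Optional[str] = None


@dataclass
class Block:
    # Required: a Block must point to a Header.
    header_hash: str


@dataclass
class Transaction:
    hash: str
    # Nullable: a Transaction may be stored before its Block.
    block_header_hash: Optional[str] = None


@dataclass
class Receipt:
    # Required: a Receipt must point to a Transaction.
    transaction_hash: str


@dataclass
class Log:
    idx: int
    # Required: a Log must point to a Receipt.
    receipt_transaction_hash: str
```

The two `Optional` fields are the ones that make out-of-order loading possible: a `Header` or `Transaction` can be created first and linked later.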

Currently, to import a block we build and bulk save this entire hierarchy for a single block. Blocks are imported sequentially and cannot be imported concurrently, because each Header's foreign key requires its parent to already be saved.

However, since a Header can have a null parent and a Transaction can have a null Block, we should be able to load these records without their links first, which adds a level of concurrency and should make data loading more efficient.

How can it be fixed?

We should be able to adjust our pipeline such that:

We load the "Transaction < Receipt < Log" sets concurrently
We load the "Header < Block" data concurrently with all headers having a null parent pointer.
We link the "Block < Transaction" concurrently (once both sides have been loaded)
We link each "Header" to its parent sequentially once all of the above have been executed.
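
The four steps above could be sketched roughly as follows. This is only an illustration of the phase ordering, not the project's loader: plain dicts stand in for database tables, the block shape (`hash`, `parent`, `transactions`) is made up for the example, and every function name here is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def load_transactions(block, store):
    # Step 1: load "Transaction < Receipt < Log" with a null block pointer.
    for tx_hash in block["transactions"]:
        store["transactions"][tx_hash] = None


def load_header_and_block(block, store):
    # Step 2: load "Header < Block" with a null parent pointer.
    store["headers"][block["hash"]] = None
    store["blocks"][block["hash"]] = block["hash"]


def link_transactions(block, store):
    # Step 3: point each transaction at its block.
    for tx_hash in block["transactions"]:
        store["transactions"][tx_hash] = block["hash"]


def link_parents(blocks, store):
    # Step 4: sequentially point each header at its parent, if it was loaded.
    for block in blocks:
        if block["parent"] in store["headers"]:
            store["headers"][block["hash"]] = block["parent"]


def import_blocks(blocks, max_workers=4):
    store = {"headers": {}, "blocks": {}, "transactions": {}}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Steps 1 and 2 are independent, so they are submitted together.
        futures = [pool.submit(load_transactions, b, store) for b in blocks]
        futures += [pool.submit(load_header_and_block, b, store) for b in blocks]
        for future in futures:
            future.result()  # wait, and re-raise any worker exception
        # Step 3 starts only after both sides have been loaded.
        list(pool.map(lambda b: link_transactions(b, store), blocks))
    link_parents(blocks, store)
    return store
```

Against a real database each step would be a bulk insert or update, and the amount of usable concurrency would depend on the database and driver; the sketch only shows which steps can overlap and which must wait.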

Before doing this we need some benchmarks in place to measure performance. I would suggest we benchmark against a wide range of real mainnet blocks.
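
A benchmark harness could start out as small as the sketch below. It assumes the import step can be wrapped in a single callable that takes a list of blocks; `benchmark`, `import_fn`, and the returned keys are hypothetical names, and the real mainnet block fixtures would still need to be sourced separately.

```python
import time


def benchmark(import_fn, blocks, repeats=3):
    """Time import_fn(blocks) several times and report the best run."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        import_fn(blocks)
        timings.append(time.perf_counter() - start)
    best = min(timings)
    # Guard against a zero-length timing on very fast runs.
    rate = len(blocks) / best if best > 0 else float("inf")
    return {"best_seconds": best, "blocks_per_second": rate}
```

Running the same harness against the current sequential import and against any concurrent variant, on the same set of blocks, would give a like-for-like comparison.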

ethereum / cthaeh

Explore more efficient data loading. #8
