TrueBlocks / trueblocks-core

The main repository for the TrueBlocks system
https://trueblocks.io
GNU General Public License v3.0

chifra export parallelizing reconciliations #1961

Status: Open. tjayrush opened this issue 2 years ago.

tjayrush commented 2 years ago

We're finding that reconciliations are the slowest part of a full extraction (as we already knew).

One reason for this is that reconciliations are sequential: we need the calculated ending balance of the previous transaction before we can reconcile the current one.

This has two consequences.

  1. It prevents reconciliations from running in parallel.
  2. It makes backward reconciliation difficult, so we can't really show a reverse-chronological view.

Both problems become easier if we split reconciliation into two passes:

First, reconcile the internals of each transaction by checking that nodeBegBal + income - outflow == nodeEndBal. This should always reconcile, so it really amounts to reading the values from the chain. This pass can run in a highly parallel way because each transaction's internal reconciliation is independent of every other transaction's.
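A minimal sketch of this first pass, in Python for illustration only. The Statement record and its field names are hypothetical stand-ins (beg_bal and end_bal play the roles of nodeBegBal and nodeEndBal), and the sketch assumes the balances have already been read from the node. A thread pool is used on the assumption that, in a real pass, each worker would also be waiting on node RPC calls, which is where threads pay off.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """Hypothetical per-transaction record; balances are integer wei."""
    block: int
    tx_index: int
    beg_bal: int   # balance before the transaction (nodeBegBal)
    income: int    # total inflow to the account in this transaction
    outflow: int   # total outflow, including gas paid
    end_bal: int   # balance after the transaction (nodeEndBal)


def reconciles_internally(s: Statement) -> bool:
    # Uses only this one transaction's values, so every call is independent.
    # A real worker would also fetch the balances from the node (not shown).
    return s.beg_bal + s.income - s.outflow == s.end_bal


def first_pass(statements: list[Statement], workers: int = 8) -> list[bool]:
    # pool.map preserves input order, so results line up with `statements`.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(reconciles_internally, statements))
```

Because no call depends on another, the transactions can be processed in any order and in any number of workers.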

The second pass reconciles each transaction's beginning balance with the previous transaction's ending balance (or, if going backwards, the current transaction's ending balance with the next transaction's beginning balance).
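A sketch of the second pass under the same illustrative assumptions: each transaction is reduced to a hypothetical (beg_bal, end_bal) pair, listed in chronological order. Each link compares one transaction's beginning balance with the preceding transaction's ending balance, so walking backwards checks exactly the same links, just newest first.

```python
def second_pass(balances: list[tuple[int, int]], backward: bool = False) -> list[bool]:
    """balances: hypothetical (beg_bal, end_bal) pairs in chronological order.

    Returns one result per adjacent pair: True if the later transaction's
    beginning balance equals the earlier transaction's ending balance.
    With backward=True the same links are visited newest first.
    """
    n = len(balances)
    order = range(n - 1, 0, -1) if backward else range(1, n)
    return [balances[i][0] == balances[i - 1][1] for i in order]
```

Since the results of the first pass are already in hand, each link here depends only on two neighbouring records, which is what makes a reverse-chronological view cheap.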

This solves both problems and should greatly speed up the reconciliations.

We could go even further and do this in batches, say month by month working backwards, so the results reach the front end sooner.
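One way the month-by-month batching might look, as a hypothetical sketch: transactions are assumed to arrive as (timestamp, payload) pairs in chronological order with Unix-second timestamps, and months are computed in UTC. The function name and input shape are illustrative, not part of chifra.

```python
from datetime import datetime, timezone
from itertools import groupby


def batches_by_month_newest_first(txs):
    """txs: hypothetical list of (unix_timestamp, payload) pairs, oldest first.

    Yields ("YYYY-MM", [payloads]) with the newest month first; within each
    batch, payloads are also newest first.
    """
    def month_key(tx):
        ts = datetime.fromtimestamp(tx[0], tz=timezone.utc)
        return f"{ts.year:04d}-{ts.month:02d}"

    # Reverse first so groupby sees the newest month before older ones.
    for month, group in groupby(reversed(txs), key=month_key):
        yield month, [payload for _, payload in group]
```

Each batch can be sent to the front end as soon as it is reconciled; only the link between the oldest transaction of one batch and the newest transaction of the next needs data from both batches.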

tjayrush commented 1 year ago

Also, we can write the reconciliations to the cache during the first pass (calculating internal reconciliations), and then have the pipeline that handles inter-transaction reconciliation read from the cache. That way we don't have to hold the entire history in memory, which means we can, in effect, stream.
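A sketch of that streaming idea, assuming a hypothetical cache layout of one JSON file per transaction with zero-padded block and transaction-index file names (this is not chifra's actual cache format). The first pass writes each record with its internal check; the second pass reads records back in chronological order and keeps only the previous record in memory.

```python
import json
import os
from typing import Iterable, Iterator


def write_first_pass(cache_dir: str, records: Iterable[dict]) -> None:
    """Pass 1: store each record with its internal check (hypothetical layout)."""
    for rec in records:
        rec = dict(rec, internal_ok=rec["beg_bal"] + rec["income"] - rec["outflow"] == rec["end_bal"])
        # Zero-padded names make lexical order match chronological order.
        name = f"{rec['block']:012d}-{rec['tx_index']:06d}.json"
        with open(os.path.join(cache_dir, name), "w") as f:
            json.dump(rec, f)


def stream_second_pass(cache_dir: str) -> Iterator[dict]:
    """Pass 2: yield cached records oldest first, annotated with the link check.

    Only the file names and the previous record are held in memory.
    """
    prev = None
    for name in sorted(os.listdir(cache_dir)):
        with open(os.path.join(cache_dir, name)) as f:
            rec = json.load(f)
        rec["linked_ok"] = prev is None or rec["beg_bal"] == prev["end_bal"]
        prev = rec
        yield rec
```

Because stream_second_pass is a generator, a consumer such as an API handler can emit each reconciled record as soon as it is produced.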