Open OttoAllmendinger opened 5 years ago
I had thought about a sightly different approach to transaction compression, but I don't think this would work with ~orphan~ stale blocks. However:
Since we'll use a SQL based storage I'd allow the txid
field in the Transaction
SQL-table to be NULL
and have a foreign key blockHeight
and a integer field indexInBlock
indicating the block position, which can be NULL
as well. The process would be the following:
INSERT
would put the transaction in the table with the txid
set. blockHeight
and indexInBlock
would both be ǸULL
. There should be a database index over the txid
.bitcoin-cli getblockhash {current_block_height - 144}
and then bitcoin-cli getblock {hash} 0
and then deserialize) would insert a new block into the Block
SQL-table. Each transaction in that block would then be updated (remove txid
, set blockHeight
and indexInBlock
). (TODO: needs discussion on how to handle transactions not found in our mempool recordings)indexInBlock
can then be used to get the txid.
(As an alternative it would be possible to include or later add a table consisting of blockHeight
indexInBlock
txid
for users with no database space restrains. But that not needed for an MPV I think.)I think the compression process can be triggered every time a new block arrives over the ZMQ topic rawblock
. With BIP34 (since Block 227,836) the blockHeight
is included in the the deserialized block's coinbase. See code example https://github.com/0xB10C/memo/blob/591a968802b63983fd399ebbb0488d0103173a33/memod/processor/zmq_handler.go#L54-L76
Your approach requires a few more bytes per transaction. However, that data is highly compressible and keeping the transaction table simplifies queries like firstSeen < t < confirmationTime
, so we should go with it.
The uncompression of the data (i.e. when txids are needed) could be done by accumulation all needed block hashes for the query and then querying those from Bitcoin Core. The indexInBlock can then be used to get the txid.
I wonder how often txids are actually needed. Use cases were only the feeRate distributions and confirmation times are needed (fee estimation) do not require the txid. Omitting txid by default from the Mempool
interface might allow significant speedups.
The naive way of storing tx metadata keyed by txid requires 32 bytes of overhead (txid) per transaction. Using 400k txs/per day as a high estimate (source), this requires about 12mb of noncompressible storage per day.
Since we require a bitcoind instance anyway, we can use a simple compression scheme to reduce the storage requirement. Once a block has enough confirmations (144), we can store the tx metadata for transactions in that block as a list
[m_1, m_2, ... m_n]
in the same order as the transactions in the block.This makes reconstructing the mempool a bit more tricky: instead of querying all txs with
firstSeen < t < confirmationTime
, we must resolve pruned block data backwards.Ideally we should store orphaned blocks around as well and consider reorgs in the mempool reconstruction.