NebulousLabs / Sia

Blockchain-based marketplace for file storage. Project has moved to GitLab: https://gitlab.com/NebulousLabs/Sia
https://sia.tech
MIT License
2.71k stars 440 forks source link

Proposal: Improve transactionpool consensus change efficiency #2187

Open marcinja opened 7 years ago

marcinja commented 7 years ago

Proposal: Improve the efficiency of the transaction pool's ProcessConsensusChange method

Motivation

Currently, on every consensus change (i.e. every block update), the transaction pool is purged and every transaction set held in memory before the change is re-added to the pool with a call to acceptTransactionSet. This is a large bottleneck for the transaction pool as can be seen from profiling results posted in a comment under PR #2141. With these changes, the transaction pool for Sia can be safely increased to a larger size.

Solution Overview

The ConsensusChange contains diffs for every type of transaction output. These can be used to make changes in the transaction pool. Each output diff specifies a direction, an object, and its ID.

For all diffs that show an object leaving the consensus set (as indicated by the direction of the diff) we should attempt to place it back in to the transaction pool.

For all diffs that show an object entering the consensus set, we should remove it and transaction sets which conflict with it from the transaction pool.

Removal from transaction pool

func conflictingSets(id ObjectID) []TransactionSetID

This function should take in an ObjectID (e.g. SiacoinOutputID) and return a slice of TransactionSetIDs which refer to transaction sets that also use contain that object (a double-spend), and transaction sets which spend outputs from any of those sets. Currently transactions are all merged with their parents into one set but any changes made here should be compatible with future transaction pool upgrades like adding RBF and cP4P.

This function can be used within ProcessConsensusChange to create a set of TransactionSetIDs which conflict with objects recently added to the consensus set. Specifically, they should be placed into a map[TransactionSetID][]ObjectID which stores a slice of each ObjectID that that set conflicts with.

Before totally removing any sets from the transaction pool, we should first check the composition of the set. It is possible that the transaction graph of the set can be split into non-conflicting and conflicting sets, in which case only the conflicting sets should be removed. The result can be many output graphs depending on the number of objects with which that set conflicts. Non-conflicting sets created this way should be put into the transaction pool, and added to the subscriber diff (after having pruned transactions older than maxTxnAge).

In the case that it is possible to split a transaction set into conflicting and non-conflicting sets, we should remove the original combined set, and add the non-conflicting sub sets. We should also keep track of all sets that were split this way or were pruned because of maxTxnAge in a map: map[TransactionSetID]*ConflictSet with ConflictSets defined as:

type ConflictSet struct {
    conflictSubsets       [][]types.Transaction
    nonconflictingSubsets [][]types.Transaction
}

After logic for adding sets back to the transaction pool has been done, all of these sets can be removed from the transaction pool's state. They must then also be put into a TransactionPoolDiff and used to update tpool subscribers.

Adding back into the transaction pool

Now each transaction from a recently reverted block should be placed into the transaction pool if possible.

It should satisfy the following conditions:

Other transactions in the pool

In most consensus set updates, transactions in the transaction pool will neither conflict with reverted transactions nor applied transactions. These should only be checked for transaction age and pruned if necessary.

DavidVorick commented 7 years ago

The existing setup, while slow, has been pretty battle-hardened. That is, we are pretty certain of the correctness of the way that we currently do things.

Can you extend your proposal to include a few methods of verifying correctness? Correctness would be defined as:

  1. transactions are only tossed if they have become invalid
  2. no invalid transactions remain in the tpool following a block update
  3. transactions which are reverted from a block are added back into the transaction pool
  4. for the cp4p calculations, a transaction sets average fees are updated correctly (iirc this is actually only tracked in the miner, who shouldn't have too much trouble)

One pattern that I don't think our test suite covers very well (even within the API) is the situation where you have one set of transactions in the tpool, with complex dependencies, and then a miner mines double spends. I'm especially wondering about situations where the child of a set is double-spend, but the parent is still valid, and the parent also has other valid children.

I'm still thinking over the proposal as a whole but I would like to see some plan for verifying correctness.