bitcoindevkit / bdk

A modern, lightweight, descriptor-based wallet library written in Rust!

Refactor `TxGraph::walk_conflicts` to check for ancestors #1099

Open LagginTimes opened 1 year ago

LagginTimes commented 1 year ago

However, we should consider changing the behavior of TxGraph::walk_conflicts to walk ancestors to a given depth and check conflicts on those ancestors. This can be a separate issue.

Originally posted by @evanlinjin in https://github.com/bitcoindevkit/bdk/pull/1064#pullrequestreview-1597040854

LLFourn commented 1 year ago

Can we get a deeper exposition on why we need to do this? What problem are we solving, and what other solutions have we considered?

evanlinjin commented 1 year ago

@LagginTimes's work on testing conflict resolution in TxGraph (#1064) found a bug.

The Bug

Given transactions: A, B', and C.

Where:

- A and B' are direct conflicts (they spend the same prevout),
- B' is anchored in the best chain, and
- C spends an output of A.

Expectations:

- Since A is displaced by B', neither A nor C should be considered canonical.

Reality:

Why the bug exists

So why was C included when it shouldn't have been?

When try_get_chain_position determines that C is not anchored in the best chain, it checks whether it has conflicts. It uses walk_conflicts with C as the input/root tx.

walk_conflicts first finds "direct conflicts" of C before returning descendants of those direct conflicts. Two transactions are direct conflicts if they spend the same prev output. In this case, C has no direct conflicts, so walk_conflicts returns nothing and C is wrongly treated as canonical, even though its ancestor A conflicts with B'.
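The direct-conflict check boils down to: two txs conflict directly when their input sets share an outpoint. A minimal sketch (with hypothetical `TxId`/`OutPoint` aliases standing in for bdk's real `Txid` and `OutPoint` types) shows why a walk rooted at C comes back empty:

```rust
use std::collections::HashMap;

// Hypothetical minimal stand-ins for bdk's `Txid` and `OutPoint`.
type TxId = u32;
type OutPoint = (TxId, u32); // (funding txid, vout)

// A tx is a "direct conflict" of `root` if it spends any of the same
// prev outputs; each tx is represented here by the outpoints it spends.
fn direct_conflicts(spends: &HashMap<TxId, Vec<OutPoint>>, root: TxId) -> Vec<TxId> {
    let root_inputs = &spends[&root];
    spends
        .iter()
        .filter(|(id, inputs)| **id != root && inputs.iter().any(|op| root_inputs.contains(op)))
        .map(|(id, _)| *id)
        .collect()
}

fn main() {
    let mut spends: HashMap<TxId, Vec<OutPoint>> = HashMap::new();
    spends.insert(1, vec![(0, 0)]); // A spends outpoint (0, 0)
    spends.insert(2, vec![(0, 0)]); // B' also spends (0, 0): direct conflict with A
    spends.insert(3, vec![(1, 0)]); // C spends A's output: no direct conflict at all
    assert_eq!(direct_conflicts(&spends, 1), vec![2]);
    // C has no direct conflicts, so a walk rooted at C finds nothing.
    assert_eq!(direct_conflicts(&spends, 3), Vec::<TxId>::new());
    println!("ok");
}
```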

How #1064 fixes the bug

#1064 does not modify the implementation of walk_conflicts. Instead, it modifies try_get_chain_position to maintain a stack of ancestor transactions (up to a certain depth limit) and calls walk_conflicts on each of them.
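The #1064-style workaround can be sketched as a bounded ancestor walk that collects the direct conflicts of each ancestor along the way. Names and types here are hypothetical, not bdk's actual API:

```rust
use std::collections::{HashMap, HashSet};

type TxId = u32;

// Hypothetical sketch: walk ancestors of `root` up to `depth_limit` and
// collect the direct conflicts of every ancestor visited.
fn conflicts_considering_ancestors(
    parents: &HashMap<TxId, Vec<TxId>>,
    direct_conflicts: &HashMap<TxId, Vec<TxId>>,
    root: TxId,
    depth_limit: usize,
) -> Vec<TxId> {
    let mut stack = vec![(root, 0usize)];
    let mut seen: HashSet<TxId> = HashSet::new();
    let mut out = Vec::new();
    while let Some((tx, depth)) = stack.pop() {
        if !seen.insert(tx) {
            continue;
        }
        out.extend(direct_conflicts.get(&tx).into_iter().flatten().copied());
        if depth < depth_limit {
            stack.extend(parents.get(&tx).into_iter().flatten().map(|&p| (p, depth + 1)));
        }
    }
    out.sort();
    out
}

fn main() {
    // C (3) spends from A (1); A conflicts with B' (2).
    let parents: HashMap<TxId, Vec<TxId>> = [(3, vec![1])].into_iter().collect();
    let conflicts: HashMap<TxId, Vec<TxId>> = [(1, vec![2])].into_iter().collect();
    // With ancestor walking, C's conflict with B' (via A) is found.
    assert_eq!(conflicts_considering_ancestors(&parents, &conflicts, 3, 25), vec![2]);
    // A depth limit of 0 reduces to the old behavior: nothing is found.
    assert_eq!(conflicts_considering_ancestors(&parents, &conflicts, 3, 0), Vec::<TxId>::new());
    println!("ok");
}
```

As the next comment points out, any fixed depth limit means a conflict further than `depth_limit` ancestors away is still missed.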

However, the behavior of walk_conflicts is untouched.

Why we want walk_conflicts to consider ancestors first

Because, in reality, A conflicts with B'; we just can't detect it if we don't traverse ancestors first.

LLFourn commented 1 year ago

Thanks for the explanation. I think this approach to the problem misses the mark. It modifies the API of walk_conflicts to make it more complicated, and it is not clear how to set depth_limit or why I would want to set it. This parameter is used in try_get_chain_position and set to an arbitrary value of 25. It seems difficult to explain this design at every level, and furthermore it doesn't fix the bug if the transaction conflicts at a distance of more than 25 ancestors.

[EDIT] @evanlinjin has explained to me that 25 is some magic network number. I still think the points below still stand.

Let's state the problem again.

We have a graph of transactions, and we can query an oracle to determine whether a given transaction is in the chain or not. The answer can be "yes", "no", or "I don't know". When a transaction is in the chain, we know that every directly conflicting transaction is not. We also know that every ancestor is in the chain, and that every transaction conflicting with an ancestor is not. However, sometimes the answer will be "no" or "I don't know" for a whole set of conflicting transactions, and we still want to choose one of the set to be canonical so we can report a transaction as being unconfirmed.

So what we are trying to do is create a conflict-free subgraph from our directed acyclic graph.

Why modifying walk conflicts is suboptimal

Walking the ancestors of every single tx up to some arbitrary limit (25) creates a conflict-free subgraph only for nodes not more than 25 nodes earlier in the graph, and we are likely redoing work for every tx. This is inefficient, and error-prone because of the limit.

A simple algorithm sketch

Why not just create a canonical subgraph once for the whole graph rather than redoing it for every tx?

Start with a topological traversal of the graph and create the conflict-free subgraph by starting at the root txouts: the txouts that the root transactions spend, where root transactions are those that do not spend the txouts of any other transaction in the graph. You can find these by taking any of the transactions you are interested in and traversing ancestors until you have a tx that doesn't spend from any other in the graph. Start by adding the outpoints of the inputs of the root tx to the queue (and do this every time the queue is empty, but don't start at transactions you've already visited).

  1. At each outpoint, look for all spends from it and choose a canonical one (if it exists).
  2. Emit the canonical one, then (i) add all of its output outpoints to the queue, (ii) mark all of its direct conflicts as non-canonical, and (iii) add it to a visited set (to avoid visiting it again).
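The queue-based traversal can be sketched roughly as follows. This is a simplified sketch under stated assumptions: the oracle collapses to a yes/no flag, canonical choice collapses to chain membership then last_seen, and the deferred-conflict "ignore list" is omitted; the `Tx`/`canonical_set` names are hypothetical, not bdk's API:

```rust
use std::collections::{HashMap, HashSet, VecDeque};

type TxId = u32;
type OutPoint = (TxId, u32); // (funding txid, vout)

struct Tx {
    id: TxId,
    inputs: Vec<OutPoint>,
    n_outputs: u32,
    in_chain: bool, // oracle answer, simplified to yes/no
    last_seen: u64,
}

// Emit a conflict-free set of txids, starting from the given root outpoints.
fn canonical_set(txs: &[Tx], roots: &[OutPoint]) -> Vec<TxId> {
    // Index spends by the outpoint they consume.
    let mut spends: HashMap<OutPoint, Vec<&Tx>> = HashMap::new();
    for tx in txs {
        for &op in &tx.inputs {
            spends.entry(op).or_default().push(tx);
        }
    }

    let mut queue: VecDeque<OutPoint> = roots.iter().copied().collect();
    let mut visited: HashSet<TxId> = HashSet::new();
    let mut non_canonical: HashSet<TxId> = HashSet::new();
    let mut emitted = Vec::new();

    while let Some(op) = queue.pop_front() {
        let Some(spenders) = spends.get(&op) else { continue };
        let candidates: Vec<&Tx> = spenders
            .iter()
            .copied()
            .filter(|t| !non_canonical.contains(&t.id))
            .collect();
        // Already-canonical txs win, then chain membership, then last_seen.
        let Some(best) = candidates
            .iter()
            .copied()
            .max_by_key(|t| (visited.contains(&t.id), t.in_chain, t.last_seen))
        else {
            continue;
        };
        // All other spenders of this outpoint are direct conflicts: mark them.
        for t in &candidates {
            if t.id != best.id {
                non_canonical.insert(t.id);
            }
        }
        if visited.insert(best.id) {
            emitted.push(best.id);
            for vout in 0..best.n_outputs {
                queue.push_back((best.id, vout));
            }
        }
    }
    emitted
}

fn main() {
    // The bug scenario: A and B' both spend the root outpoint, B' is in the
    // chain, and C spends A's output.
    let txs = vec![
        Tx { id: 1, inputs: vec![(0, 0)], n_outputs: 1, in_chain: false, last_seen: 10 }, // A
        Tx { id: 2, inputs: vec![(0, 0)], n_outputs: 1, in_chain: true, last_seen: 0 },   // B'
        Tx { id: 3, inputs: vec![(1, 0)], n_outputs: 1, in_chain: false, last_seen: 20 }, // C
    ];
    // B' wins at the root outpoint, so A is non-canonical and C is never reached.
    assert_eq!(canonical_set(&txs, &[(0, 0)]), vec![2]);
    println!("ok");
}
```

Note how the original bug disappears without any ancestor depth limit: C is excluded simply because A never wins its outpoint, so C's outpoint is never queued.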

To choose the canonical tx:

  1. If one of them is in the chain according to the oracle then that's it.
  2. If none of them are in the chain, then score each tx by the best last_seen of it and its descendants, i.e. it inherits the last_seen of its descendants. Note that we could find out here that one of the descendants is in the chain, which would also resolve the conflict, but we can ignore this for now.
  3. If we still can't tie-break, choose the one with the best feerate (here you could also compute the best feerate including children to account for CPFP, but leave this out of scope for now).
  4. If none of them have a last_seen, then none of them is canonical.
  5. If the best canonical candidate so far is marked non-canonical, then don't emit it and carry on.
  6. If the canonical tx conflicts with another tx do not emit it yet, add the edge from the current outpoint to the candidate canonical tx to a kind of "ignore" list. The ignore list keeps track of what conflicts can be ignored for future traversal. This allows you to defer the decision as to whether that tx is canonical until the adjacent conflict is considered.

I think something like this will work and be efficient enough for now. The key point to note is that we are not finding a maximal conflict-free subgraph, since we drop transactions that could actually be valid in the mempool because they have a lower last_seen than another transaction, even when that other transaction might not itself end up canonical (because it conflicts with yet another transaction with a higher last_seen).

To see why this is justified, imagine A, B, C where both A and C conflict with B and the last_seen values are C > B > A. If we visit the A,B conflict first, we find that B is better but don't emit it, since it also conflicts with C. Then we finally reach the B,C conflict outpoint later, find that C is better, and emit C. Why shouldn't we emit A though? Its conflict is non-canonical, so it could be canonical. We don't because B at some point replaced A in the mempool, which means A is not there anymore; C later replaced B, but that doesn't magically bring A back. Note that if we visit the B,C conflict first, C will be emitted right away and B will be marked as non-canonical. When we then visit the A,B conflict, B will be the canonical candidate but won't be emitted because it's been marked non-canonical. So in both orders we emit just C.
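This order-independence can be checked mechanically. The sketch below (hypothetical `resolve` function, txs named by `char`) applies the rules from above to the A, B, C example, processing the two conflict sets in both possible orders: the best last_seen wins each set, losers are marked non-canonical, and a winner is only emitted once all of its conflict sets have been visited and it has not itself been marked non-canonical:

```rust
use std::collections::{HashMap, HashSet};

// Resolve conflict sets in the given visit order; each `sets[i]` is a group
// of txs that all spend the same outpoint.
fn resolve(order: &[usize], sets: &[Vec<char>], last_seen: &HashMap<char, u64>) -> Vec<char> {
    let mut non_canonical: HashSet<char> = HashSet::new();
    let mut emitted = Vec::new();
    for (pos, &i) in order.iter().enumerate() {
        // The tx with the best last_seen wins this conflict set...
        let winner = *sets[i].iter().max_by_key(|t| last_seen[*t]).unwrap();
        // ...and every loser is marked non-canonical (it was replaced).
        for &t in &sets[i] {
            if t != winner {
                non_canonical.insert(t);
            }
        }
        // Defer if the winner also appears in a not-yet-visited conflict set.
        let deferred = order[pos + 1..].iter().any(|&j| sets[j].contains(&winner));
        if !deferred && !non_canonical.contains(&winner) {
            emitted.push(winner);
        }
    }
    emitted
}

fn main() {
    // A conflicts with B, B conflicts with C, and last_seen is C > B > A.
    let sets = vec![vec!['A', 'B'], vec!['B', 'C']];
    let last_seen: HashMap<char, u64> = [('A', 1), ('B', 2), ('C', 3)].into_iter().collect();
    // Both visit orders emit just C.
    assert_eq!(resolve(&[0, 1], &sets, &last_seen), vec!['C']);
    assert_eq!(resolve(&[1, 0], &sets, &last_seen), vec!['C']);
    println!("ok");
}
```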

Optimizing to only focus on transactions not in the chain

Rather than starting with the root txouts of the whole graph, first emit and filter out all the transactions that are in chain according to the oracle. Now treat the root txouts as those that are confirmed already e.g. we start with a subgraph of the unconfirmed transactions where the root unconfirmed transactions spend from confirmed txouts (or txouts that don't exist in the graph). The intuition is that our chain oracle can be trusted to enforce a conflict free subgraph for those confirmed transactions without considering conflicts manually. We then focus on doing a conflict free traversal of the unconfirmed transactions.

Applying to txouts and UTXOs

To filter lists of txouts that are in the chain, just start with the txouts' transactions as your arbitrary starting point. Any time you find a tx is canonical, the txouts of that transaction become canonical and can be emitted. UTXOs can be emitted whenever you would emit the txout, with the added constraint that there are no canonical spends from it (which is determined when you later visit that outpoint in the main traversal).