ChrisCScott / forecaster

A personal finances forecasting tool for Canadian retirement planning

Improve TransactionTraversal handling of linked/duplicate accounts #66

Closed ChrisCScott closed 5 years ago

ChrisCScott commented 5 years ago

**The Problem**

The recently-implemented `TransactionTraversal` class allows for quasi-tree-structured input. It traverses that input as a tree to allocate transactions to the various `Account` leaf nodes. But a problem arises in the context of (unordered) weighted nodes: if an `Account` shows up more than once as a leaf node (or, equivalently, if multiple `LinkedLimitAccount` objects share an applicable transaction limit), then higher-order weightings might not be respected.

**Example**

Here's a simple example:

```
      {}
(50%)/  \(50%)
    /    \
  R1      []
         /  \
        R2   T
```

If R1 and R2 are `LinkedLimitAccount` objects with a shared limit (or are the same `Account`), then under the present code, contributing more than their joint limit will cause R1 and R2 to be contributed to equally, with all excess transactions going to T - throwing off the weighting of the accounts. It would be preferable for R1 to be filled in preference to R2 so that the 50-50 weighting is respected.
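To make the example concrete, here's a worked version in plain Python. The total contribution of 100 and the shared limit of 60 on the R1/R2 group are illustrative numbers, not taken from the issue:

```python
# Illustrative figures (assumptions, not from the issue):
total = 100         # total contribution to allocate across the tree
shared_limit = 60   # joint transaction limit shared by R1 and R2

# Current marginal behaviour: R1 and R2 fill equally until the shared
# limit is hit, and the excess spills over to T.
current = {"R1": shared_limit / 2,
           "R2": shared_limit / 2,
           "T": total - shared_limit}

# Preferred behaviour: fill R1 first so the 50/50 split at the root is
# respected, then give R2 whatever remains of the shared limit.
preferred = {"R1": total * 0.5,
             "R2": shared_limit - total * 0.5,
             "T": total - shared_limit}

# current:   R1 and R2 each get 30, T gets 40 (root weighting violated)
# preferred: R1 gets 50, R2 gets 10, T gets 40 (root weighting respected)
```

Note that T receives 40 either way; only the split between the linked accounts differs, which is why this is a rebalancing problem rather than a capacity problem.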

**Rejected 'Solutions'**

We've considered using breadth-first traversal to respect higher-order weightings, but this won't solve the problem. The core issue is that the data structure used by `TransactionTraversal` is not a true tree: leaf nodes can have relations with other leaf nodes, often quite distant ones. This is a graph, not a tree.

Right now we deal with this by relaxing the requirement to preserve higher-order weights and instead using a model of "marginal behaviour". That is, we figure out how much can be contributed to the tree (with its current weights) until its behaviour changes - e.g. because a leaf is filled or a node hits its limit, causing weights to be rebalanced (and ordered nodes to shift to the next node). This yields broadly sensible output, in the sense that the output is well-defined and all account-level requirements (e.g. transaction limits) are respected. However, the foregoing example shows that it could be improved.

**Proposed Solution**

The likeliest solution is to treat this as a graph problem, not a tree problem. In particular, it can probably be solved with a minimum-cost network flow algorithm (such as network simplex or capacity scaling).

Here's how it might work:

1. Each node is annotated with the leaf nodes under it (expanded to the groups of those nodes in the case of `LinkedLimitAccount` leaf nodes).
2. After an ordinary tree traversal of a weighted node, we check whether weights have been respected. If not, and if any children share leaf nodes, we attempt graph traversal.
3. We represent the node's children as nodes in a `MultiDiGraph`. Each node that has received transactions into a leaf node receives a directed edge to every other node sharing that leaf node. Those edges receive capacity equal to the sum of transactions to that leaf node. (Note that, if multiple groups are shared between a pair of nodes, multiple edges will be formed between them.)
4. Each node receives demand equal to the amount that would need to be added to/removed from that node to achieve the target balance. (Consider whether demand can be further limited, e.g. to the maximum transaction value of all leaf nodes under the node, to avoid potentially unfeasible problems.)
5. `weight` can be 0 for each edge.
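As a sketch (not the project's actual API), here's what that refinement step could look like in networkx for the example tree, assuming a 100-unit contribution and a 60-unit shared limit, so the tree traversal left child A (the R1 branch) with 30 when its 50% target is 50, and child B (the `[R2, T]` branch) with 30 in the shared group via R2:

```python
import networkx as nx

graph = nx.MultiDiGraph()
# In capacity_scaling, demand < 0 means a node sends flow and
# demand > 0 means it receives flow.
graph.add_node("A", demand=20)   # A is 20 short of its 50% share
graph.add_node("B", demand=-20)  # B holds 20 too much of the shared group

# One edge per shared leaf-node group, directed both ways; capacity is
# the sum of transactions the source node sent into that group, and
# weight is 0 as suggested above.
graph.add_edge("A", "B", capacity=30, weight=0)
graph.add_edge("B", "A", capacity=30, weight=0)

flow_cost, flow_dict = nx.capacity_scaling(graph)
# flow_dict["B"]["A"] carries 20 units: reassign 20 of R2's
# transactions to R1, yielding R1=50, R2=10, T=40.
```

For a `MultiDiGraph`, `flow_dict[u][v]` is itself a dict keyed by edge key, which is what lets parallel edges (one per shared group) carry separate flows.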

**Other Considerations**

Consider whether we need to add any nodes. For example, under the above scheme, a node that shared a leaf node with two other nodes could send its transactions twice - once to each other node. We could add a dummy node for each shared leaf node such that each node with transactions to that leaf node has a directed edge to the dummy node (with capacity as noted above), the dummy node has edges to all accounts sharing the leaf node (with infinite capacity), and the dummy node has demand=0. This forces any flows received by the dummy node to be redistributed, and ensures that all flows sent by a node are sent only once.
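A sketch of that dummy-node construction, with hypothetical nodes X, Y and Z all sharing one leaf group, and illustrative amounts (X must shed 30 in total; Y and Z need 10 and 20 respectively):

```python
import networkx as nx

graph = nx.MultiDiGraph()
# Hypothetical surpluses/deficits: X holds 30 of the shared group's
# transactions and must not redistribute more than 30 in total.
for node, demand in (("X", -30), ("Y", 10), ("Z", 20)):
    graph.add_node(node, demand=demand)
graph.add_node("dummy", demand=0)

# Each sender gets a single capacity-limited edge into the dummy,
# instead of a capacity-30 edge to each of Y and Z (which would let
# X send its 30 units twice)...
graph.add_edge("X", "dummy", capacity=30, weight=0)
# ...and the dummy fans out to every node sharing the leaf group,
# with unlimited capacity (capacity defaults to infinity):
graph.add_edge("dummy", "Y", weight=0)
graph.add_edge("dummy", "Z", weight=0)

flow_cost, flow_dict = nx.capacity_scaling(graph)
# X sends exactly 30 into the dummy, which splits it 10/20 between
# Y and Z; demand=0 forces the dummy to pass everything through.
```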

There are some other outstanding questions. Chief among them: how do we handle per-node limits, particularly for nested nodes at lower levels of the tree?

This method can fail (and raise `NetworkXUnfeasible`) if the target weights can't be achieved. We might be able to limit this by setting demand carefully (e.g. reweighting based on the nodes' actual capacity for transactions, not just the weights set by the parent node), but that may re-open the same can of worms we're trying to close here. Consider what the fallback should be if there's a possibility that the method can't complete. Ideally we'd get as close as is practicable to the target weights; perhaps we can add a sink node whose edges have non-zero cost? (But what should the demand of such a node be?) Perhaps falling back to the current marginal-contribution approach is acceptable as well.
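The simplest fallback is just to catch the exception and keep the tree traversal's output. A minimal sketch, where `fallback` stands in for whatever the current marginal-contribution pass produced (the function name is hypothetical, not from the codebase):

```python
import networkx as nx

def refine_allocation(graph, fallback):
    """Try the min-cost-flow refinement; if the target weights are
    unachievable, keep the marginal-contribution result instead."""
    try:
        _, flow = nx.capacity_scaling(graph)
        return flow
    except nx.NetworkXUnfeasible:
        return fallback

# An unfeasible instance: B demands 50, but the only inbound edge
# can carry at most 30, so capacity_scaling raises NetworkXUnfeasible.
graph = nx.MultiDiGraph()
graph.add_node("A", demand=-50)
graph.add_node("B", demand=50)
graph.add_edge("A", "B", capacity=30, weight=0)

result = refine_allocation(graph, fallback="marginal")  # -> "marginal"
```

This keeps the refinement strictly optional: the worst case is the behaviour we already have today.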

The foregoing assumes that we use a graph technique as a refinement of an initial tree traversal's output. This is reasonable, since it's not clear how we'd represent ordered nodes in the graph traversal. (It's easy to add directed edges between them, but how do you set demand?) Still, consider whether we could handle the whole allocation process as a graph-based network flow method: transactions could come in via a single source node (`available`) and the (unique) leaf nodes could be the sinks (with groups sharing transaction limits represented as a single node). In general, though, setting demand seems like it would be hard. Maybe you could prime the pump with an early tree traversal (to get demand for leaf nodes), but it's hard to see how you could determine the demand for parent nodes reliably.
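For illustration, a minimal sketch of that whole-allocation formulation for the example tree, assuming leaf demands have already been primed by a tree traversal (100-unit contribution, shared-limit group collapsed into one `R1+R2` sink; all numbers are illustrative):

```python
import networkx as nx

total = 100
graph = nx.MultiDiGraph()
# A single source supplies the total contribution...
graph.add_node("available", demand=-total)
# ...and each unique leaf group is a sink. The shared-limit group
# {R1, R2} collapses into one node; demands here would come from a
# hypothetical prior tree traversal, as suggested above.
graph.add_node("R1+R2", demand=60)
graph.add_node("T", demand=40)
graph.add_edge("available", "R1+R2", capacity=60, weight=0)
graph.add_edge("available", "T", weight=0)  # unlimited capacity

flow_cost, flow_dict = nx.capacity_scaling(graph)
```

The hard part the last paragraph identifies remains: this flat formulation says nothing about intermediate (parent) nodes, whose demands can't be read off the tree so easily.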