Improved flow edge format with only one virtual node

chriseth commented 1 year ago

Currently, when creating the graph to compute the max flow on, we are creating two intermediate nodes per edge. This is partly done due to the fact that when starting with the Edge data structure (as opposed to the full Safe database), we are unable to properly compute the maximum tokens, an account can receive.

If we start from the Safe database and directly compute the flow graph, we should only add one intermediate node. This should solve problems with organization wallets and could also increase performance (because the graph is smaller). The graph would be created as follows:

There is one node for each account and one node for each token. Edges are always only between nodes and tokens.

Outgoing edges of account A: If A has k tokens of type b, there is an edge from A to b with capacity k. Incoming edges of account C: For each token b that C accepts, there is an edge from b to C with the amount of tokens it accepts. If C is an organization, the capacity is infinite / very large.

This should work since no account can spend more than its balance and no account can accept more than its trust capacity (per token).

What is a bit more difficult in this scenario is how we turn a max flow computation result into a list of tokens transfers, because we have some flexibility of choice: For each account and each token, we only know how many of those tokens that account sends, but we do not know for sure where. We can choose from all accounts that receive that token during the multi-transfer. We previously had problems with the order of transactions, that problem could occur again and be even larger.

This freedom of choice has an advantage as well, though: For each token we can find a minimal number of transfers to split this token into. For example, if A sends 2 a-tokens, B sends 2 a-tokens and both C and D receive 2 a-tokens, then this can be realized by four 1-token transactions or by two 2-token transactions.

chriseth commented 1 year ago

It looks like the edges can be directly computed from the SafeDB, so we do not even need to generate an intermediate data structure like the EdgeDB. In other words, what we would need to do is write an alternative implementation of Adjacencies that does not contain an EdgeDB but instead juts a reference to a SafeDB. Since we still need some multiplications to compute the capacities, it might make sense to cache it, but a first implementation can just directly query SafeDB on the fly.

louilinn commented 1 year ago

We talked about this last week and we (@chriseth @llunaCreixent and me) think this can reduce the complexity of the max flow calculations significantly by reducing the number of virtual nodes and edges involved. It will also make the representation of the capacity network more intuitive. Changing it makes sense now that we are no longer relaying on compatibility with the old edges data format.

This part is what will require some extra consideration for sure:

What is a bit more difficult in this scenario is how we turn a max flow computation result into a list of tokens transfers, because we have some flexibility of choice: For each account and each token, we only know how many of those tokens that account sends, but we do not know for sure where. We can choose from all accounts that receive that token during the multi-transfer. We previously had problems with the order of transactions, that problem could occur again and be even larger.

This major refactoring will likely take a couple of weeks to complete. Depending on our capacity me and @llunaCreixent will try to provide more context and structure to this task and an approach to the challenging part.

CirclesUBI / pathfinder2

Improved flow edge format with only one virtual node #49