gnosis / dex-research

Collection of research papers written within Gnosis

Redundant off-chain order collection for cheaper order placements #25

Open josojo opened 5 years ago

josojo commented 5 years ago

Idea: Since data availability is expensive, reducing it to the required minimum is essential. This approach ensures data availability only for orders with a positive trading volume, and has weaker data-availability assumptions for all other orders, hence reducing the cost of placing orders. Several order collectors each collect N orders off-chain. These operators create an order bundle and get it signed off by a distributed set of watch-towers ensuring data availability. Then each order collector posts the hash representing the order bundle on chain. The pool of valid orders for the optimization problem is defined as the union of all orders represented by the different order hashes. Solutions to the optimization problem will therefore be based on only a fraction of the original orders: the number of orders considered by a solution is bounded by at most M orders, and only these M orders will be published on-chain together with their trading volumes. This suffices to ensure that, at any time, the data needed to recompute the current state of the chain is available, preserving the non-custodial property of the exchange.

Actors: Anchor-contract: The anchor contract is orchestrating the whole auction by allowing certain operators to do certain things in a specified timeframe. Also, it keeps track of all balances by storing important information as the balance hash.

Order-collectors: Order collectors collect orders from users for a small fee and make sure the orders are represented on-chain for the next auction. They concatenate the collected orders and hash them. After the hash is signed by several watch-towers, they send the order hash into the chain.

Watch-towers: Watch-towers ensure data availability by signing order-hashes from order-collectors. They can be seen as powerful, paid archive nodes.

Optimizer: The optimizers scan all orders represented by the order-collectors' published hashes. They then try to solve the optimization problem by selecting a fixed-size subset of orders and optimizing the traders' utility (surplus).

Protocol description:

The protocol is a batch-auction with uniform clearing prices. Each auction collects orders for 3 minutes and then finds fair prices by solving an optimization problem maximizing the trader's utility. The number of orders being considered in the solution of the optimization problem is bounded by a predefined number.

The batch-auctions are run on a roll-up/snapp side chain with full data availability for the actual balances in the system and guaranteed liveness. However, the orders of the system have a weaker data-availability ensured just by watch-towers.

Each auction, the order collectors collect orders from end users for a small fee. An order collector concatenates the orders into a string, which is then hashed. It then sends the orders together with the hash to a list of watch-towers in order to get their signatures, confirming both that the order hash is calculated correctly and that the watch-towers will store the order data. Once the order-collector has enough of these signatures, it generates a snark proving that it holds the signatures from the watch-towers. This snark is sent into the blockchain together with the hash. The orders considered for the next batch are then the union of all orders represented by the different hashes from the order collectors.
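The collector-side flow above can be sketched as follows. This is a minimal illustration, not the actual implementation: SHA-256 stands in for the Pedersen hash, an HMAC stands in for a watch-tower signature, and all function names are hypothetical.

```python
# Sketch of the order-collector flow: bundle orders, hash them, and
# gather watch-tower signatures over the hash before posting on-chain.
import hashlib
import hmac

def bundle_hash(orders):
    """Concatenate the encoded orders and hash the result (SHA-256 as a
    stand-in for the snark-friendly Pedersen hash)."""
    return hashlib.sha256(b"".join(orders)).digest()

def watch_tower_sign(secret, order_hash, orders):
    """A watch-tower signs only if the hash matches the order data it
    commits to storing; otherwise it refuses."""
    if bundle_hash(orders) != order_hash:
        return None  # refuse to sign an incorrect hash
    return hmac.new(secret, order_hash, hashlib.sha256).digest()

def collect_signatures(orders, tower_secrets, threshold):
    """Order collector: gather signatures from the watch-towers and
    succeed only once the threshold is reached."""
    h = bundle_hash(orders)
    sigs = [s for sec in tower_secrets
            if (s := watch_tower_sign(sec, h, orders)) is not None]
    return (h, sigs) if len(sigs) >= threshold else None
```

In the real protocol, the collected signatures would be fed into the snark (or verified on-chain directly, as discussed below) rather than checked in plain code.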

After the orders for a batch are defined on-chain via a set of hashes (and possibly decrypted), the optimizers try to find the best solution to the traders' utility optimization. Once an optimizer has found a solution, it registers the solution cheaply at the anchor contract by submitting the achieved utility. After another time period (3 minutes), the optimizer with the best solution has to prove that its solution is valid. In addition to proving validity, the optimizer also has to provide the order data of the used subset of orders, their positive trading volumes, and a snark proof that the used orders are actually a subset of the orders represented by the different order hashes and that each order is valid.
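The two-phase submission (cheap utility registration, then a validity proof from the best claimant) might look roughly like this. The contract interface is an illustrative assumption, not the actual anchor contract:

```python
# Hypothetical sketch of two-phase solution submission at the anchor
# contract: optimizers register only a claimed utility (cheap), then
# the best claimant must prove validity in the following period.
import time

class AnchorContract:
    def __init__(self, registration_period=180.0):  # 3 minutes
        self.best = None  # (utility, optimizer) of the best claim so far
        self.deadline = time.time() + registration_period

    def register(self, optimizer, utility):
        """Cheap registration: record only the best claimed utility."""
        if time.time() > self.deadline:
            return False
        if self.best is None or utility > self.best[0]:
            self.best = (utility, optimizer)
        return True

    def settle(self, optimizer, proof_valid):
        """Only the best claimant may settle, and only with a valid
        proof (in reality a snark verified on-chain)."""
        return (self.best is not None
                and self.best[1] == optimizer
                and proof_valid)
```

The point of the split is that many optimizers can register cheaply, while the expensive proof is submitted only once, by the winner.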

OrderHashing

Let’s assume an order is represented by (‘accountId’, ‘buyToken’, ‘sellToken’, 'buyAmount', 'sellAmount') with a bit vector of the sizes (24, 8, 8, 32, 32) = 104 bits in total. Additionally, each order will have the fields auctionIndex and signature; however, these do not need to be considered for the hash. Each order collector takes a fixed number of orders and concatenates them. Then the resulting string is hashed using the Pedersen hash. If each order collector is allowed to collect N orders, it has to hash 104*N bits.
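Under the field widths above, the encoding could be sketched like this. The field names and widths are from the text; the packing order is an assumption for illustration:

```python
# Pack the five order fields into a 104-bit (13-byte) blob, then
# concatenate N such blobs into the string the collector hashes.

FIELDS = [("accountId", 24), ("buyToken", 8), ("sellToken", 8),
          ("buyAmount", 32), ("sellAmount", 32)]  # 24+8+8+32+32 = 104

def encode_order(order):
    """Pack the order fields into a big-endian 104-bit value."""
    value = 0
    for name, bits in FIELDS:
        field = order[name]
        assert 0 <= field < (1 << bits), f"{name} exceeds {bits} bits"
        value = (value << bits) | field
    return value.to_bytes(13, "big")  # 104 bits = 13 bytes

def encode_bundle(orders):
    """Concatenate N encoded orders: 104*N bits to be Pedersen-hashed."""
    return b"".join(encode_order(o) for o in orders)
```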

Orders included in the solution:

When the optimizer is required to submit all necessary data of their solution, such as trading volume and the utilized orders together with their signatures, they also need to provide a snark to prove that their orders were part of the original order hashes.

This snark has the following interface:
Public inputs: order hashes of the order-collectors.
Public output: order hash of the orders used in the solution.

Logic:

  1. Validate that the orders provided by the private input hash to the orderhashes
  2. Validate that the orders contain the orders used in the solution
  3. Validate that the orders used in the solution hash to orderhash
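Spelled out as plain checks, which the real circuit would prove without revealing the private inputs, the logic is roughly the following (SHA-256 again stands in for the Pedersen hash):

```python
# The three checks the inclusion snark must enforce, as plain Python.
import hashlib

def h(orders):
    """Hash a concatenated list of encoded orders."""
    return hashlib.sha256(b"".join(orders)).digest()

def verify(collector_bundles,        # private input: full order lists
           public_collector_hashes,  # public input: on-chain hashes
           solution_orders,          # private input: orders in solution
           public_solution_hash):    # public output: solution hash
    # 1. The private order lists hash to the public order hashes.
    if [h(b) for b in collector_bundles] != public_collector_hashes:
        return False
    # 2. Every order used in the solution appears in some bundle
    #    (so the solver cannot smuggle in extra orders).
    all_orders = {o for bundle in collector_bundles for o in bundle}
    if not all(o in all_orders for o in solution_orders):
        return False
    # 3. The solution orders hash to the public solution hash.
    return h(solution_orders) == public_solution_hash
```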

This snark logic is quite simple. The main contributor to circuit size will be the Pedersen hashing (it requires roughly 1.7 constraints per bit). Since the snark needs to calculate Pedersen hashes of data blobs, a snark with 2^28 constraints could theoretically hash 2^28/(1.7*104) ≈ 1.5 million orders. Of course, since constraints are also needed for validation steps 2 and 3, we can probably only consider 0.5 million orders, but this is still an order of magnitude more than current implementations.

Instead of snark proofs, we could also use smart-contract challenges to get assurance that the orders were included. However, loading 0.5 million orders on-chain would already cost over 442 million gas.
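The quoted figures can be reproduced with a quick back-of-the-envelope calculation. The 68 gas per non-zero calldata byte is an assumption consistent with the gas schedule at the time and with the 442 million figure above:

```python
# Back-of-the-envelope checks for the numbers quoted above.
ORDER_BITS = 104
CONSTRAINTS_PER_BIT = 1.7   # approximate cost of a Pedersen hash
GAS_PER_CALLDATA_BYTE = 68  # assumed non-zero-byte calldata price

# Hashing capacity of a 2^28-constraint snark:
orders_hashable = 2**28 / (CONSTRAINTS_PER_BIT * ORDER_BITS)
print(f"{orders_hashable / 1e6:.2f} million orders")  # ≈ 1.52 million

# Calldata gas to post 0.5M orders (13 bytes each) on-chain:
gas = 500_000 * (ORDER_BITS // 8) * GAS_PER_CALLDATA_BYTE
print(f"{gas / 1e6:.0f} million gas")  # 442 million
```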

Risks:

Watch-towers withhold data and publish an order only if withholding it yields a good profit. Malicious watch-towers would be voted out by a DAO.

Watch-towers publish data only later in order to have an edge in computing the solution. Mitigation: every solution calculator could operate one or several watch-towers.

Watch-towers might delay their signatures for order-hashes and thereby harm order-collectors. The resulting outcry might make the watch-tower operator lose their position; maybe not if it happens once, but likely if it happens frequently.

Orders might not be available to anyone. A solution can still be found on the subset of orders provided by the other order collectors and/or watch-towers.

Order-collectors might censor orders. If we encrypt orders, no meaningful censoring is possible, and users can always post their order to several order collectors.


If some (dx)DAO members run liveness tests on watch-towers and monitor their performance, watch-towers are forced to behave well; otherwise, they will be voted out by the (dx)DAO.

Additionally, staking for order-collectors and watch-towers might incentivize them to behave honestly as a collective.

Challenges: Snark calculation for order-collectors might take quite some time and prevent them from submitting the order hash in time. BLS signatures from watch-towers are an alternative if needed.

Overall assessment:

Of course, it is theoretically cleaner, and incentives are better protected, if orders are collected on-chain. However, in order to reduce order costs, improve liquidity, and make the whole system more attractive, such an off-chain order collection scheme would help a lot.

fleupold commented 5 years ago

It's an interesting proposal. I have a few specific questions below. Generally I feel this is making our trust assumptions more prone to attacks and I feel we would need to do a more thorough analysis of the implications (the point about watch-towers delaying/selectively withholding data is scary).

Questions

The number of orders to be considered by the solution will be bounded to at most M orders and only these M orders will be published on-chain together with their trading volumes

Why do those have to be published again? Wouldn’t the watch-tower approach guarantee data availability in the first place? It seems like we are not really confident in that mechanism if we require an extra publication on chain.

Also, how would we prove that M is a true subset of N (and that the solver didn't add some more orders to increase surplus)?

once the order-collector has enough of these signatures, he will generate a snark to ensure that he has the signatures from the watch-towers.

Isn’t it likely to be cheaper to send and validate the signatures on-chain? In what order of magnitude are you thinking?

josojo commented 5 years ago

Why do those have to published again? Wouldn’t the watch tower approach guarantee data availability in the first place?

Yes, I don't trust watch-towers completely. In the current construction, if the watch-towers do not do their job, nothing bad happens besides temporarily reduced liquidity due to missing orders for the solvers. On the other hand, a data-unavailability problem that makes the chain stuck would be much worse; that is why we want all the data necessary for replicating the current state of the system to be on chain.


Also, how would we prove that M is a true subset of N (and that the solver didn't add some more orders to increase surplus)?

A snark would have to do this. I tried to explain it in the text above:

This snark has the following tasks: Public inputs: orderhashes of order-collectors Public output: orderhash of orders used in the solution.

Logic:

  1. Validate that the orders provided by the private input hash to the orderhashes
  2. Validate that the orders contain the orders used in the solution
  3. Validate that the orders used in the solution hash to orderhash

Isn’t it likely to be cheaper to send and validate the signatures on-chain? In what order of magnitude are you thinking?

Yeah, probably it is! Or BLS signatures could also be cheaper... I was thinking about 10 to 30 watch-towers...

bh2smith commented 5 years ago

To summarize my understanding, are we essentially collecting orders in the way the roll-up is collecting transfer requests (i.e. off-chain)? Also, I would like to know more about this M and N. Am I to assume that M < N and if so, how is this M determined? Is there some way of selecting a subset of orders to choose for an auction?

This approach restricts data availability to orders with a positive trading volume only, and hence reduces the costs for placing orders.

Does this mean that M is the number of touched orders in an auction? Does M vary from one auction to another or is it fixed? If it is fixed, what determines when an order out of the N qualifies to be selected?

Then, each order collector posts the hash representing the order bundle on chain.

If we are only posting a hash of the order bundle what is the purpose of M?

The pool of valid orders for the optimization problem is then defined as the union of all orders represented by the different order hashes.

What are the different order hashes? So far I only see one (the hash of the order bundle).

fleupold commented 5 years ago

This proposal is very similar to the Data Availability Committee that StarkWare and 0x propose for their dex.

I'd be curious to hear from @twalth3r what the added complexity of this approach to the solver would be. Would it be possible to restrict the number of orders that can have a positive trading volume to M while still satisfying all constraints?

josojo commented 5 years ago

To summarize my understanding, are we essentially collecting orders in the way the roll-up is collecting transfer requests (i.e. off-chain)?

I was under the impression that roll-up is eventually also putting the data on-chain; they just have huge savings as they cut out the signatures.

Also, I would like to know more about this M and N. Am I to assume that M < N and if so, how is this M determined? Is there some way of selecting a subset of orders to choose for an auction?

Yes, think of it this way: we can currently settle 1000 orders comfortably. Now I want to set M = 1000, but still make sure that we can consider N = 100,000 orders.

Does this mean that M is the number of touched orders in an auction? Does M vary from one auction to another or is it fixed?

That depends. If we are talking about snarks, we have to fix M to some number. If we implemented it with "smart-contract challenges" and no snarks, we could vary it, but would never let it grow beyond some number.

If we are only posting a hash of the order bundle what is the purpose of M?

M orders will be settled later. (Not sure how else to answer it.)

What are the different order hashes? So far I only see one (the hash of the order bundle).

Many order collectors could post different order hashes

fleupold commented 5 years ago

Another question: Would order cancellation work somehow in this model?

bh2smith commented 5 years ago

Many order collectors could post different order hashes

Interesting... right, so the sub-collection of M orders may draw a subset from each collector's bundle.

josojo commented 5 years ago

This proposal is very similar to the Data Availability Committee that StarkWare and 0x propose for their dex.

I gave it a read, and yes, basically they want to have it for all settled orders (as in their concept the orders are off-chain anyway), while we want to have it only for the orders during the auction; only the settled orders will later be put on-chain.

Another question: Would order cancellation work somehow in this model?

Yes, but this would make the inclusion proofs a little more complicated. One could, for example, add one bit per order indicating whether the order is a cancellation or a placement. During the inclusion proofs, we would require that the orders referenced in the solution do not match any cancelled order. (Intuitively this should work, but I am not confident saying it works without nasty if-branches.)
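The per-order flag bit could be checked roughly like this, purely as an illustration of the extra constraint the inclusion proof would need:

```python
# Sketch of cancellation-aware inclusion checking: each bundle entry
# carries a flag bit marking it as a placement or a cancellation, and
# a solution order is valid only if placed and never cancelled.

PLACEMENT, CANCELLATION = 0, 1

def solution_valid(bundle, solution_orders):
    """bundle is a list of (flag, encoded_order) pairs."""
    placed = {o for flag, o in bundle if flag == PLACEMENT}
    cancelled = {o for flag, o in bundle if flag == CANCELLATION}
    # Solution orders must appear as placements and must not match
    # any cancellation in the bundle.
    return all(o in placed and o not in cancelled
               for o in solution_orders)
```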