cowprotocol / services

Off-chain services for CoW Protocol
https://cow.fi/

Cut Auctions in Autopilot `run_loop` #1673

Closed. nlordell closed this issue 2 months ago.

nlordell commented 1 year ago

Don't cut auctions in the background; cut them in `run_loop` instead. Related code: this currently gets the auction from a background task and should instead be called directly in `run_loop`.
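A minimal sketch of what that could look like, assuming a `SolvableOrdersCache`-style component with a hypothetical `cut_auction` method (the real types and method names in the services crate differ):

```rust
use std::sync::Arc;

// Illustrative stand-ins for the real autopilot types; the actual
// `SolvableOrdersCache` and auction structs live in the services crate
// and look different.
struct SolvableOrdersCache;
struct Auction;

impl SolvableOrdersCache {
    // Hypothetical method: build ("cut") the auction on demand instead of
    // refreshing it from a background task.
    async fn cut_auction(&self) -> Auction {
        Auction
    }
}

async fn run_auction(_auction: Auction) { /* drive solvers, settle, ... */ }

// The proposal: cut the auction at the start of every run_loop iteration
// rather than reading whatever the background task last produced.
async fn run_loop(cache: Arc<SolvableOrdersCache>) {
    loop {
        let auction = cache.cut_auction().await;
        run_auction(auction).await;
    }
}
```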

The reason for the current system comes from some tech-debt that accumulated over time:

We started cutting auctions on-demand in the orderbook. At some point, we realised that preparing the auction ad-hoc was taking too long, so we moved it to a background thread. Then, we optimised this so that auction cutting times are quite consistent. Now, the autopilot drives the protocol, so:

  1. we don’t need the background thread to update the auction for serving over the API anymore
  2. it is fast enough that it can be done at the start of the auction instead of on a background thread

We should be careful about removing all background updates though: the out-of-sync background update is useful for priming the caches we use when building auctions. I'm not saying we should keep the current architecture, just be mindful of this fact when we pay back the tech-debt.

MartinquaXD commented 10 months ago

@fleupold looking into this issue I think we actually don't want to move auction building onto the hot path. We have been fighting for every second of latency we could shave off the loop, and moving auction building into the critical section would add back on average 2.5s (sometimes up to 7s!).

In my head the "optimal" strategy would be:

  1. notice new block
  2. SolvableOrdersCache waits for all block-dependent components to update their state (unclear how)
  3. auction gets prepared
  4. some time later the autopilot accesses the auction
  5. autopilot filters out orders that might have been cancelled off-chain in the meantime
  6. auction starts

Steps 1-3 should happen in the background and steps 4-6 in the critical section of the loop. By introducing step 5 we can make auction building entirely block-dependent, which is necessary for optimal state with the least latency.
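A rough sketch of that split, purely illustrative: steps 1-3 live in a background task keyed to new blocks, steps 4-6 run in the loop's critical section. The names (`build_auction`, `filter_cancelled_orders`, the `watch` channel plumbing) are assumptions, not the existing API:

```rust
use std::sync::Arc;
use tokio::sync::watch;

// Placeholder types and helpers; not the real services API.
struct Auction;
async fn wait_for_block_dependent_state(_block: u64) {}
async fn build_auction(_block: u64) -> Auction { Auction }
fn filter_cancelled_orders(_auction: &Auction) -> Auction { Auction }
async fn start_auction(_auction: Auction) {}

// Steps 1-3: run in the background whenever a new block arrives.
async fn background_auction_builder(
    mut new_blocks: watch::Receiver<u64>,
    publish: watch::Sender<Option<Arc<Auction>>>,
) {
    while new_blocks.changed().await.is_ok() {       // 1: notice new block
        let block = *new_blocks.borrow();
        wait_for_block_dependent_state(block).await;  // 2: wait for block-dependent components
        let auction = build_auction(block).await;     // 3: prepare the auction
        let _ = publish.send(Some(Arc::new(auction)));
    }
}

// Steps 4-6: run in the critical section of the autopilot loop.
async fn run_loop_iteration(latest: &watch::Receiver<Option<Arc<Auction>>>) {
    // 4: access the most recently prepared auction.
    let auction = latest.borrow().clone();
    if let Some(auction) = auction {
        let auction = filter_cancelled_orders(&auction); // 5: drop off-chain cancellations
        start_auction(auction).await;                    // 6: auction starts
    }
}
```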

For these reasons I would NOT implement what this issue suggests and instead slightly rethink how we build auctions (while keeping it in the background). Do you agree with my assessment? If so, do you still think this issue has high priority at the moment?

fleupold commented 10 months ago

In my head the goal is still to steer towards one auction per block. Therefore, while the auction preparation doesn't have to count towards the time solvers get to participate (yet), I still think we should prepare the infrastructure to allow for this.

I'd see the flow as something like:

  1. When we see a new block
  2. Build auction
  3. Check if another auction is still in progress (in this case abort)
  4. Otherwise, send auction to solvers with 15s time frame

Once step 2 is fast enough, we can reduce step 4 to something like 7 seconds and only allow submission in the next block (if not included, cancel). Et voilà, one batch per block. I agree that it's not clear whether this should be part of the colocation epic or later become part of one batch per block.
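Under those assumptions, the flow might look roughly like the sketch below (hypothetical names throughout; the `try_lock` on a shared mutex stands in for however "auction still in progress" is actually tracked):

```rust
use std::time::Duration;
use tokio::sync::Mutex;

// Placeholder types and helpers; not the real autopilot API.
struct Auction;
async fn build_auction() -> Auction { Auction }
async fn run_auction(_auction: Auction, _solver_deadline: Duration) {}

// 1: called whenever a new block is observed.
async fn on_new_block(auction_in_progress: &Mutex<()>) {
    // 2: build the auction.
    let auction = build_auction().await;
    // 3: if the previous auction is still in progress, abort this one.
    let Ok(_guard) = auction_in_progress.try_lock() else { return };
    // 4: otherwise run the auction with a 15s solver deadline
    //    (reducible to ~7s once step 2 is fast enough).
    run_auction(auction, Duration::from_secs(15)).await;
}
```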

MartinquaXD commented 10 months ago

This also sounds reasonable. Maybe I got too excited about the latency improvements we recently had so I lost track of the big picture.

MartinquaXD commented 10 months ago

After some more thought I believe this should happen when we tackle 1 block per auction. Nothing in this issue makes colocation work any better or more reliably. In fact, until we actually push for 1 block per auction, the performance of the protocol would actually be degraded because we would introduce unnecessary wait periods (between submitting a solution and waiting for the next block).

github-actions[bot] commented 7 months ago

This issue has been marked as stale because it has been inactive a while. Please update this issue or it will be automatically closed.