chore: Orders excluded from auction due to missing price

fleupold commented 1 month ago

Background

A lot of "Time to happy moo" outliers spend at least part of their lifetime not being part of the auction due to "missing prices" in the autopilot.

Details

Other than very rare cases, we should not have any order entering the system for which we cannot compute the native price of its buy and sell token immediately.

Thus we should almost never see orders excluded from the auction for that reason. This task captures identifying the root causes of missing prices and reducing the amount of orders/time spent outside of the auction (in which case an order cannot be settled)

Example: https://explorer.cow.fi/orders/0x8986041be0bbe814c5ba965491a6540de59d65b0e8df81f2bb44d293d3131c144e345ebeff96471ace333bc3e04ab15c2ac02bd8669533d2?tab=overview

Autopilot logs: https://production-6de61f.kb.eu-central-1.aws.cloud.es.io/app/r/s/NscXq

Acceptance criteria

Measure the amount of orders that are excluded due to missing prices
Significantly improve that metric through root cause analysis and fix

m-lord-renkse commented 1 month ago

@fleupold the numbers are quite daunting (this graph represents the missing_price error occurrences, which means number of orders that have been excluded due to missing prices):

We have peaks of even +2k errors within two minutes span.

fleupold commented 1 month ago

I do think we should also find a way to consolidate the native price caches or in general revisit the logic with which the autopilot fetches prices.

Usually, native prices for both the sell and the buy token of a market order should already be cached (as they were fetched by the api when the quote was computed & the order placement was validated). However, the autopilot doesn't yet have the native price for this token cached and will thus not include the order in the first auction (which by itself can delay settlement by 50s and make the quoted price go stale). Ensuring that every order is included in the first auction after it is placed will bring significant "time to happy moo" improvement imo.

m-lord-renkse commented 1 month ago

I do think we should also find a way to consolidate the native price caches or in general revisit the logic with which the autopilot fetches prices.

Usually, native prices for both the sell and the buy token of a market order should already be cached (as they were fetched by the api when the quote was computed & the order placement was validated). However, the autopilot doesn't yet have the native price for this token cached and will thus not include the order in the first auction (which by itself can delay settlement by 50s and make the quoted price go stale). Ensuring that every order is included in the first auction after it is placed will bring significant "time to happy moo" improvement imo.

@fleupold doesn't the orderbook and autopilot use different cache? they both implement the same piece of code (since it is in the share create), but ultimately, they are run within their own instances, having the same cache in two different places and not synced. I may be missing something, but it seems like that. We even have two different grafana boards for each cache :thinking:

By having the prices in an independent service we will drastically improve the speed, according to what you said. This is really exciting.

fleupold commented 1 month ago

That is correct (and the root cause we should address in this issue beyond being able to simply price more tokens).

Under the hood, most requests from both api and autopilot go via our shared egress or bff, which itself caches so in practice most queries should be very fast to return (assuming that api has "warmed" the cache).

I'm not sure what the best approach is, but I think simply ignoring all tokens for which we don't have a record in the rust binary's cache by one auction is not good. Maybe we can have a "fast timeout" (100ms or similar) request which will be able to read from the later caches.

cowprotocol / services