chore: Towards simple score computations

fhenneke commented 4 months ago

Background

When implementing the circuit breaker it has proven difficult to exactly reproduce how scores are computed in the driver. The reason is inconsistent handling of rounding. I am not sure if I pieced things together correctly (mostly reverse engineering) but here is what I expected vs what is implemented.

Expected

All traded amounts are integers
Surplus are (rounded to) integers in traded tokens
Price improvement and volume based fees are (rounded to) integers in traded tokens
Conversion to ETH happens using external prices individually for differents trades using exact arithmetic + rounding per and is then aggregated

Implementation

price improvement, surplus, volume based fees are not rounded to integers before conversion to ETH (arguably increases pecision, but pretends that fractional amounts would have been traded without fees or for partially fillable orders)
conversion to ETH rounds multiple times instead of once at the end, leading to loss of precision
quotes are computed using float precision and not rounded to the closest integer, leading to loss of precision
database stores fee constants as floats and not as decimals (e.g. 0.01 != Fraction(1, 100) in python), leading to loss of precission

Acceptance criteria

Since external drivers need to implement scores and the should be no room for gaming that computation (beyond a few wei) we need to align on how to compute scores. One proposal by @harisang would be to change how we communicate quotes to solvers as at the moment we send quote amounts and fees while we could as well just tell the the one amount they need use to compute how much price improvement fees they need to pay.

I would argue that there should be a general simplification for computing scores.

sunce86 commented 4 months ago

I was afraid of this :)

My initial idea was for a score computation to be based on a struct (calldata basically) so that driver/autopilot/external parties all use exactly the same math, so potential differences can come only from different handling of types in different programming languages.

How serious is this problem now? Can you express the score differences in %?

harisang commented 4 months ago

So you need more than calldata in order to compute the score as protocol fees is offchain information.

I will attempt to make a concrete proposal here, that just suggests that every part of the total score computation is an integer (surplus in the surplus token, protocol fees in the surplus token, and finally multiplication of those with native prices).

Quotes Quotes are currently sent as follows to the driver:

"quote": {
     "sellAmount": "60000000000",
      "buyAmount": "60000348546",
      "fee": "1951969"
}

This already creates issues. It would be better to decide on whatever conversion we want to do (which ideally should be the same as the one used to generate the limit price when the user signs the order, say, with zero slilppage), and then present the quote as :

sellAmount
buyAmount

and that's it. And for a sell order, that would mean that the buyAmount is the threshold above which we collect a price improvement fee.

Protocol fees Here, we can always assume that whatever percentages appear are meant to be interpreted as X / 1000... And then, whenever we need to multiple such a percent with an integer quantity, we simply cut off the decimal part of the result, so that when computing the protocol fees in the surplus token, we end up with an integer.

The above would allow us to have everything related to score computations, on a per order basis, being an integer, and then multiplying these integers with the native prices, which are integers themselves, should cause no issues.

m-lord-renkse commented 3 months ago

@sunce86 why don't we use rust_decimal to provide exact calculation? see https://docs.rs/rust_decimal/latest/rust_decimal/

We can definitely make it super precise (very like 100 % precise) with that crate which offers up to 28 significant digits with the scale being also 28 significant digits.

We calculate our fees and score using rust floating point which has rounding errors. I am totally for adapting rust_decimals to reduce any imprecision fully.

fhenneke commented 3 months ago

There have been two false positives in the circuit breaker due to rounding of scores for this and that settlement. Both settlements involve a token with 0 decimals which amplifies rounding issues.

From the logs it looks as if the driver is not consistent in computing protocol fees. Below are some details.

I think the driver is generally rounding down to the next integer whenever integer amounts are required (e.g. for traded amounts). Would that be a reasonable assumption to base the circuit breaker on?

Example for hash 0xb2fbf5cf39cd24d5c8b38b2ff4326a2d11800eb2d1c7647c07c4547a4dca55a3:

The amount the user received is 29378
The limit buy amount is 27118
Surplus thus is 2260 (= 29378 - 27118)
The order is a limit order with lots of surplus. The protocol fee is thus capped via volume. Before rounding the amount is 29378 * 0.01 / 0.99 = 296.74... Rounding to the nearest integer gives 297, rounding down gives 296.
The price of the surplus token in the auction is 73529411764705878880955175796736.
Combining surplus and rounded protocol fees with the price gives scores of 188014705882352932 and 187941176470588226, respectively, (rounding down the final result).
The driver reported 187941176470588226, which is equal to the result for rounding down always.
Also noteworthy: From execution and protocol fees looks as if the solver had intended to give 29674 (= 29378 + 296) tokens to the user. The logs suggest it was actually 29673. So there is additional weirdness in the solution processing in the driver. But that need not be the concern here.

For me personally, it would be nice to use exact (rational) arithmetic and round to the nearest integer always. But I do not know how easy it is to implement that in rust.

sunce86 commented 3 months ago

Yes, I believe we have the inconsistency at at least 1 place in the driver with the ceil_div and we should fix that. However I am strongly against being too strict on score (asking for ==) as we are moving to solvers running their own driver that can be written in whatever language which can be limited with regards to precision.

I'm more in a favor of either:

Letting autopilot calculate the score and not the driver (but still we might have inconsistencies between autopilot and circuit breaker written in python (or something else).
Define tolerance in percentage, something similar we did in driver tests: https://github.com/cowprotocol/services/blob/main/crates/driver/src/tests/cases/mod.rs#L124-L133

One might argue precision can be achieved (blockchain clients are written in different languages and end up with same outputs otherwise consensys might fail), but I think we don't have to be that strict and solvers should focus more on other tasks in their roadmap other than chasing rounding error on U256 (just my 2c).

fhenneke commented 3 months ago

My favored approach is to let the autopilot handle scores and circuit breaking. In the near future, both will probably not be done.

The second best option, in my opinion, is to define a simple way of computing scores which is consistent (e.g. regarding rounding) and precision (e.g. using rationals and integers only). This would be the ground truth for the score. The reference driver and all other drivers then need to conform to that definition of score.

The circuit breaker can then check that with a tolerance (but probably a smaller tolerance than the $0.3 from the two false positives yesterday, which was also non-negligible in relative terms, 0.04%, and exceeding the current absolute tolerance of 10^12 wei). And maybe only blacklist in cases where reported scores are tool large.

The current differences in scores are due to rounding but are not small in absolute terms. Having too much room for drivers to implement scores to their liking would result in profitable gaming of score reporting. The false positives from yesterday had a difference in score of about $0.3 (0.04%), a lot larger than current absolute tolerance of 10^12 wei in the circuit breaker. Those numbers are well within the range of what currently determines winners of the competition.

fleupold commented 3 months ago

Does the circuit breaker check for equality or for the score being at least what was announced in the competition? If the latter, can't we make sure we always round the score computation down in the driver (to be on the safe side)?

fhenneke commented 3 months ago

At the moment, the circuit breaker checks for approximate equality (with a tolerance of 10^12 wei).

It sounds reasonable to change that behavior and only blacklist if the circuit breaker computes a smaller score than what was the driver reported. Then a driver could stay on the safe side.

With the current implementation of the circuit breaker (sticking with the same rounding as the driver for quotes, round to nearest integer for traded amounts) there seem to be deviations in both directions. I.e. sometimes the driver reports a smaller score and sometimes it reports a larger score than what the circuit breaker computes.

I played around a bit with different rounding again and it seems there is some places where values are rounded up (e.g. for surplus (in surplus token), quote buy amounts) and some places where values are rounded down (e.g. fees computed from fee factors, final results in wei).

One structural problem at the moment is that we need to avoid false positives as soon as possible. The circuit breaker at the moment is much easier to change than the reference driver, so the circuit breaker is adapted. Then the circuit breaker starts to follow idiosyncrasies of the reference driver and all other drivers follow it as well. And we get stuck with a somewhat complicated ground truth for scores. Maybe this one-sided test with "rounding up" in the circuit breaker is a way out of that.

fleupold commented 3 months ago

@sunce86 how hard would it be to change the reference driver to always be on the conservative side in terms of score reporting (ie always round down)?

sunce86 commented 3 months ago

🤷‍♂️

Hard to estimate, as I am not sure whether the ceil_div is the only reason for inconsistencies. I would prefer to fix ceil_div (use it for traded amounts everywhere or nowhere, probably former) and then go through the problematic examples and see the effect.

fhenneke commented 3 months ago

There might be one issue with rounding of surplus for partially fillable orders (probably also applies to protocol fees based on surplus/price improvement) we should fix. This might be one of the places where the driver/autopilot are rounding scores up.

Lets look at the trade in transaction 0x688508eb59bd20dc8c0d7c0c0b01200865822c889f0fcef10113e28202783243 for the partially fillable sell order 0xc6a81144bc822569a0752c7a537fa9cbbf6344cb187ce0ff15a534b571e277eaf87da2093abee9b13a6f89671e4c3a3f80b427676799c219

The limit sell amount is 470000000000 and the limit buy amount is 46976511.
In the transaction, the sell amount was 110473156723 and the buy amount was 11041916.
Surplus without rounding would be 121.40... = 11041916 - 46976511 * 110473156723 / 470000000000.
The price of the surplus token is around 3177764302250.
The surplus computed for the order (as written into the settlement_obsevations table) is 385780567811788, which is consistent with 122 * price. So a surplus of 122 was used, i.e. rounding up.
The circuit breaker computed a surplus of 384509480572312.
The scores suffer from the same issue. The driver reports 775374489749126 which is consistent with a surplus + protocol_fee = 2 122. The circuit breaker reports 769018961144626, which is consistent with surplus + protocol_fee = 2 121.

The approach we want to go for in the circuit breaker is to use a surplus of 121. This is because the smallest possible allowed executed amount (as per the smart contract) is ceil(46976511 110473156723 / 470000000000). Any smaller number would have violated the limit price constaint. Using that as reference to compute surplus (i.e. how much more a user got compared to the minimal possible amount) gives 11041916 - ceil(46976511 110473156723 / 470000000000) = 121. (I am on purpose not taking the position of "what did the solver want and how did the driver reduce surplus". With respect to rounding, In general, it will not be possible to exactly recover what the solver submitted by just looking at on-chain data. In this case logs seem to suggest that the inteded protocol fee was 120, another mismatch with what was computed afterwards.)

For buy orders it is the other way around as one has to round limit sell amounts down, the surplus would be floor(limit_sell_amount * buy_amount / limit_buy_amount) - sell_amount.

fleupold commented 3 months ago

@sunce86 can you look into fixing this please?

github-actions[bot] commented 1 month ago

This issue has been marked as stale because it has been inactive a while. Please update this issue or it will be automatically closed.

cowprotocol / services