ArdanaLabs / audit

[pile] rootfinding #4

Open quinn-dougherty opened 2 years ago

quinn-dougherty commented 2 years ago

Morgan:

the DanaSwap refactor to use hand-written validators is going to introduce another potential vulnerability which we should analyze. With the state machine pattern, we compute the solution to the invariant equation as part of the transition function. With validators, we are instead going to check that the solution provided as part of the transaction is correct. What does correct mean? I am currently defining it as: the next step of Newton's method creates a change in the solution which is smaller than some constant delta. If this condition is satisfied then the validator will accept the provided solution without further analysis.

What is the attack vector? An attacker could find a number which makes Newton's method create a change smaller than delta on the next step, but which is not actually a solution to the invariant equation. Newton's method is not guaranteed to stop at a root; for example, it can also stop at a zero of the derivative.

I think the best strategy to mitigate this risk is probably to prove properties of the invariant function which imply that Newton's method converges on a zero under all conditions allowed by the contract. For example, this would be true if the conditions for the proof of quadratic convergence of Newton's method are satisfied. https://en.wikipedia.org/wiki/Newton%27s_method#Proof_of_quadratic_convergence_for_Newton's_iterative_method
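A minimal Haskell sketch of the acceptance check described above (names and signatures are hypothetical, not the actual DanaSwap validator code): accept a candidate solution x when one further Newton step would move it by less than delta. Written out this way, the attack vector is any x where abs (f x / f' x) < delta while f x is still far from zero.

```haskell
-- Hypothetical sketch of the acceptance condition, not the actual
-- validator: accept x if one more Newton step barely moves it.
acceptsSolution
  :: (Double -> Double)  -- invariant function f
  -> (Double -> Double)  -- its derivative f'
  -> Double              -- tolerance delta
  -> Double              -- candidate solution from the transaction
  -> Bool
acceptsSolution f f' delta x = abs (nextStep - x) < delta
  where nextStep = x - f x / f' x  -- one step of Newton's method
```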

quinn-dougherty commented 2 years ago

Formal Proofs for Theoretical Properties of Newton’s Method

quinn-dougherty commented 2 years ago

@morganthomas These conditions entail that the solution converges with quadratic speed. Are we interpreting that as also meaning that it converges "correctly" (correctly meaning not to a zero of the derivative)?

[Screenshot from 2021-09-20 of the quadratic-convergence conditions in the Wikipedia article linked above]
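For reference, the conditions in that screenshot are (roughly) the standard hypotheses from the linked Wikipedia section; paraphrasing, not quoting:

```latex
% Paraphrase of the quadratic-convergence hypotheses: a simple root,
% a continuous second derivative, and a sufficiently close start.
\[
  f(\alpha) = 0, \qquad
  f'(x) \neq 0 \text{ on a neighborhood of } \alpha, \qquad
  f'' \text{ continuous near } \alpha
\]
\[
  \text{and } x_0 \text{ sufficiently close to } \alpha
  \;\Longrightarrow\;
  |x_{n+1} - \alpha| \le M \, |x_n - \alpha|^2
  \text{ for some finite } M.
\]
```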

Wikipedia alone did not back up "it can also stop at a zero of the derivative", nor did a cursory googling. I can see how the algorithm crashes at f' x_n = 0, but not how it returns a dubious result.

Moreover, can you say some words about how you're thinking about selecting x0 to kick off the search? This seems important to the proofs.

Also: can you confirm for me that we have upper and lower bounds on what we're expecting D to be? When we make the invariant equation a polynomial in D set to zero, the proof obligations want us to restrict the domain to an interval.

Idea for an algorithm, because the form of the invariant polynomial gives us a direct solution to f' D = 0:

  1. Use the outcome of (...) ^ (1 / n) to find the zero of the first derivative, D'0. I am not aware of Haskell's behavior for even n and values of (...) such that the roots are all imaginary. For odd n at least one root is real, and counting multiplicities the number of real roots is odd; I am not aware of Haskell's behavior under nonuniqueness, and it is not clear that Haskell's default behavior is what's desirable (if it's all IEEE 754 I can research that). I think the interpretation of nonuniqueness of roots should be part of the spec.
  2. Check whether f D'0 = 0 holds (i.e. (abs . f) D'0 < 1e-7, or whatever our tolerance is).
  3. If it is false, we are secure against the zero-of-first-derivative attack vector, and can proceed with the solver (a sketch of this check follows below).

I'm not sure what to do yet if it is true. There could be a case where, coincidentally, the zero of the derivative is also a zero of the base function!
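A minimal Haskell sketch of steps 2 and 3, assuming the real zeros of f' have already been computed in closed form via the (...) ^ (1 / n) step (derivRoots below is a hypothetical stand-in for that computation):

```haskell
-- Tolerance for treating a value as zero; plays the role of the
-- 1e-7 above.
tolerance :: Double
tolerance = 1e-7

-- True when no real zero of f' is also (within tolerance) a zero of
-- f itself, i.e. it is safe to proceed with the solver (step 3).
safeAgainstDerivativeZero
  :: (Double -> Double)  -- invariant f, viewed as a polynomial in D
  -> [Double]            -- real zeros of f' (the D'0 values)
  -> Bool
safeAgainstDerivativeZero f derivRoots =
  all (\d0 -> (abs . f) d0 >= tolerance) derivRoots
```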

quinn-dougherty commented 2 years ago

Summary of Discord chat and progress to date

The x0 that kicks off the search is the value that was found at the previous state. Selecting an initial value "from scratch" only really has to happen once, and the simulations will inform that.

Our current implementation, which guards against division by zero, will return a dubious/unfinished result if f' x = 0.
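A sketch of that behavior (hypothetical names, not the actual Invariant.hs code): when f' x = 0 the division is skipped and the current iterate comes back unchanged, which is exactly the dubious/unfinished result.

```haskell
-- Guarded Newton step: avoid dividing by zero by returning the
-- current iterate unchanged when f' x = 0, yielding an unfinished
-- result rather than a crash. Hypothetical sketch only.
newtonStepGuarded
  :: (Double -> Double)  -- invariant f
  -> (Double -> Double)  -- derivative f'
  -> Double -> Double
newtonStepGuarded f f' x
  | f' x == 0 = x  -- guard: the step would divide by zero
  | otherwise = x - f x / f' x
```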

Simulation work will provide upper and lower bounds for D, suggest values for A, and lead to other insights which will guide/multiply our efforts on this.

Please see the notebook for details.

Now that we've characterized the zeros behavior of the polynomials (and indeed decided to view the invariant function as polynomials), we can reimplement Invariant.hs so that this characterization comes through more clearly in the code. We are also discussing a "prefix" to the solver: conditional logic which checks whether the derivative will be zero. We can also make claims involving the phrase "in a neighborhood about zero": per Wikipedia, when half that neighborhood's radius is less than the distance between the initial guess and the true root, the solver is not guaranteed to converge at all.

The goal of future work in this direction, if it gets prioritized, is to analyze the behavior of the solver to rule out inadequate or false behavior. As of this writing I have taken, I think, all the steps that ought reasonably to be taken before simulation experiments are conducted.

quinn-dougherty commented 2 years ago

Idea: eigenvalues of companion matrix

The companion matrix C(p) of a polynomial p is a (rather sparse) square matrix whose characteristic polynomial is equal to p.

In principle we can solve the invariant polynomial by sticking its companion matrix into a blackbox eigensolver.
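A minimal sketch of the idea in Haskell, using the hmatrix bindings rather than the lapack package discussed below, and assuming the invariant has been put into monic form p(x) = x^n + a_{n-1} x^{n-1} + ... + a_1 x + a_0:

```haskell
import Data.Complex (Complex)
import Numeric.LinearAlgebra

-- Companion matrix of a monic polynomial, given its coefficients
-- [a0, a1, ..., a_{n-1}]: ones on the subdiagonal, -a_i down the
-- last column, zeros elsewhere. Its characteristic polynomial is p.
companion :: [Double] -> Matrix Double
companion as = (n >< n) [ entry i j | i <- [0 .. n - 1], j <- [0 .. n - 1] ]
  where
    n = length as
    entry i j
      | j == n - 1 = negate (as !! i)
      | i == j + 1 = 1
      | otherwise  = 0

-- All (possibly complex) roots of p, via a blackbox eigensolver.
rootsViaCompanion :: [Double] -> [Complex Double]
rootsViaCompanion = toList . eigenvalues . companion
```

For example, rootsViaCompanion [-1, 0] (the polynomial x^2 - 1) returns the eigenvalues 1 and -1.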

Morgan provided the following in Discord:

Here are some things to consider when deciding whether to use any algorithm implementation (first party or third party) to solve the invariant equation.

  1. That algorithm will have limited precision and conditions for finding accurate solutions to our particular problems. Are those conditions met by the current invariant problem? Will they be met by future invariant problems?

LAPACK:

Here is a page about the precision of its eigenvalue solver

Margin of error is 3.1e-1

What about accuracy? Is the solver guaranteed to be accurate? What is the algorithm for the approximation? What are the assumptions of this algorithm? I don't know the answers to these questions.

Newton:

Conditions for precision and accuracy are not exactly known.

  2. What platforms support that algorithm implementation?

LAPACK: Haskell but not Plutus

Newton: Haskell and Plutus

  3. What platforms do we need the algorithm implementation to run on?

Just Haskell, just on our own machines

  4. Who supports the implementation and what kind of support is available to us?

LAPACK: actively maintained on GitHub with 72 contributors and 2,295 commits; issues on GitHub have about 50% rate of engagement; unknown what support is available

Haskell LAPACK: actively maintained by Dr. Henning Thielemann; unknown what level of support is available

Newton: it's all on us

  5. How has the implementation been tested?

LAPACK: has half a million lines of test code

Haskell LAPACK: has 7.5k lines of unit tests

Newton: it's all on us and the testing is currently not sufficient

  6. What is the performance like on expected problem sizes?

LAPACK: unknown

Newton: unknown, but sufficient, based on property tests

quinn-dougherty commented 2 years ago

I raised the question of arithmetic tree size in https://github.com/ArdanaLabs/audit/issues/23, the idea being that different ways of writing the invariant function may result in fewer total operations. However, IOHK has neither published nor pushed code on the nitty-gritty of the cost semantics / gas model, so it's difficult to buy this value proposition until that happens.
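As an illustration of the kind of rewriting meant here (illustrative only, not the actual invariant function), two extensionally equal ways of evaluating a polynomial have very different arithmetic tree sizes:

```haskell
-- Horner form does one multiplication and one addition per
-- coefficient; the naive monomial expansion re-exponentiates x at
-- every term. Same function, different operation counts.
evalNaive, evalHorner :: [Double] -> Double -> Double
evalNaive  cs x = sum [ c * x ^ i | (i, c) <- zip [0 :: Int ..] cs ]
evalHorner cs x = foldr (\c acc -> c + x * acc) 0 cs
```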

Moreover, Nick is conducting an analysis to try to make the functions better behaved near the roots! This sounds promising. Will wait for results.

quinn-dougherty commented 2 years ago

A lot has been happening here: https://github.com/ArdanaLabs/audit/issues/26 https://github.com/ArdanaLabs/audit/issues/27 https://github.com/ArdanaLabs/audit/issues/28 https://github.com/ArdanaLabs/audit/issues/29 https://github.com/ArdanaLabs/audit/issues/30 https://github.com/ArdanaLabs/audit/tree/main/longreports/rootfinding/companion plus other stuff.