diprism / fggs

Factor Graph Grammars in Python

Prove convergence of Broyden's method #47

Open · davidweichiang opened this issue 3 years ago

davidweichiang commented 3 years ago

I got this once from CI:

======================================================================
ERROR: test_broyden_2 (test.test_sum_product.TestSumProduct)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/fgg-implementation/fgg-implementation/test/test_sum_product.py", line 43, in test_broyden_2
    self.assertAlmostEqual(A.item(), B, places=2)
  File "/opt/hostedtoolcache/Python/3.7.11/x64/lib/python3.7/unittest/case.py", line 906, in assertAlmostEqual
    raise self.failureException(msg)
AssertionError: nan != 0.2857829828062876 within 2 places (nan difference)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/runner/work/fgg-implementation/fgg-implementation/test/test_sum_product.py", line 49, in test_broyden_2
    for A, B in zip(sum_product(self.fgg_2, method='broyden', perturbation=perturbation), exact_value(p)):
  File "/home/runner/work/fgg-implementation/fgg-implementation/fggs/sum_product.py", line 76, in sum_product
    broyden(lambda psi_X: F(psi_X) - psi_X, J, psi_X)
  File "/home/runner/work/fgg-implementation/fgg-implementation/fggs/sum_product.py", line 27, in broyden
    psi_X1[...] = psi_X0 + torch.linalg.solve(J, -F_X0)
RuntimeError: linalg_solve: U(1,1) is zero, singular U.

kennethsible commented 3 years ago

In the pull request for Broyden's method, I described a potential instability that can occur in the unit tests. If the initial guess for the Jacobian (the identity matrix multiplied by negative one) isn't sufficiently close to the actual Jacobian, the approximation can become singular (hence the error message above). To work around that, I randomly perturb the Jacobian, as suggested by the textbook I cited. However, if those perturbations don't stabilize the convergence, Python re-raises the original AssertionError. So seeing that error message isn't a problem in itself; it occurs because only a finite number (50, I think) of random perturbations are attempted. Just re-run the unit tests and everything should pass.
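
Roughly, the retry logic looks like this (a sketch only, not the actual code in fggs/sum_product.py; solve_with_retries, max_retries, and scale are illustrative names, psi_X is assumed to be a flat 1-D tensor, and broyden is assumed importable with the signature shown in the traceback above):

    import torch
    from fggs.sum_product import broyden  # signature as in the traceback

    def solve_with_retries(G, psi_X, max_retries=50, scale=0.1):
        n = psi_X.numel()
        for _ in range(max_retries):
            # initial guess -I, plus a small random perturbation
            J = -torch.eye(n) + scale * torch.randn(n, n)
            try:
                broyden(G, J, psi_X)   # updates psi_X in place
                return psi_X
            except RuntimeError:       # "linalg_solve: ... singular U"
                psi_X.zero_()          # reset to the zero vector and retry
        raise RuntimeError('no perturbation stabilized the iteration')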

kennethsible commented 3 years ago

If you want, I could look for a correlation between the random p value and the initial Jacobian to keep the unit tests from failing at random. Obviously, we don't want the unit tests giving false negatives. In general, we should perturb the Jacobian to force convergence, but we could avoid doing that in the unit tests by selecting an appropriate initial Jacobian approximation for a given random p value, if I can identify one.

davidweichiang commented 3 years ago

Is there hope that the optimizations discussed in #48 will make this issue go away?

davidweichiang commented 3 years ago

To make the unit tests deterministic, you could set the random seed to a value that is known to work. However, it would be much better if we could guarantee convergence whenever a solution exists.
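
For example (a minimal sketch, assuming torch's and Python's global RNGs are the only sources of randomness; the value 0 is a placeholder, not a seed known to work):

    import random
    import unittest
    import torch

    class TestSumProduct(unittest.TestCase):
        def setUp(self):
            # pin the RNGs that the Jacobian perturbations draw from
            random.seed(0)
            torch.manual_seed(0)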

kennethsible commented 3 years ago

I doubt #48 would resolve the problem, since those optimizations don't change the initial guess. However, I pushed a commit yesterday that makes the chance of the algorithm not converging extremely unlikely. I'll push another temporary commit with your random-seed idea, but I do intend to anticipate the Jacobian perturbation for each p value. I need to plot p value against perturbation and look for a pattern (I just haven't found the time yet).

davidweichiang commented 3 years ago

Yesterday @kennethsible mentioned that an additional problem is that Broyden's method sometimes converges to the wrong fixed point. If I understand correctly, this is a related problem, right?

It seems to me that because the Jacobian is approximated, the guarantees proven by Etessami & Yannakakis and others might not apply. They prove that if you start at the zero vector, then the Newton iterates are always strictly less than the least fixed point. But in Broyden's method, we might overshoot, right?
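
To make that concrete in one variable (a sketch of my own, assuming the scalar system x = p*x**2 + (1 - p), whose least fixed point is 1 for p < 1/2; this is an illustration, not E&Y's own example):

    # Newton's method on f(x) = p*x**2 + (1 - p) - x, starting from x = 0:
    # every iterate increases monotonically and stays strictly below the LFP.
    p = 0.4
    f = lambda x: p * x**2 + (1 - p) - x
    df = lambda x: 2 * p * x - 1

    x = 0.0
    for _ in range(6):
        x = x - f(x) / df(x)
        assert x < 1.0   # never overshoots
        print(x)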

kennethsible commented 3 years ago

Nederhof and Satta made a remark about oscillation and non-termination:

Values in s^{(k)} tend to contain both negative and positive values, letting the respective x^{(k)} move back and forth close to the root for many iterations. It is therefore more difficult to avoid oscillation and non-termination than in the case of the fixed-point method and Newton’s method.

davidweichiang commented 3 years ago

Ah, now I understand that remark better.

davidweichiang commented 3 years ago

Does this mean that we have to resign ourselves to Broyden's method sometimes failing? Or is there maybe a way to guarantee that we always undershoot instead of overshoot?

kennethsible commented 3 years ago

I am reading through Iterative Methods for Linear and Nonlinear Equations by C. T. Kelley. Why would Nederhof and Satta mention Broyden's method if we couldn't guarantee convergence to the least fixed point?

davidweichiang commented 3 years ago

Possibly relevant: I just noticed that in the Etessami & Yannakakis paper, their example of a system of equations that converges very slowly under fixed-point iteration is the same as example12p.json with p = 0.5. This is also the case that is most problematic for Broyden's method, right?

kennethsible commented 3 years ago

p = 0.5 actually converges, but very slowly. The problematic values are those in the neighborhood of p = 0.5.
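
For intuition, in the scalar system x = 0.5*x**2 + 0.5 the fixed point x = 1 is a double root, so the error of fixed-point iteration only shrinks like 2/k (a quick check, assuming example12p.json reduces to this one-variable equation at p = 0.5):

    # fixed-point iteration from x = 0; it takes on the order of 2000
    # iterations just to get within 1e-3 of the fixed point x = 1
    x, k = 0.0, 0
    while 1 - x > 1e-3:
        x = 0.5 * x**2 + 0.5
        k += 1
    print(k, x)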

davidweichiang commented 3 years ago

In one variable (the only kind of calculus I understand), because the second derivative is positive, the secant method is guaranteed to always undershoot, provided we start with x_{-1} < 0 and x_0 = 0. Maybe there's hope for generalizing E&Y's result to Broyden's method?
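
A quick numerical check on the same scalar system as above (p = 0.4, so the LFP is 1): since f is convex, the secant line through two points left of the root crosses zero where f is still positive, so every new iterate lands below the least root.

    # secant method on f(x) = p*x**2 + (1 - p) - x with x_{-1} = -1, x_0 = 0
    p = 0.4
    f = lambda x: p * x**2 + (1 - p) - x

    x_prev, x_curr = -1.0, 0.0
    for _ in range(8):
        x_next = x_curr - f(x_curr) * (x_curr - x_prev) / (f(x_curr) - f(x_prev))
        assert x_next < 1.0   # always an undershoot
        x_prev, x_curr = x_curr, x_next
        print(x_curr)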

kennethsible commented 3 years ago

If I change the Frobenius norm in the derivation of Broyden's method from minimizing the difference between J_n and J_{n - 1} to minimizing the difference between J^{-1}_n and J^{-1}_{n - 1}, which changes v in the algorithm from -J^{-1}ΔF to ΔF, then Broyden's method with an initial Jacobian of -λI for λ > 0 converges to the LFP of the system (according to my testing). However, I still need to prove that result.
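
For concreteness, the update I mean is the following (a standalone sketch, not the code I pushed; bad_broyden, lam, tol, and max_iter are illustrative, and F and x0 are assumed to be a 1-D tensor function and tensor):

    import torch

    def bad_broyden(F, x0, lam=1.0, tol=1e-8, max_iter=1000):
        x = x0.clone()
        Fx = F(x)
        H = -(1.0 / lam) * torch.eye(x.numel())  # inverse of -lam * I
        for _ in range(max_iter):
            dx = -H @ Fx                         # quasi-Newton step
            x = x + dx
            Fx_new = F(x)
            if torch.linalg.norm(Fx_new) < tol:
                break
            dF = Fx_new - Fx
            # rank-one update with v = dF ("bad" Broyden):
            # H += (dx - H dF) dF^T / (dF . dF)
            H = H + torch.outer(dx - H @ dF, dF) / torch.dot(dF, dF)
            Fx = Fx_new
        return x

Applied to F(ψ) = Φ(ψ) - ψ with x0 = 0, the first step is (1/λ)F(0), which is nonnegative, so the iteration starts by moving up toward the LFP; that's where the sign of λ matters.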

davidweichiang commented 3 years ago

Cool... that's the "bad" Broyden's method, right?

kennethsible commented 3 years ago

Yes, that's correct. I thought I would try that method since we switched to updating the inverse Jacobian, and the "bad" Broyden's method minimizes the change between successive approximations of the inverse Jacobian. I still need to determine why that method doesn't overshoot the LFP. Regarding v = ΔF, I got that value from A faster Broyden method.

kennethsible commented 3 years ago

I pushed a temporary commit (cfd0596) using the "bad" (not a great name) Broyden's method with a random λ, but we'll need to deterministically choose that value at some point. I assume a convergence proof will help with the selection.

davidweichiang commented 2 years ago

I kind of forgot what the issue was here...can this be closed?

kennethsible commented 2 years ago

TL;DR: Broyden's method may not converge to the least fixed point (LFP) and may even oscillate or diverge (see Nederhof and Satta). I found that the version of Broyden's method described in A faster Broyden method might be sufficient to guarantee convergence to the LFP, but we still need to prove that result. Therefore, I would say we leave the issue open.