Closed by tkoskela 2 years ago
From Matthijs:
Looking at the test, I think this might be caused by a bad random seed. It is still strange, since the algorithm should always converge to a solution for a problem this simple, but if the fault cannot be reproduced it might have been a particularly bad operator or data vector.
Another possible candidate in https://github.com/astro-informatics/sopt/runs/5886037546?check_suite_focus=true
Possible fix: make the tolerance of the test looser than the tolerance used for convergence
One reason for errors is probably that different metrics are used in convergence (l1 norm, normalized against one of the vectors) and the Eigen isApprox method (l2 norm, normalized against the min of the vectors).
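To illustrate the metric mismatch described above, here is a minimal sketch in plain C++ with no Eigen dependency. It assumes the convergence criterion has the form ||a - b||_1 <= tol * ||b||_1, which is a guess based on the description in this thread and not taken from sopt's code. The second check mirrors the criterion Eigen documents for isApprox, ||a - b||_2 <= tol * min(||a||_2, ||b||_2). All function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// l1 norm: sum of absolute values.
double l1_norm(const std::vector<double>& v) {
  double sum = 0.0;
  for (double x : v) sum += std::abs(x);
  return sum;
}

// l2 norm: square root of the sum of squares.
double l2_norm(const std::vector<double>& v) {
  double sum = 0.0;
  for (double x : v) sum += x * x;
  return std::sqrt(sum);
}

// Element-wise difference a - b (both vectors must have the same length).
std::vector<double> difference(const std::vector<double>& a,
                               const std::vector<double>& b) {
  assert(a.size() == b.size());
  std::vector<double> d(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) d[i] = a[i] - b[i];
  return d;
}

// Assumed convergence-style check: ||a - b||_1 <= tol * ||b||_1.
bool l1_relative_close(const std::vector<double>& a,
                       const std::vector<double>& b, double tol) {
  return l1_norm(difference(a, b)) <= tol * l1_norm(b);
}

// isApprox-style check: ||a - b||_2 <= tol * min(||a||_2, ||b||_2).
bool l2_min_close(const std::vector<double>& a,
                  const std::vector<double>& b, double tol) {
  return l2_norm(difference(a, b)) <= tol * std::min(l2_norm(a), l2_norm(b));
}

// Reference vector of 100 ones.
std::vector<double> make_reference() { return std::vector<double>(100, 1.0); }

// Copy of the reference with a single entry off by 0.05.
std::vector<double> make_perturbed() {
  std::vector<double> v(100, 1.0);
  v[0] += 0.05;
  return v;
}
```

With a tolerance of 1e-3, `l1_relative_close` accepts this pair (relative error about 5e-4) while `l2_min_close` rejects it (relative error about 5e-3). When the error sits in a few entries of a vector whose mass is spread out, the two relative errors can differ by up to a factor of the square root of the vector length, so a solver can report convergence and the test can still fail at the same tolerance.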
The original error in this issue is different. It seems that the iteration either 1) has not converged or 2) is using too large a step size compared to epsilon
@jasonmcewen thinks we can just fix the random seed to get rid of these issues.
So just find a set of actions that did pass, get its seed and set that as the seed for future actions? Should we still change the tolerances?
List of tests that fail intermittently:
- [ ] test_sdmm (this issue, on Tuomas's Mac, and on GH actions)
- [ ] test_power_method
- [ ] test_primal_dual and earlier
- [ ] test_proximal and earlier
Are there any other tests that are generating random numbers?
I would still first relax the tolerances and confirm that is a solution to our issue, so that at least we've correctly identified what's going on.
As a second step, let's fix the random seed for all the tests that use random numbers. It's good practice anyway, and it will make the tests easier to reproduce. I suppose we could test with a few different random seeds, but I'm not sure that is really worth it.
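A minimal sketch of what fixing the seed could look like, using only the C++ standard library. How sopt's tests actually draw their random operators and data vectors is not shown in this thread, so the helper name `random_vector` and the choice of `std::mt19937` with a normal distribution are assumptions made for illustration.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: fill a vector with standard-normal samples drawn from a
// generator seeded with an explicit value, so repeated runs see the same data.
std::vector<double> random_vector(std::size_t size, unsigned int seed) {
  std::mt19937 generator(seed);  // explicit seed, not std::random_device or the clock
  std::normal_distribution<double> normal(0.0, 1.0);
  std::vector<double> result(size);
  for (double& x : result) x = normal(generator);
  return result;
}
```

Calling `random_vector(10, 42)` twice in the same build returns identical vectors, and looping a test over a short list of seeds would cover the "few different random seeds" option. One caveat: the C++ standard fixes the output of `std::mt19937` for a given seed but not the algorithm behind `std::normal_distribution`, so the same seed may produce different samples on a Mac toolchain and on a Linux CI runner.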
Strangely these all seem to have gone away in #281 and #286. I wonder if these are triggered by something else going wrong in the build. 🤷
Never mind, got another one!
The test_primal_dual error is in the 3rd decimal place
On 41dae34, test #8 failed on my Mac the first time I ran it. All following runs passed, so I could not reproduce it. Could this be caused by the random seed? @MatthijsMars @CosmoMatt
Originally posted by @tkoskela in https://github.com/astro-informatics/sopt/issues/265#issuecomment-1072323584