blegat closed this 4 years ago
Awesome! The diff looks reasonable to me, but it seems like the easiest way to test it is to just merge and see how it works (I guess we could set up something to build preview versions on PRs to have a better system in the future, but I don't think we get that much traffic yet, so occasional breakage is probably OK). So I'll merge this and also make you a collaborator so you can merge any fixes or tweaks you want to do.
I guess we could setup something to build preview versions on PRs to have a better system in the future, but I don't think we get that much traffic yet that occasional breakage is probably OK
I agree; configuring that on PRs might be overkill.
I also see it's fairly robust, in that if the test run doesn't go through, the docs don't get built, and so the actual webpage doesn't change. Btw, I just pushed a commit to try to fix the issue that caused it to fail; I think it's that adding ConvexTests to the Tulip manifest means that when we add TableTestSets, it looks for ConvexTests and doesn't find it, but `dev`ing ConvexTests first should fix that.
`dev`ing ConvexTests first
This was a bad idea, since TableTestSets is in its dependencies, so if ConvexTests gets `dev`'d first, it doesn't find TableTestSets and errors. I resorted to just removing ConvexTests from the project in the Tulip folder, which is how things worked before (it gets added in the CI run).
@blegat should we rename the repo? Any suggestions for a new name?
Yes indeed, it was a mistake to add it, I didn't mean to push it like that.
It could be SolverBenchmarks? Or call it SolverTests (but then we need to merge https://github.com/blegat/SolverTests into this one).
Either way seems OK, although with SolverBenchmarks some may expect performance benchmarks, which isn't currently the focus here.
I think merging it with SolverTests makes sense; we could even just copy the Travis script over. I think we should still make sure to keep the two purposes separate: one is for catching changes in MOI that cause package test failures, the other is an external test set for benchmarking correctness. In particular, a package should always be expected to pass its own tests unless there has been a regression or breakage, but the solvers have not made the same commitment to pass these tests. (I'd prefer all of these tests to pass, of course, but I've seen myself with SDPA-GMP how tricky that can be.)
This can get even blurrier when solvers adopt these tests within their own test suite, like COSMO did with the ProblemDepot, but I think the purposes are still separate: theirs is for making sure they don't regress on a test when making internal changes; likewise, Convex uses the same tests to make sure we don't break Convex, and the SolverTests script is for MOI compat.
In fact, in SumOfSquares I also use this process to check whether a solver that was passing tests no longer does. In https://github.com/jump-dev/SumOfSquares.jl/tree/master/test/Solvers I have all solvers, and each file excludes the tests that are known to fail for that solver. This way, I can check whether a change in SumOfSquares triggers a regression. We could have a similar feature here: run the tests with two different versions of a package and look at the difference in timings and failures, reporting the regressions and improvements. That could be used, for instance, when we want to tag a new version of MOI, Convex, or SumOfSquares: we would check master against the latest release and verify that there is no regression in either failures or performance.
Good idea! Maybe TableTestSets could support serializing the results to disk, and then we set this repo up to be able to run the tests twice in two different environments, diff the test results, and show that. PkgBenchmark does something similar for performance benchmarks, I think.
Also related: https://github.com/JuliaComputing/NewPkgEval.jl runs package tests on different versions of Julia and reports which packages regressed, along with the test logs.
we set this repo up to be able to run the tests twice in two different environments and then diff the test results and show that?
Yes, exactly. I think it's better to re-run the tests twice rather than comparing with the results written in the docs, as we're not sure those aren't outdated.
Adding more modules for testing can now be achieved by adding a module to `src` with the dictionary `PROBLEMS` and the functions `dummy_problem`, `handle_problem_function`, and `foreach_problem`. The module then of course also needs to be added to `MODULES`.
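A minimal sketch of what such a module might look like, assuming the interface implied by the names above; the signatures and the way the handler is passed around are my guesses, not the repo's actual API:

```julia
module MySolverTests

# Hypothetical problem builder: in practice each entry would construct a
# model and hand it to the solver-specific handler passed in.
lp_example(handle_problem) = handle_problem("lp_example")

# Dictionary mapping problem names to their builder functions.
const PROBLEMS = Dict{String,Function}("lp_example" => lp_example)

# A trivial problem, e.g. useful for warming up compilation.
dummy_problem(handle_problem) = handle_problem("dummy")

# Look up a named problem and run it with the given handler
# (assumed signature).
handle_problem_function(name, handler) = PROBLEMS[name](handler)

# Apply `f` to every registered (name, problem) pair.
function foreach_problem(f)
    for (name, problem) in PROBLEMS
        f(name, problem)
    end
end

end # module
```

The new module would then also be appended to `MODULES` so the test harness picks it up.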