blegat closed this 4 years ago
Awesome! The diff looks reasonable to me, but it seems like the easiest way to test it is to just merge and see how it works (I guess we could set up something to build preview versions on PRs to have a better system in the future, but I don't think we get that much traffic yet, so occasional breakage is probably OK). So I'll merge this and also make you a collaborator so you can merge any fixes or tweaks you want to do.
I guess we could setup something to build preview versions on PRs to have a better system in the future, but I don't think we get that much traffic yet that occasional breakage is probably OK
I agree; configuring that on PRs might be overkill.
I also see it's fairly robust, in that if the test run doesn't go through, the docs don't get built, and so the actual webpage doesn't change. Btw, I just pushed a commit to try to fix the issue that caused it to fail; I think it's that adding ConvexTests to the Tulip manifest means that when we add TableTestSets, it looks for ConvexTests and doesn't find it, but `dev`ing ConvexTests first should fix that.
`dev`ing ConvexTests first
This was a bad idea, since TableTestSets is in its dependencies, so if ConvexTests gets `dev`'d first, it doesn't find TableTestSets and errors. I resorted to just removing ConvexTests from the project in the Tulip folder, which is how things worked before (it gets added in the CI run).
@blegat should we rename the repo? Any suggestions for a new name?
Yes indeed, it was a mistake to add it, I didn't mean to push it like that.
It could be SolverBenchmarks? Or call it SolverTests (but then we need to merge https://github.com/blegat/SolverTests into this one).
Either way seems OK, although with SolverBenchmarks some may expect performance benchmarks, which isn't currently the focus here.
I think merging it with SolverTests makes sense; we could even just copy the Travis script over. I think we should still make sure to keep the two purposes separate: one is for catching changes in MOI that cause package test failures, the other is an external test set for benchmarking correctness. In particular, a package should always be expected to pass its own tests unless there has been a regression or breakage, but the solvers have not made the same commitment to pass these tests. (I'd prefer all of these tests to pass, of course, but I've seen myself with SDPA-GMP how tricky that can be.)
This can get even blurrier when solvers adopt these tests within their own test suite, like COSMO did with the ProblemDepot, but I think the purposes are still separate: theirs is for making sure they don't regress on a test when making internal changes; likewise, Convex uses the same tests to make sure we don't break Convex, and the SolverTests script is for MOI compat.
In fact, in SumOfSquares I also use this process to check whether a solver that was passing tests no longer does. In https://github.com/jump-dev/SumOfSquares.jl/tree/master/test/Solvers I have all solvers, and each file excludes the tests that are known to fail for that solver. This way, I can check whether a change in SumOfSquares triggers a regression. We could have a similar feature here: run the tests with two different versions of a package and look at the difference in timings and failures, reporting the regressions and improvements. That could be used, for instance, when we want to tag a new version of MOI, Convex, or SumOfSquares: we would check master against the latest release and verify that there is no regression in either failures or performance.
Good idea! Maybe TableTestSets could support serializing the results to disk, and then we set this repo up to be able to run the tests twice in two different environments, diff the test results, and show that. PkgBenchmark does something similar for performance benchmarks, I think.
Also related: https://github.com/JuliaComputing/NewPkgEval.jl runs package tests on different versions of Julia and reports which packages regressed, along with the test logs.
we set this repo up to be able to run the tests twice in two different environments and then diff the test results and show that?
Yes, exactly. I think it's better to re-run the tests twice rather than comparing with the results written in the docs, as we're not sure those aren't outdated.
Adding more modules for testing can now be achieved by adding a module to `src` with the dictionary `PROBLEMS` and the functions `dummy_problem`, `handle_problem_function`, and `foreach_problem`. The module then of course also needs to be added to `MODULES`.
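A minimal sketch of what such a module might look like, assuming the interface implied by the names above; the signatures and the way the handler is passed around are my guesses, not the repo's actual API:

```julia
module MySolverTests

# Hypothetical problem builder: in practice each entry would construct a
# model and hand it to the solver-specific handler passed in.
lp_example(handle_problem) = handle_problem("lp_example")

# Dictionary mapping problem names to their builder functions.
const PROBLEMS = Dict{String,Function}("lp_example" => lp_example)

# A trivial problem, e.g. useful for warming up compilation.
dummy_problem(handle_problem) = handle_problem("dummy")

# Look up a named problem and run it with the given handler
# (assumed signature).
handle_problem_function(name, handler) = PROBLEMS[name](handler)

# Apply `f` to every registered (name, problem) pair.
function foreach_problem(f)
    for (name, problem) in PROBLEMS
        f(name, problem)
    end
end

end # module
```

The new module would then also be appended to `MODULES` so the test harness picks it up.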