d-krupke / cpsat-primer

The CP-SAT Primer: Using and Understanding Google OR-Tools' CP-SAT Solver
https://d-krupke.github.io/cpsat-primer/
Creative Commons Attribution 4.0 International

Add further plots and their code to benchmarking #40

Closed: d-krupke closed this issue 3 months ago

d-krupke commented 4 months ago

There are further approaches and plots for benchmarking a model.

In particular, it is actually quite easy to set up a good framework for evaluating the influence of different parameters on performance. This could work in a largely model-agnostic way, based only on the protobuf files. One could even build an automatic parameter tuner.

I would refrain from adding too much functionality here, as the parameters can change quickly, but at least having some basic framework would be quite helpful and not too difficult. Maybe even integrate it with the log analyzer?
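
A minimal sketch of what such a protobuf-based parameter evaluation could look like; the file path, the chosen parameters, and the function names are illustrative assumptions, not an existing implementation:

```python
# Sketch: benchmark parameter configurations against a model that was
# serialized to a protobuf file. Names like `load_model` and `evaluate`
# are hypothetical.
from ortools.sat import cp_model_pb2
from ortools.sat.python import cp_model


def load_model(path: str) -> cp_model.CpModel:
    """Load a CP-SAT model from a serialized CpModelProto file."""
    proto = cp_model_pb2.CpModelProto()
    with open(path, "rb") as f:
        proto.ParseFromString(f.read())
    model = cp_model.CpModel()
    model.Proto().CopyFrom(proto)
    return model


def evaluate(path: str, configurations: dict[str, dict]) -> dict[str, float]:
    """Solve the same model once per configuration and record the runtimes."""
    runtimes = {}
    for name, params in configurations.items():
        solver = cp_model.CpSolver()
        solver.parameters.max_time_in_seconds = 60.0
        for key, value in params.items():
            # Assumes scalar solver parameters, e.g. `cp_model_presolve`.
            setattr(solver.parameters, key, value)
        solver.Solve(load_model(path))
        runtimes[name] = solver.WallTime()
    return runtimes


# Example: compare the default configuration against one without presolve.
print(evaluate("model.pb", {"default": {}, "no_presolve": {"cp_model_presolve": False}}))
```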

d-krupke commented 3 months ago

This kind of plot should be added, as such plots are super useful when one wants to compare multiple metrics between two different models/approaches. For example, you can see the trade-offs between quality and runtime of two different approaches. When dealing with a complex multi-objective problem (as is often the case for real problems), one can use these plots to explain trade-offs to the client, as unfortunately we cannot always improve on all objectives but sometimes have to make sacrifices.

[image]

This kind of benchmarking is not yet discussed in the primer, but I think it is very useful. I actually learned about this kind of comparison from a video by nextmv in the context of "shadow models" that are run in parallel to test how the dev version would have performed.
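
Not the plot from the screenshot, but a minimal sketch of how such a multi-metric comparison could be drawn with pandas/matplotlib; all data and labels here are made up:

```python
# Sketch: compare two approaches on several metrics, one subplot per metric.
import matplotlib.pyplot as plt
import pandas as pd

# One row per (instance, approach) with the metrics to compare.
data = pd.DataFrame(
    {
        "approach": ["baseline"] * 3 + ["dev"] * 3,
        "runtime_s": [10.2, 35.0, 8.1, 6.5, 28.0, 9.0],
        "objective": [105, 230, 98, 110, 215, 97],
    }
)

fig, axes = plt.subplots(1, 2, figsize=(8, 3))
for metric, ax in zip(["runtime_s", "objective"], axes):
    data.boxplot(column=metric, by="approach", ax=ax)
    ax.set_title(metric)
fig.suptitle("baseline vs. dev on two metrics")
plt.tight_layout()
plt.show()
```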

d-krupke commented 3 months ago

I wrote some code for this and prepared an example, but I am not yet sure where to put such a section. It does not really fit into the current benchmarking section, because that has a different storyline. This is rather about iteratively improving a model.

[image]
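
For illustration, a hedged sketch of how the per-run records behind such an iterative comparison could be collected; `build_example` is a hypothetical stand-in for the production and development model builders:

```python
# Sketch: solve two versions of a model and record the metrics a
# comparison plot would need. All names are illustrative.
from ortools.sat.python import cp_model


def build_example(extra_capacity: int) -> cp_model.CpModel:
    # Hypothetical stand-in for a real model builder; imagine "production"
    # and "dev" versions that differ in their constraints.
    model = cp_model.CpModel()
    x = model.NewIntVar(0, 10, "x")
    y = model.NewIntVar(0, 10, "y")
    model.Add(x + y <= 10 + extra_capacity)
    model.Maximize(2 * x + y)
    return model


def collect_metrics(model: cp_model.CpModel, label: str) -> dict:
    # Solve once and record status, objective, bound, and runtime.
    solver = cp_model.CpSolver()
    solver.parameters.max_time_in_seconds = 30.0
    status = solver.Solve(model)
    return {
        "approach": label,
        "status": solver.StatusName(status),
        "objective": solver.ObjectiveValue(),
        "bound": solver.BestObjectiveBound(),
        "runtime_s": solver.WallTime(),
    }


records = [
    collect_metrics(build_example(0), "production"),
    collect_metrics(build_example(2), "dev"),
]
print(records)
```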

d-krupke commented 3 months ago

Added a quick text here, but I am not completely happy with it. I may have to rewrite the whole chapter. The chapter was written while I was still 100% a scientist working with artificial problems. Now I know how much more complex the real world is, and that you usually just cannot do such a benchmark.

https://d-krupke.github.io/cpsat-primer/08_benchmarking.html#comparing-production-with-development-versions-on-multiple-metrics