ejgallego closed this issue 3 years ago.
That and other things:
The benefits (handling of dependencies) still, even in this state, seem to outweigh the drawbacks.
At the moment, we are evaluating the level of noise. By that I mean that, for the chosen branch, I check the results we get if we compile all packages once, twice, three times, or four times. That will give us an idea of how many iterations are enough (wrt. the desired precision).
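For reference, a minimal sketch of what that repetition loop could look like (the package name, the clean-up step, and the iteration count are placeholders, not the actual bench setup):

```python
# Rough sketch: rebuild the same package several times and report the spread.
# Package name and iteration count are placeholders.
import statistics
import subprocess
import time

PKG = "coq-mathcomp-ssreflect"   # hypothetical package under test
ITERATIONS = 4

samples = []
for _ in range(ITERATIONS):
    # remove the package so each run compiles it from scratch
    subprocess.run(["opam", "remove", "-y", PKG], check=False)
    start = time.monotonic()
    subprocess.run(["opam", "install", "-y", PKG], check=True)
    samples.append(time.monotonic() - start)

print(f"{PKG}: mean {statistics.mean(samples):.1f}s, "
      f"stdev {statistics.stdev(samples):.1f}s over {ITERATIONS} runs")
```

Note that after the first run opam's download cache kicks in, so the first sample includes download time while the later ones do not.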
Anyway, the time to resolve the dependencies and to download the stuff is, fortunately for the macro-benchmarks, negligible wrt. the whole compilation time.
Also, the plan is to look at the implementation of the `opam` command and add the subcommands that we need. What I need (and do not yet have) is the ability to tell `opam` just to download the stuff and do nothing else.
`opam` already provides one thing that is essential: before we start benchmarking the chosen package, we can install all its dependencies and thus exclude the compilation of those dependencies from the benchmarking.
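Concretely, the dependency pre-installation step amounts to something like the following (the package name is a placeholder; `--deps-only` installs the dependencies of the package but not the package itself):

```python
# Sketch: install only the dependencies of the package to be benchmarked,
# so that their compilation is excluded from the measured build.
# The package name is a placeholder.
import subprocess

pkg = "coq-fiat-crypto"
subprocess.run(["opam", "install", "--deps-only", "-y", pkg], check=True)
```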
The better counterargument against relying on `opam` is, I think, the fact that:
> The implementation, in that respect, isn't particularly appealing. I am not very happy about it.
> Anyway, the time to resolve the dependencies and to download the stuff is, fortunately for the macro-benchmarks, negligible wrt. the whole compilation time.
That is not true in my experience. Also, in your iteration scheme, opam caching will kick in, distorting the results depending on the size of the download.
I agree it is not so clear whether the gains outweigh the drawbacks. Maybe we could try a hybrid approach, like using opam to install dependencies but doing the build to be benchmarked manually. This way, we would not need to create a fake package and we would have more control over what we measure, but we could still rely on opam to solve dependencies.
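A sketch of what that hybrid approach could look like (repository URL, package name, and job count are placeholders; only the final `make` is timed):

```python
# Hybrid sketch: opam resolves and installs dependencies, while the
# benchmarked development itself is cloned and built by hand, so that
# only its build is measured. URL, package name, and -j value are placeholders.
import subprocess
import time

PKG = "coq-iris"                       # hypothetical package name
REPO = "https://example.org/iris.git"  # placeholder repository URL

subprocess.run(["opam", "install", "--deps-only", "-y", PKG], check=True)
subprocess.run(["git", "clone", REPO, "bench-src"], check=True)

start = time.monotonic()
subprocess.run(["make", "-j2"], cwd="bench-src", check=True)
print(f"measured build time: {time.monotonic() - start:.1f}s")
```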
We could certainly share the build recipes in `dev/ci`; I agree that using opam for dependencies is OK.
Furthermore, we should ideally produce time reports by file. Most projects use `coq_makefile`, so that should be doable.
> Changes in time by `.v` file.
That is a good idea.
In the future, we will be able to track changes even by section and sentence (sercomp will support that).
I would rather have the issue closed when it is actually solved (hint: other interested people can search, collaborate, and comment), but I guess it is your call.
It seems to me indeed that a feasible approach would be to teach `coq_makefile` to instrument developments for detailed benchmarking, cf. coq/coq#406.
In the end, we may want to use Python, R, or some other higher-level language though, if we are going to produce so much data. bash is just not gonna cut it IMHO.
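For instance, a per-file timing diff between two runs is only a few lines of Python once the raw numbers are available (the `file,seconds` CSV layout below is an assumption, not the actual bench output format):

```python
# Toy example: diff per-file build times between two runs.
# Assumes each run was dumped to a CSV with "file" and "seconds" columns;
# this layout is illustrative, not the real bench format.
import csv

def load(path):
    with open(path, newline="") as f:
        return {row["file"]: float(row["seconds"]) for row in csv.DictReader(f)}

old, new = load("old-run.csv"), load("new-run.csv")
diffs = [(f, new[f] - old[f]) for f in old.keys() & new.keys()]
for f, delta in sorted(diffs, key=lambda x: abs(x[1]), reverse=True)[:20]:
    print(f"{delta:+8.2f}s  {f}")
```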
@ejgallego To decrease the confusion a little bit, do you think you could reiterate the points that led you to conclude:
> Opam install not fully suitable for benchmarking.
Several things (related and unrelated) were mentioned, and even the original focus shifted a little bit. So a restatement might help us, I think.
The way I originally planned to address this problem is to fix this issue at the root, so that we can keep the benefits `opam` provides.
That looks good; given the scope of the benchmarking, it may make sense for now.
However, if/when we want to do fancier tricks such as calling `coq_makefile --benchmark`, we will either have to modify opam or think of some kind of trick (maybe an env variable?).
So let's keep an eye on it.
> we can keep the benefits opam provides.
I am not so convinced yet that opam provides many benefits for testing bleeding-edge code. A good example is Iris: they had to roll out their own package system because opam did not have the proper granularity, cf. https://gitlab.mpi-sws.org/FP/iris-coq/issues/83
> We should make sure that the "download time" does not influence benchmarking of "build time".
Be aware that, when using opam, there are other aspects beyond download time that may influence the timing; hence the original title was more accurate.
This is still an issue, and a serious one it seems.
Note that if you call `make TIMED=1 --output-sync` on a `coq_makefile`-made Makefile, the Python scripts I have can process the logs of the resulting build and display per-file timing diffs as well as whole-project timing diffs. (We unfortunately don't get CPU cycles, mem faults, etc.)
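As a rough illustration of that log processing (the line format matched below is only an approximation of the `TIMED=1` output and may differ between Coq versions):

```python
# Sketch: pull per-file timings out of a `make TIMED=1 --output-sync` log.
# The regex assumes lines roughly of the form
#   theories/Foo.vo (real: 1.23 ...)
# which is an approximation; adjust it to the actual output of your Coq version.
import re
import sys

TIMING = re.compile(r"^(?P<file>\S+\.vo?)\s+\(real:\s*(?P<secs>[0-9.]+)")

def per_file_times(log_path):
    times = {}
    with open(log_path) as log:
        for line in log:
            m = TIMING.match(line)
            if m:
                times[m.group("file")] = float(m.group("secs"))
    return times

if __name__ == "__main__":
    for f, t in sorted(per_file_times(sys.argv[1]).items(), key=lambda x: -x[1]):
        print(f"{t:8.2f}s  {f}")
```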
FWIW, I'm also using a script to filter the HTML output of the bench, which sets the `TIMED` variable and allows retrieving a per-line diff. It's essentially the same as the one provided in the repo, except that it's quasi-linear in the number of lines. (The one from the bench is cubic IIRC.)
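The quasi-linear behaviour essentially comes from indexing one run's per-line timings in a hash map and looking the other run up in constant time, rather than rescanning it for every line; a toy illustration (not the actual script):

```python
# Toy illustration of a quasi-linear per-line diff: index one run in a dict
# so each lookup is O(1), instead of rescanning the other log per line.
# The (file, line) -> seconds layout is illustrative only.
def diff_timings(old, new):
    common = old.keys() & new.keys()
    return {key: new[key] - old[key] for key in common}

old = {("foo.v", 10): 0.50, ("foo.v", 42): 1.20}
new = {("foo.v", 10): 0.55, ("foo.v", 42): 0.90}
for (path, line), delta in sorted(diff_timings(old, new).items()):
    print(f"{path}:{line}  {delta:+.2f}s")
```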
Unless I am missing something, using `opam install` is not suitable for benchmarking. Among other things, it will call constraint solvers and try to download packages, and thus could introduce significant differences in timing. @maximedenes, what do you think?