MetaCoq / metacoq

Metaprogramming, verified meta-theory and implementation of Coq in Coq
https://metacoq.github.io
MIT License
373 stars 80 forks source link

ci-local target is not idempotent #563

Closed JasonGross closed 2 years ago

JasonGross commented 3 years ago

Coq's CI uses the following script:

./configure.sh local
make .merlin
make ci-local
make install

Running this script twice results in building the test-suite twice, see also https://github.com/coq/coq/issues/14448. This extra ~40--60 minutes of overhead makes it very hard to debug problems such as https://github.com/coq/coq/issues/14453 and https://coq.zulipchat.com/#narrow/stream/237658-MetaCoq/topic/No.20rule.20to.20make.20target.20tm_util.2Ecmx/near/241414773

herbelin commented 3 years ago

Hum, looks like @SkySkimmer might be the guru here.

SkySkimmer commented 3 years ago

Why me?

herbelin commented 3 years ago

@skyskimmer: I read too quickly and thought that this issue was related to the coq test-suite and dune, about which I have the impression that you understood all the tiny details. Sorry for the confusion.

yforster commented 3 years ago

I'm not sure I understand what's going on, but I have a lead.

First: I tried to reproduce on top of the coq-8.13 branch, but can't. Running your commands twice does not result in re-building the test-suite for me. Also the test-suite takes ~2 minutes for me, so it wouldn't explain a 40-60 minutes overhead.

But: The build is indeed not idempotent in the sense that calling configure.sh results in a call to make -f Makefile mrproper. mrproper deletes every Makefile in the sub-projects. These Makefiles are then re-generated. Re-calling make ci-local with the new Makefiles does not result in a run of coqc or ocamlc, so it is idempotent in a sense. But maybe this confuses the ci buildsystem, since there are now new files (apart from the timestamp identical to before the second run)?

JasonGross commented 3 years ago

The build log of the second run is at https://gitlab.com/coq/coq/-/jobs/1316144597 The first thing it does is reinstall equations, because the installed files are not kept in artifacts (only the build results).

As you say, the log contains the deletion of the Makefiles starting at https://gitlab.com/coq/coq/-/jobs/1316144597#L1721

Strangely, the call to make -C pcuic .merlin at https://gitlab.com/coq/coq/-/jobs/1316144597#L1774 results in

COQDEP VFILES
File "src/classes0.mli", line 1: No such file or directory
OCAMLLIBDEP src/metacoq_pcuic_plugin.mlpack
make[3]: warning: jobserver unavailable: using -j1.  Add '+' to parent make rule.
COQDEP VFILES
File "src/classes0.mli", line 1: No such file or directory

More worryingly, the call to make -f Makefile.plugin .merlin at https://gitlab.com/coq/coq/-/jobs/1316144597#L1825 results in

make[3]: Entering directory '/builds/coq/coq/_build_ci/metacoq/checker'
make[3]: warning: jobserver unavailable: using -j1.  Add '+' to parent make rule.
COQDEP VFILES
File "src/all_Forall.mli", line 1: No such file or directory
COQPP src/g_metacoq_checker.mlg
OCAMLLIBDEP src/metacoq_checker_plugin.mlpack
CAMLDEP src/g_metacoq_checker.ml

Given the call to coqpp here, it does not surprise me that other things are later rebuilt.

Interestingly, calling make ci-local at https://gitlab.com/coq/coq/-/jobs/1316144597#L1840 invokes ./configure.sh local and the corresponding cleaning a second time.

The next suspicious line is https://gitlab.com/coq/coq/-/jobs/1316144597#L1908 where you call cp src/template_coq.cm* build/ and then a few lines down call cp gen-src/metacoq_template_plugin.cm* build/ --- if I recall correctly, cp does not preserve timestamps, and you want cp -a or similar for that.

Then the second call to ./update_plugin.sh https://gitlab.com/coq/coq/-/jobs/1316144597#L1938 says "Renaming extracted files", but nothing after that.

The next lines that I'm not sure about are at https://gitlab.com/coq/coq/-/jobs/1316144597#L1970 :

echo "Done extracting the safe checker, moving extraction files!"
Done extracting the safe checker, moving extraction files!
./clean_extraction.sh
Cleaning result of extraction
Extraction up-to date

Then finally at https://gitlab.com/coq/coq/-/jobs/1316144597#L2011 the call to make -C test-suite results in

make[4]: Entering directory '/builds/coq/coq/_build_ci/metacoq/test-suite'
COQDEP VFILES
COQC theories/MyPlugin.v
COQC bug1.v
[...]

Most suspiciously, at make clean is invoked at https://gitlab.com/coq/coq/-/jobs/1316144597#L2320 seemingly via the call to make all test-suite TIMED=pretty-timed at https://gitlab.com/coq/coq/-/jobs/1316144597#L1889 . What's up with this?

Does this provide enough trail-heads to diagnose what's going on?

yforster commented 3 years ago

I can't explain the "file not found" problems, and I don't understand why the test-suite is compiled a second time. We should probably also use cp -a.

But I can explain why make clean is called: For some reason, the ci-local target on master and only on master contains a make clean. Probably that's just wrong, and good chances that's the real problem we are looking at here.

The commit is here: https://github.com/MetaCoq/metacoq/commit/dcf7c546777cb8c248fbd54eb2ae289de93b8024

@mattam82 do you recall your reasons back then?

yforster commented 3 years ago

I pushed a fix for the cleaning issue and the cp -a issue, maybe

./configure.sh local
make .merlin
make ci-local-noclean
make install

now works.

Does it makes sense to include make .merlin also in the MetaCoq CI? Then for Coq's CI it would only be make ci-local-noclean && make install.

JasonGross commented 3 years ago

Using the ci-local-noclean target fixes this issue to my satisfaction, as the bug minimizer now supports metacoq; thanks! I'll let you close this issue at your leisure, since I can't tell if you're using it to track some change you're still planning to make.