Closed swergas closed 6 years ago
Thank you!
I followed your instructions (in a fresh Debian 9 virtual machine) to run the CI job locally with the tip of your gitlab-ci branch and it timed out after 30 minutes. Looking further, this is due to lack of entropy in the runner. Prefixing make all and make check with BELENIOS_DEBUG=1 fixes that. Indeed, by default, belenios-tool uses secure random (/dev/random), which may exhaust the entropy pool when it is run many times (which is the case with make check). The BELENIOS_DEBUG environment variable at build time triggers a different code path that uses /dev/urandom instead. This way, running the CI job locally only takes 1 minute in my environment.
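The prefixing described above can be sketched as a small shell helper. The function names with_belenios_debug and run_ci_build are hypothetical and not part of Belenios; only the BELENIOS_DEBUG=1 variable and the make all / make check targets come from this thread.

```shell
# Hypothetical helper (not part of Belenios): run any command with
# BELENIOS_DEBUG=1 set in that command's environment only.
with_belenios_debug() {
    BELENIOS_DEBUG=1 "$@"
}

# Hypothetical wrapper for the two targets named in the thread.
# Call it from the root of a Belenios checkout; it stops at the
# first target that fails.
run_ci_build() {
    with_belenios_debug make all && with_belenios_debug make check
}
```

Setting the variable per command, rather than exporting it, keeps the rest of the shell session on the default /dev/random code path.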
I agree it's a good idea to prepare a docker image with Belenios dependencies pre-installed. However, I don't think using the hash of opam-bootstrap.sh alone (for the tag name) is right. Indeed, some packages in the OCaml stack (OPAM packages) may evolve and I want to put minimal version constraints. At the moment, only Eliom is constrained. I think the docker image should be re-generated regularly to pick up new versions of unconstrained OCaml libraries. I suggest using $DATE-$HASH instead of just $HASH for the tag name. Or we could just use an integer as a version number that we would increment at each docker image generation.
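A minimal sketch of the $DATE-$HASH naming suggested above. The thread does not fix either format, so this assumes DATE is the current UTC date as YYYYMMDD and HASH is the SHA-256 of the script (the 64-hex-digit tag used elsewhere in this thread is consistent with that, but this is a guess). The function name image_tag is hypothetical.

```shell
# Hypothetical helper: print a "$DATE-$HASH" tag for a bootstrap script.
# Assumptions: DATE = today's UTC date (YYYYMMDD),
#              HASH = SHA-256 of the file given as first argument.
image_tag() {
    script=$1
    hash=$(sha256sum "$script" | cut -d ' ' -f 1)
    day=$(date -u +%Y%m%d)
    printf '%s-%s\n' "$day" "$hash"
}
```

With this in place, something like docker build -t swergas/beleniosbase:"$(image_tag opam-bootstrap.sh)" . would produce a new tag on each day the image is regenerated, even when the script itself has not changed.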
Out of curiosity, I tried with ef0aca2 + my patch adding BELENIOS_DEBUG and the installation of ocsigenserver fails. Do you confirm that the whole CI job worked with this commit at some point for you? Maybe the failure is new and unrelated to your work.
There is an Inria Gitlab instance, where you should be able to create an account. I'm planning to put Belenios's repository there and set up CI there so that we depend only on Inria infrastructure.
I just tried running gitlab-runner in a new Debian 9.5 virtual machine. Indeed I also had the timeout issue:
ERROR: Job failed: execution took longer than 30m0s seconds
Fatal: Execution took longer than 30m0s seconds
But this issue happened before the docker image had even finished downloading (my wifi connection takes more than 30 minutes to download this image). So I decided to try again after having downloaded the docker image (docker pull swergas/beleniosbase:efa5df3049f736dd34eb8289da730dd709eb99939f6511fa93ae0080a61ce4fb). Then I ran gitlab-runner again and everything worked well: the job succeeded without errors or timeouts. So I cannot reproduce any entropy issue.
About the naming of the docker image, I will think about it and come back to you soon :)
I tried executing gitlab-runner again in this VM, and I was able to reproduce the timeout issue you encountered. I read some articles about the differences between random and urandom, and now I understand better why using random could slow things down a lot when it is used extensively. Do you think it is acceptable to use urandom instead of random for Belenios CI tests? I haven't yet found a way to increase this 30-minute timeout (when executed locally).
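To check whether the entropy pool is the likely bottleneck on a given runner, one option is to read the kernel's entropy estimate before and during the job. This is a sketch that assumes a Linux system exposing /proc/sys/kernel/random/entropy_avail; the function name entropy_available is hypothetical, and newer kernels may report a value that no longer reflects blocking behaviour.

```shell
# Hypothetical helper: print the kernel's entropy estimate, or "unknown"
# when the file is not readable (for example on non-Linux systems).
entropy_available() {
    f=/proc/sys/kernel/random/entropy_avail
    if [ -r "$f" ]; then
        cat "$f"
    else
        echo unknown
    fi
}
```

A value that stays close to zero while make check is running would be consistent with reads from /dev/random blocking.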
I have also been able to reproduce the issue you mentioned about installation of ocsigen in commit https://github.com/glondu/belenios/commit/ef0aca229b996361cebda8d30372faef2df772f7 in my VM. This is strange, I did not have this problem previously (I was installing from a docker image on my machine).
OK, maybe I see: the image ocaml/opam2:debian-9 was updated 2 days ago, and it now uses ocaml version 4.07.0 instead of the 4.06.1 it had when I first ran it. I'm investigating.
OK I tried image ocaml/opam2:debian-9-ocaml-4.06 (which has ocaml version 4.06.1 and opam version 2.0.0), and everything worked correctly. I edited the .gitlab-ci.yml file to show this.
Do you think it is acceptable to use urandom instead of random for Belenios CI tests?
Yes.
I've put Belenios on Inria's Gitlab instance and set up CI. I had to use the BELENIOS_DEBUG=1 trick to make the pipeline pass.
Using a fixed image with (a snapshot of) all dependencies of Belenios preinstalled is a good idea to test changes in Belenios itself. However, it is also a good idea to spot early breakages due to changes in dependencies. For this, I use the opam-bootstrap.sh script. Would it be possible/reasonable for you to set up a second pipeline that would start with a simple debian:9 image (as you did in your first commit)? This pipeline would just test opam-bootstrap.sh; thorough tests (UI...) would still be on the first pipeline.
Yes OK, here it is: https://gitlab.com/swergas/belenios-inria/pipelines (branch: https://gitlab.com/swergas/belenios-inria/commits/gitlab-ci-both-images)
Remaining work:
doc/
swergas/...
Work continues here: https://gitlab.inria.fr/belenios/belenios
In particular, see this PR for CI documentation: https://gitlab.inria.fr/belenios/belenios/merge_requests/1
Remaining work:
.gitlab-ci.yml
instead of image swergas/...
You can see the details about how this script ran here: https://gitlab.com/swergas/swergas-belenios-ci/-/jobs
Full documentation in French is here: https://hackmd.io/MY0_U_omQMGNK52Zws9c7g