vogler opened this issue 3 years ago
Now that this issue exists, I'll write down one thought. Maybe we could just use a GitHub Actions self-hosted runner for this: https://docs.github.com/en/actions/hosting-your-own-runners/about-self-hosted-runners. I haven't looked into it, but it looks like it already has a built-in job queue system etc., so it would avoid a lot of reinventing the wheel.
> Each workflow run is limited to 72 hours.
This limit should be sufficiently high that we can run big jobs that the free GitHub-hosted runners probably don't allow.
GitHub Actions can also schedule jobs à la cron instead of running them on each push: https://docs.github.com/en/actions/reference/events-that-trigger-workflows#schedule. And there even seems to be a way to trigger jobs manually.
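For illustration, a minimal sketch of such a workflow (job name, script path, and the 72 h timeout value are assumptions, not an existing file in the repo):

```yaml
name: nightly-benchmarks
on:
  schedule:
    - cron: '0 2 * * *'    # every night at 02:00 UTC
  workflow_dispatch:       # enables manual triggering from the Actions tab
jobs:
  benchmark:
    runs-on: self-hosted   # our own machine, not a GitHub-hosted runner
    timeout-minutes: 4320  # 72 h, matching the documented per-run limit
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/run-benchmarks.sh  # hypothetical benchmark script
```

The `schedule` and `workflow_dispatch` triggers can be combined freely, so nightly runs and one-off manual runs would go through the same job definition.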
And of course the integration would be minimal: no need to build a properly authenticated HTTPS webhook server to handle GitHub hooks into testing-framework or whatever.
Does it make sense to look at something like https://www.jenkins.io/ or do we make our own?
A simple implementation would probably be some Node.js server acting as an endpoint that reacts to the GitHub commit hook. There are libraries for job queues with priorities and web interfaces: https://github.com/Automattic/kue, https://github.com/OptimalBits/bull
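Both of those libraries are backed by Redis, so they aren't shown here directly; instead, here is a dependency-free sketch of the core idea they provide (a priority job queue that processes benchmark runs one at a time). All names, commit hashes, and priority values are illustrative, not the libraries' actual API:

```javascript
// Minimal in-memory sketch of what kue/bull provide: a priority queue of
// jobs, drained one at a time by a single worker (only one benchmark run
// can occupy the machine at once).
class JobQueue {
  constructor(worker) {
    this.worker = worker;   // async function that runs one job
    this.jobs = [];
    this.running = false;
  }
  add(job, priority = 0) {  // higher priority runs earlier
    this.jobs.push({ job, priority });
    this.jobs.sort((a, b) => b.priority - a.priority);
    this.drain();
  }
  async drain() {
    if (this.running) return;  // a run is already active; it can't be preempted
    this.running = true;
    while (this.jobs.length > 0) {
      const { job } = this.jobs.shift();
      await this.worker(job);
    }
    this.running = false;
  }
}

// Example: commits arriving from a (hypothetical) GitHub webhook handler.
const done = [];
const q = new JobQueue(async (commit) => { done.push(commit); });
q.add('abc123');       // push #1, starts immediately
q.add('def456');       // push #2, queued
q.add('0a1b2c', 10);   // manually requested run, jumps ahead of push #2
// processed order: abc123, 0a1b2c, def456
```

A real queue would persist jobs (as kue/bull do via Redis) so a server restart doesn't lose pending benchmark runs.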
Ok, that looks like an easy option. Just need to make sure the limits are fine for the selected benchmarks:
> Each job for self-hosted runners can be queued for a maximum of 24 hours. If a self-hosted runner does not start executing the job within this limit, the job is terminated and fails to complete.
If we do our own, there'd be no limits, and one could think about more sophisticated prioritization strategies. What's the GitHub behavior? Start if nothing is running, ignore subsequent commits until the run is done, and then start accepting again? Ideally you'd have the same, but then start bisecting while idle if there are changes above a certain threshold.
> If we do our own, there'd be no limits and one could think about more sophisticated prioritization strategies.
I would be very cautious about trying to roll something decent from scratch. If we really need something beyond those limits, it might still be worth looking at Jenkins or something else existing and mature. For example, Jenkins even seems to have a plugin for bisecting, although I'm not sure how necessary such functionality would be: if we already do nightly benchmarks, there's probably not that much to bisect. And even if there is a need, one can bisect a single benchmark (or a handful) locally by hand instead of having to bisect with an entire 12h suite or whatever.
Yea, just some greenfield thinking, but likely the devil is in the details 😄 Bisect is also good for looking back to see what changes had a big (unexpected) impact.
Moved it over here, as it seems more appropriate here.
We now have a minimum working version of this running on server01 and reporting to Zulip.
GitHub Actions are fine for running the regression tests, but we also want something to track performance (and precision) for long-running benchmarks.
Originally posted by @michael-schwarz in https://github.com/goblint/analyzer/pull/234#issuecomment-850362116:
Something along these lines was supposed to be the outcome. Basing it on this benchexec framework has the advantage that it is the same setup as for SV-Comp, so all those tests work out of the box and our own tests can be integrated without too many issues. Also, this tablegen tool would in theory give us a nice diff of what changed between runs (or configurations) that could simply be served at some URL to look at the results without having to ssh to the machine. One probably wants some glue code so that this is not all shell scripts but a bit more robust. But the idea was exactly this.
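For reference, producing such a diff should be a single invocation of BenchExec's `table-generator` (the result file names below are hypothetical); when given more than one result file, it writes a combined HTML table and, as far as I can tell, an additional `*.diff.*` table containing only the rows whose results changed:

```sh
# Hypothetical file names; table-generator ships with BenchExec.
table-generator -o tables/ \
    results/goblint.old.results.xml.bz2 \
    results/goblint.new.results.xml.bz2
```

Serving the `tables/` directory over HTTP would then give exactly the "look at results at some URL" workflow described above.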
This is a bit optimistic, given that one of these runs will likely take >12h (at least for SV-Comp), even on the new hardware.