alexpatel commented 7 years ago

I'm submitting this as a proposal for a 2-3 week project for me to take on (so, due around 6/26). I think it's a good chunk of work, it is something I have experience doing quickly at scale, and I claim it's an important part of managing the development lifecycle of a project this size/duration/complexity and should be dealt with at the outset. I would do this alongside any particularly pressing conference paper tasks that @mwookawa needs, but would hope to focus on this.

Goals

Reach Goal: Automate the Barrelfish OS/Guppy testing infrastructure
Settle Goal: Make it really easy to hack/do research with/test Barrelfish OS for PRINCESS

Related Issues

Components

~~strike~~=done, bold=in progress

~~Help with (or take over, if needed) taking inventory of ETH's working machines list (#41) and of ETH's working Barrelfish test list once it is sent (#40).~~
Write design document on wiki for development practices (using 6/9 meeting result) and build/testing/automation infrastructure proposal, get sign-off by senior folks.
~~Read/hack around with barrelfish/tools/harness.~~
Build test list (what passes/fails) for all available virtual platforms (QEMU, Gem5, see inventory...)
- get ARM Virtual Platforms trial/license.
Make buildbot/docker demo for group meeting 6/15.
Evaluate and choose container/virtualization software to use (Docker plz...).
Evaluate how many of the tests can be run automatically, how much work it will take to automate Guppy relevant tests (David says impossible, I say we can probably get more of it than one would expect).
Write automation pipelines for building and testing Barrelfish on whatever virtual machines we have available at the time.
Help with Nightly builds (#21).
Help with getting all various physical testing machines we will use up and running.
Build test list (what passes/fails) for all available on-premise machines.
Write automation pipelines for building and testing Barrelfish on whatever physical machines we have available at the time.
Investigate whether it is reasonable (given compile time, etc.) to set-up CI triggers from Github.
Write documentation for everything.

Potentially upstreamable work

In the harness documentation, here is what ETH says is missing from their testing code:

Better support for multiple architectures.
Better support for processing results, plot scripts etc.
Better error handling (don't blow up in a backtrace when subprograms fail)
Parallel tests/builds
- Add harness documentation for "Defining new machines, builds, and tests" to Barrelfish for upstream patch.
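Two of the gaps listed above (error handling that does not blow up in a backtrace when a subprogram fails, and parallel tests/builds) can be illustrated together. What follows is only a sketch using the Python standard library, not code from tools/harness; the names `run_step` and `run_all` and the result tuple layout are made up for illustration.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_step(name, argv, timeout=600):
    """Run one subprogram and return (name, ok, detail) instead of raising."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except OSError as exc:
        # Missing binary, permission problem, etc.: report it, don't crash.
        return name, False, "could not start: %s" % exc
    except subprocess.TimeoutExpired:
        return name, False, "timed out after %ds" % timeout
    if proc.returncode != 0:
        lines = proc.stderr.strip().splitlines()
        last = lines[-1] if lines else ""
        return name, False, "exit %d: %s" % (proc.returncode, last)
    return name, True, ""


def run_all(steps, workers=4):
    """Run independent steps in parallel; `steps` maps a name to an argv list."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_step, name, argv) for name, argv in steps.items()]
        # Results come back in submission order, one tuple per step.
        return [future.result() for future in futures]
```

Threads are enough here because each worker just waits on a child process; a failing build shows up as one `(name, False, detail)` row in the report rather than ending the whole run.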

Software to try

These are the tools that right now I would consider sufficient to get a solid automated infrastructure going, save any really involved/manual tasks I don't know about or have overlooked.

Jenkins

Jenkins is a continuous integration/automation server that can be deployed on-premise. A cloud-based equivalent (and free for public projects) is CircleCI.

Docker/Containerization

I'd like to explore more traditional virtualization options in addition to newer software like Docker or rkt. As a little demo for the kinds of software I'd like to explore (for those who don't hate/would be willing to try Docker), I've put together a Docker image for Barrelfish for x86_64 on QEMU/16.04.

To try it, install Docker and run in the shell:

git clone git@github.com:alexpatel/docker-barrelfish-qemu-x86_64.git
cd docker-barrelfish-qemu-x86_64
docker build --rm -t bf_x86_64 .
docker run bf_x86_64

Any Docker alternatives would of course be considered, but I would offer Docker for the candidate list: it makes it very easy to build minimal replicable images (and afaik much of what we need, e.g. hypervisors, can target GNU/Linux). Although sometimes painful, Docker has really been a huge productivity boost. It is becoming more stable and is a real hoot to use!

margoseltzer commented 7 years ago

I totally love this idea: I was planning on trying to subtly twist someone’s arm to take this on, and now I feel that I don’t have to arm twist!

Margo

On Jun 12, 2017, at 7:38 AM, Alex Patel notifications@github.com wrote:

Assigned #43 to @margoseltzer.


mwookawa commented 7 years ago

Jenkins! Yes!

Not crazy about docker, but we can talk about alternatives. Similarly, qemu for arm is not great. As mothy said, neither qemu nor gem5 are particularly accurate. We can build up from the arm fixed virtual platform simulators that we are going to ask for budget for instead.

Anyway, regardless of nitpicky tool choices, this is great. I was thinking along the same lines as margo. Huge thanks for taking this on alex!

alexpatel commented 7 years ago

Perfect, thanks Ming and Margo!

I will use this as the master issue and add references in the Components section to any sub-issues I create (plus Github's labels to tag the entire set).
Any high-level issues/progress will be posted as comments to this issue.

David, I think you should probably expect a fair amount of bother from my end in the process of a knowledge transfer about kernel testing and development practices - I'd love to understand more about what we were discussing last week.

ghost commented 7 years ago

Wait, what did I say was impossible? Automated kernel testing is a pain in the neck and a lot of work, and automated testing on bare hardware is another pain in the neck and a lot more work, but there's a long way from that to flatly impossible.

The place to start is booting barrelfish in qemu with serial console and wrapping that with expect so that you can type in commands and collect the output for analysis. See for example how this works: http://www.gson.org/netbsd/anita/ but note that all it really does is automate interacting with an installer.
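As a rough illustration of the "wrap the serial console with expect" idea, here is a minimal sketch that uses only the Python standard library rather than expect itself. The class name `ConsoleSession` is made up, and the QEMU arguments and prompt shown in the trailing comment are placeholders, not the actual Barrelfish invocation.

```python
import queue
import re
import subprocess
import threading
import time


class ConsoleSession:
    """Drive a child process's console: send lines, wait for regex matches."""

    def __init__(self, argv):
        self.proc = subprocess.Popen(
            argv,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
        )
        self.log = ""       # everything the console has printed so far
        self._pos = 0       # where the next expect() starts searching
        self._eof = False
        self._chunks = queue.Queue()
        threading.Thread(target=self._pump, daemon=True).start()

    def _pump(self):
        # Background reader, so that expect() can enforce a timeout.
        while True:
            data = self.proc.stdout.read1(4096)
            if not data:
                break
            self._chunks.put(data.decode("utf-8", "replace"))
        self._chunks.put(None)  # end-of-output marker

    def expect(self, pattern, timeout=30.0):
        """Block until `pattern` appears in new output; return the match."""
        regex = re.compile(pattern)
        deadline = time.monotonic() + timeout
        while True:
            match = regex.search(self.log, self._pos)
            if match:
                self._pos = match.end()
                return match
            if self._eof:
                raise EOFError("console closed before %r appeared" % pattern)
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                raise TimeoutError("timed out waiting for %r" % pattern)
            try:
                chunk = self._chunks.get(timeout=remaining)
            except queue.Empty:
                raise TimeoutError("timed out waiting for %r" % pattern)
            if chunk is None:
                self._eof = True
            else:
                self.log += chunk

    def send(self, line):
        """Type one line into the console."""
        self.proc.stdin.write((line + "\n").encode("utf-8"))
        self.proc.stdin.flush()

    def close(self):
        if self.proc.poll() is None:
            self.proc.terminate()
        self.proc.wait()


# Hypothetical usage; the emulator arguments and the prompt are placeholders:
#   session = ConsoleSession(["qemu-system-x86_64", "-nographic", ...])
#   session.expect(r"> ", timeout=120)
#   session.send("some_test_command")
```

The full transcript stays in `session.log`, so it can be saved and analysed after the run, including whatever was printed just before a hang or crash.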

Then figure out how to get a success/failure reading on the output; inspecting it by hand is a starting point but (a) you'll miss stuff and (b) you'll get tired of doing it every time a new batch of test results comes in. Diffing the test outputs against reference versions is the easiest thing in general (and is all you generally need for e.g. compilers) but this tends not to work for kernels because of timing and output interleaving. Sometimes you can filter the output enough to make it ok; sometimes not.
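A sketch of the "filter, then diff against a reference" approach described above, using Python's difflib. The noise patterns (a bracketed timestamp prefix and hex addresses) are assumptions about what varies between runs, not a description of Barrelfish's actual output, and the function names are made up.

```python
import difflib
import re

# Assumed sources of run-to-run noise; adjust to the real console output.
TIMESTAMP = re.compile(r"^\[\s*\d+(\.\d+)?\]\s*")
HEX_ADDR = re.compile(r"0x[0-9a-fA-F]+")


def normalize(text, ordered=True):
    """Strip noise from console output; optionally ignore line order."""
    lines = []
    for raw in text.splitlines():
        line = TIMESTAMP.sub("", raw.rstrip())
        line = HEX_ADDR.sub("0xADDR", line)
        if line:
            lines.append(line)
    # Sorting hides interleaving differences, at the cost of also hiding
    # genuine ordering bugs.
    return lines if ordered else sorted(lines)


def compare(actual, reference, ordered=True):
    """Return (passed, diff_lines) for a test run against its reference."""
    want = normalize(reference, ordered)
    got = normalize(actual, ordered)
    diff = list(difflib.unified_diff(want, got, "reference", "actual", lineterm=""))
    return not diff, diff
```

Whether `ordered=False` is acceptable depends on the test: it is the "filter the output enough to make it ok" case, and some tests will not survive that much filtering.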

(Running the tests on bare hardware is similar except that you need physical serial consoles and console servers, a way to deploy a new kernel and userland for testing, a way to restore/recover the test machine if the new kernel sucks, and (in general) the hardware-level ability to cycle each test machine's power.)

This is the expensive stuff; it's premature in this context to worry about whether to run qemu inside docker or not (which seems pointless to me, but whatever) or whether to use Jenkins or cron to fire off test runs, or other things like that.

although probably the right first step before any of that is setting up a nightly build run.

ghost commented 7 years ago

Oh and since ETH has done this, as I said somewhere else we may be able to steal^W copy a bunch of their software infrastructure.

alexpatel commented 7 years ago

Cool - maybe tough, but hopefully I can get a good improvement over the automation achieved at ETH (even just thinking back to trying to script OS/161 tests though...)

So something maybe like: bf os < qemu -- serial port -- python client < ubuntu < ...vm/container.. < automation server? I can imagine writing OOP Python that allows for other emulators like ARM etc.
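To make the "OOP Python that allows for other emulators" idea a bit more concrete, here is one possible shape for it. Everything in it is illustrative: the class names, the registry, and the choice of QEMU arguments are assumptions, and a real Barrelfish boot needs more arguments than a bare `-kernel`.

```python
from abc import ABC, abstractmethod


class Emulator(ABC):
    """One backend per platform; each knows how to build its command line."""

    name = "abstract"

    @abstractmethod
    def command(self, image):
        """Return the argv list that boots `image` with a serial console."""


class QemuX86_64(Emulator):
    name = "qemu-x86_64"

    def __init__(self, memory_mb=1024, cpus=2):
        self.memory_mb = memory_mb
        self.cpus = cpus

    def command(self, image):
        # -nographic routes the guest's serial port to stdio.
        return ["qemu-system-x86_64", "-nographic",
                "-m", str(self.memory_mb), "-smp", str(self.cpus),
                "-kernel", image]


class QemuArmv7(Emulator):
    name = "qemu-armv7"

    def __init__(self, machine="vexpress-a9"):
        # The machine type is an assumption made for this sketch.
        self.machine = machine

    def command(self, image):
        return ["qemu-system-arm", "-nographic", "-M", self.machine,
                "-kernel", image]


# Registry so the automation server can pick a backend by platform name.
EMULATORS = {cls.name: cls for cls in (QemuX86_64, QemuArmv7)}


def command_for(platform, image, **options):
    """Look up a backend by name and build its command line."""
    return EMULATORS[platform](**options).command(image)
```

Adding another simulator would then mean adding one subclass and one registry entry, while the serial client and the automation server stay unchanged.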

alexpatel commented 7 years ago

@dh6713 wiki.barrelfish.org/Harness (tools/harness in the source) appears to be what ETH uses internally - I am drafting an e-mail to them tomorrow re: test suite and pipeline

alexpatel commented 7 years ago

On the ahp_buildbot_demo branch, I have made a Docker configuration that when run will locally build and run the BuildBot automation server on a cluster of containers (postgres, 1 master buildbot, 1 slave buildbot) and track commits to the Guppy repository on my branch.

The next step is to hook up BuildBot to the Barrelfish QEMU/x86 Docker image and teach it to run the tests provided by barrelfish/tools/harness when it sees a new commit. I will also make a powerpoint slide explaining how if we went with Docker and AWS we would be able to test/manage our virtualizable and on-premise hardware as part of the same automation server cluster.

How to replicate: install Docker (I've only tested on Docker for Mac), checkout branch ahp_buildbot_demo, run cd tools/harness_auto, run docker-compose up, then go to http://localhost:8080/#/changes in your web browser. If you commit a test commit to my branch and push to Github, it will reflect in the web UI.

[Screenshot, 2017-06-15]

mwookawa commented 7 years ago

we talked today about sticking to nightlies for now, rather than every commit and to only build and test a few platforms: FVP_VE_CORTEX_A9x1 (which is free as part of DS-5 Community Edition), QEMU x86, and Pandaboard (build the image only).

the common tests are the only things that can currently be executed, and have to be piped into the kernel terminal via a virtual serial. branch dev_integrate_lmbench_as_regressions will add a few more, and i'll pull more in as PRs if i think of little things (some subset of coreutils maybe?)

either nomnomnom or grizzly can host the scripts and daemon; grizzly is less utilized right now, and no one has claimed it as a desktop, meaning you can still wipe it freely if you need to, so it's probably the better host.

alexpatel commented 7 years ago

Pull request with nightly build configuration is https://github.com/Harvard-PRINCESS/Guppy/pull/60. The other chunk of work here is architecture-specific testing, interacting with the kernel terminal automatically.

I have some basic Docker/BuildBot configuration that I was hoping to pursue in this ticket, but there are other PRINCESS tasks that are more pressing. I'll just work on it in my free time with the hope of getting a demo of some scalable/automated CS161 testing infrastructure...

mwookawa commented 7 years ago

a buildbot UI would be great when we get there. looking at the environment needed to run FVPs, it might make quite a bit of sense to dockerize them. there is some platform specific code that compiles much more reliably with armclang than gcc-arm-none-eabi or clang --arch=armv7

the only thing in the way of pandaboard execution on grizzly is that the reset contacts on the physical board need to be shorted to wipe ram between runs. i found some handy tinned pads on the board that can be wired to an arduino to enable remote reset.

come to think of it, we could theoretically use the ATX reset pins on x86 motherboards to cause a PXE or TFTP re-fetch remotely instead of trying to find a machine with serial terminal.

alexpatel commented 7 years ago

> a buildbot UI would be great when we get there. looking at the environment needed to run FVPs, it might make quite a bit of sense to dockerize them. there is some platform specific code that compiles much more reliably with armclang than gcc-arm-none-eabi or clang --arch=armv7

Perfect - I'll quickly throw the Docker containers up on nomnomnom in the morning for the barrelfish user, anyone can mess around with BB jobs/pipelines if they want (and if there's down time in the future I can configure it with the nightly script...)

> the only thing in the way of pandaboard execution on grizzly is that the reset contacts on the physical board need to be shorted to wipe ram between runs. i found some handy tinned pads on the board that can be wired to an arduino to enable remote reset.
>
> come to think of it, we could theoretically use the ATX reset pins on x86 motherboards to cause a PXE or TFTP re-fetch remotely instead of trying to find a machine with serial terminal.

+1

ghost commented 7 years ago

If you want actual testing, you need a real serial console, because you've got to log what happens, and in particular what happens when the kernel panics.

That said, one of the things we should also have running periodically, if possible, is a copy of the LL testing harness stuff.

alexpatel commented 7 years ago

I am going to close this ticket - with regard to testing and development tools, we are at the stage where we have what we need to do the hacking for this first paper/project:

Circle CI - continuous integration for build and test harness for QEMU/x86, runs on every commit to any branch and e-mails result to author
docker - build/run barrelfish on QEMU/x86 in an isolated virtual container (instead of ssh'ing into a physical machine)
tools/harness/runtests.py - run an architecture-specific list of tests against a kernel, build if necessary
tools/princess_nightly - nightly build script that will verify compilation of master/dev HEADs, running on nomnomnom.seas.harvard.edu
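For readers who have not looked at the nightly script, the general shape of a "verify that the branch HEADs compile" job can be sketched as below. This is not the contents of tools/princess_nightly; the function names, the report format, and the build command passed in are all assumptions made for illustration.

```python
import subprocess


def run(argv, cwd):
    """Run one command in `cwd`; return (ok, combined output)."""
    proc = subprocess.run(argv, cwd=cwd, capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr


def nightly(repo, branches, build_argv, run=run):
    """Check that each branch HEAD builds; return {branch: (status, step, log)}."""
    report = {}
    for branch in branches:
        steps = [
            ["git", "fetch", "origin", branch],
            ["git", "checkout", "--detach", "FETCH_HEAD"],
            build_argv,  # hypothetical build command supplied by the caller
        ]
        for argv in steps:
            ok, output = run(argv, repo)
            if not ok:
                # Keep only the tail of the log for the report.
                report[branch] = ("FAIL", " ".join(argv), output[-2000:])
                break
        else:
            report[branch] = ("OK", "", "")
    return report
```

A scheduler such as cron could call `nightly(repo, ["master", "dev"], build_argv)` once a day and mail out the report; the `run` parameter exists so the loop can be exercised without touching a real repository.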

The upcoming work on this can just be put in other issues, because a lot depends on when physical hardware is coming in. However, the CI pipeline is set up so that new machines/platforms (both virtualizable and on-premise) can be added to the CI suite fairly easily.

Write PandaboardES server/client so that Pandaboard builds can be run by Circle CI
- make the arduino to be able to programmatically reset the PB
FVP armv7 (has to be run on an on-premise machine)
QEMU/armv7 (doesn't boot currently, will need to fix in the barrelfish trunk)
(once we have a port) Intel Arria 10 FPGA dev kit.

Harvard-PRINCESS / Guppy

Build development/testing infrastructure #43
