Harvard-PRINCESS / Guppy

A very adaptable fish.

write regression tests against cpu driver and libbarrelfish interfaces #92

Open mwookawa opened 7 years ago

mwookawa commented 7 years ago

i know it's not exciting to do so, but we can't claim that our tool passes validation tests without having validation tests.

alexpatel commented 7 years ago

This is a very large surface area to cover (basically the entire kernel?) - I think this would be a much more tractable task if you could provide either a design document specifying which functionality you are going to capture with Alewife (so that we can go write regression tests against the current version of that functionality), or even just an enumerated list of the behavior you want to check (e.g. cap_delete has such-and-such pre/post conditions).

Otherwise I would suggest we come up with a good timebox for this task (1 week, for example) - if my code reading is faithful, writing tests for this kernel could take months, since there aren't any existing unit tests that I am aware of (Crystal confirms).
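As an aside, the "enumerated pre/post conditions" idea could look something like the sketch below. To be clear, none of this is the real Barrelfish capability code: `struct toy_cap`, its fields, and `toy_cap_delete` are made-up stand-ins, chosen only to show the shape of a test that enumerates a delete operation's preconditions (slot occupied, no descendants) and postconditions (slot empty, type null).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel capability slot; the real
 * capability table entry is far richer. */
struct toy_cap {
    bool occupied;
    int  type;      /* 0 = null */
    int  children;  /* count of descendants created by retype */
};

/* Toy model of a delete operation. Preconditions: the slot is in use
 * and has no descendants. Postconditions: the slot is empty and its
 * type is null. Returns 0 on success, -1 on a precondition violation. */
static int toy_cap_delete(struct toy_cap *c) {
    if (c == NULL || !c->occupied) return -1;  /* pre: slot in use    */
    if (c->children != 0)          return -1;  /* pre: no descendants */
    c->occupied = false;                       /* post: slot empty    */
    c->type = 0;                               /* post: type is null  */
    return 0;
}
```

Given an enumerated list like this, each regression test is just one pre-state, one call, and an assertion over the post-state, which is exactly the form that survives a rewrite of the implementation underneath.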

mwookawa commented 7 years ago

ask yourself: what is most likely to break if we were to rip out all the MD syscall front-end, fishcall front-end, and capability retyping code and replace it with our own handwritten code that we hope does the same thing? how can we use tests to verify that our handwritten code does more or less the same thing?

note that we have already refactored/replaced the MD syscall front-end and the fishcall front-end, and have been looking at how and where capabilities are manipulated in LB and the CPU driver. i would be shocked if we hadn't missed some corner cases. write tests that exercise those corner cases, and hopefully some even fail with just the changes we've made so far.
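A table-driven boundary test is one cheap way to pin down those corner cases before and after a front-end rewrite. Everything below is hypothetical (`toy_dispatch`, `NSYSCALL`, the error value); it only illustrates the shape, not the actual MD syscall front-end.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for the syscall front-end's dispatch check:
 * valid numbers are 0..NSYSCALL-1; everything else must fail the same
 * way before and after the rewrite. */
enum { NSYSCALL = 4, ERR_INVALID_SYSCALL = -1 };

static long toy_dispatch(long num) {
    if (num < 0 || num >= NSYSCALL) {
        return ERR_INVALID_SYSCALL;   /* reject out-of-range numbers */
    }
    return 0;                         /* pretend the handler ran */
}

/* The boundaries are where a handwritten replacement is most likely to
 * diverge from the original. Returns 1 if all cases pass. */
static int corner_cases_pass(void) {
    static const struct { long num; long want; } cases[] = {
        { 0,            0                   },  /* lowest valid     */
        { NSYSCALL - 1, 0                   },  /* highest valid    */
        { -1,           ERR_INVALID_SYSCALL },  /* just below range */
        { NSYSCALL,     ERR_INVALID_SYSCALL },  /* just above range */
        { LONG_MIN,     ERR_INVALID_SYSCALL },  /* extreme values   */
        { LONG_MAX,     ERR_INVALID_SYSCALL },
    };
    for (unsigned i = 0; i < sizeof cases / sizeof cases[0]; i++) {
        if (toy_dispatch(cases[i].num) != cases[i].want) return 0;
    }
    return 1;
}
```

The same table can be run against the old and new front-ends; any row where the two disagree is exactly the missed corner case we're after.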

finally, write a basic set of userlevel tests that verifies rudimentary operation of the system. can it run rogue? a fish (fork)bomb? standard functional and nuclear memory allocation tests? can we use LB to construct a memory hierarchy for armv7, armv8 and x86 out of caps using the provided MI kernel functionality?
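For the rudimentary-operation category, even a tiny allocation smoke test earns its keep: grab a spread of block sizes, scribble on every byte, and free it all. The sketch below uses plain `malloc` as a placeholder; on the real system it would sit on top of libbarrelfish's allocation path instead.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Minimal allocation smoke test: allocate a spread of sizes, touch
 * every byte of each block, then free everything. Returns 0 on
 * success, -1 on allocation failure. */
static int alloc_smoke_test(void) {
    enum { ROUNDS = 64 };
    void *blocks[ROUNDS];
    for (int i = 0; i < ROUNDS; i++) {
        size_t sz = (size_t)1 << (i % 16);   /* 1 B .. 32 KiB */
        blocks[i] = malloc(sz);
        if (blocks[i] == NULL) return -1;
        memset(blocks[i], 0xa5, sz);         /* touch every byte */
    }
    for (int i = 0; i < ROUNDS; i++) {
        free(blocks[i]);
    }
    return 0;
}
```

It's deliberately boring: the point is that it runs on every build and falls over loudly the moment the allocation path regresses.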

would it be more efficient if you had a roadmap of exactly which tests to write? sure. if you gear down and start writing test cases now that may not exercise the parts of the kernel we are immediately going to replace, will that be wasted time? no. if a test exercises kernel functionality, we are eventually going to break that functionality and cause that test to fail. as with any other test infrastructure, we want to know when we break things as soon as possible. capiche?

alexpatel commented 7 years ago

> ask yourself: what is most likely to break if we were to rip out all the MD syscall front-end, fishcall front-end and capability retyping code and replace it with our own handwritten code that we hope does the same thing. how can we use tests to verify that our handwritten code does more or less the same thing?

Fair enough - I wasn't trying to be aggressive. It's just that "to rip out all the MD syscall front-end, fishcall front-end and capability retyping code and replace it with our own handwritten code that we hope does the same thing" is the first official writing I have seen (maybe I missed a wiki page or an e-mail or something, but the relevant tickets #82 and #45 seem to have gone stale) regarding the scope of the Alewife compiler and what it generates. I'm sure you would agree that being assigned a ticket that amounts to "write unit tests for this OS, both the micro-kernel and the user libs" can be concerning.

What you've outlined (test the syscall changes Crystal made, the cap changes that have made it into trunk, and cap retyping functionality) narrows the scope drastically - it seems reasonable to me to do in 1-2 weeks across 3 programmers.

The follow-ups I have are:

alexpatel commented 7 years ago

Also - the issues I have assigned related to this ASPLOS paper candidate are:

@ming do you have a mental model of the preferred order of precedence for these tasks? Like which are most immediately useful to what you're designing, which are just good-to-haves, etc.

ghost commented 7 years ago

We need unit-test-level tests for the specific fragments we're going to be synthesizing and replacing for the eval, but mostly not others. I'd say it's probably best to avoid spending a long time writing unit tests for code that isn't in this category, so probably we should tentatively choose two or three specific such fragments first and concentrate on those.

My standard spiel about unit tests: tests in general are great, but every test you have takes (a) time to write, (b) time to run, and (c) time to maintain/update as the system evolves, and they only pay back when they catch something. Unit tests in general have a very poor results-to-effort ratio, for various reasons (ask me in person if you want to hear more) and when there's generally about 10x as much work to do as time to do it (that is, in nearly any software project) it's usually more effective to spend your testing budget on other things.

Also, unit-testing kernel code has a way of being expensive. On one project some years back we had a file of kernel code test stubs that had ~100 commits whose commit message was "I hate this file", and all it supported was the most basic infrastructure code. Meanwhile this whole project was basically floated to be able to test kernel code in ordinary userspace.

That said, there should be more general tests and they should probably get done nicely so they can get upstreamed. There was talk of getting lmbench running, but the ticket hasn't been updated in a month; has anyone been working on that? And it would probably be fairly straightforward to get (some of) the OS/161 tests working... though many of them are probably too unixy.

Is there a rogue for BF yet? I'm guessing not. Is there curses?

mwookawa commented 7 years ago

  1. don't rewrite mackerel in ocaml. if you're not done with the mackerel parser (much different), then push what you have into that branch and let it go.
  2. make an appointment with HBS.
  3. what crystal wants, she gets.

see dholland's notes regarding unit tests vs. module tests

alexpatel commented 7 years ago

Ok, sounds good. The Mackerel parser is on ahp-alewife-mackerel.

ghost commented 7 years ago

Also, while we're talking about testing, does anyone know of a decent code coverage tool? gcov exists but is not really very usable.

mwookawa commented 7 years ago

to recap a quick conversation between david and me the other day: llvm-cov is pretty fantastic these days, and there is a cool valgrind tool called callgrind that seems worth looking into.
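For reference, llvm-cov's source-based coverage is a three-step workflow. The file and binary names below are placeholders; this is just a command sketch, not our build integration.

```shell
# 1. Build with instrumentation and coverage mapping (placeholder names).
clang -fprofile-instr-generate -fcoverage-mapping -o tests tests.c

# 2. Run the instrumented binary; it writes a raw profile.
LLVM_PROFILE_FILE=tests.profraw ./tests

# 3. Index the raw profile, then report line/region coverage.
llvm-profdata merge -sparse tests.profraw -o tests.profdata
llvm-cov report ./tests -instr-profile=tests.profdata
```

`llvm-cov show` instead of `report` gives annotated source, which is handy for spotting the untested corner cases discussed above.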

On Fri, Jul 28, 2017 at 8:14 PM dh6713 notifications@github.com wrote:

> Also, while we're talking about testing, does anyone know of a decent code coverage tool? gcov exists but is not really very usable (http://gnats.netbsd.org/44188).
