betatim opened 5 years ago
cool - thanks for the heads up.
Have you heard about Gopher Grader? https://github.com/data-8/Gofer-Grader
This is what Yuvi's "gradememaybe" project has morphed into. It's now being maintained by a grad student at Berkeley working with Yuvi; might be worth keeping an eye on it moving forward in case some of these things make sense there.
The setup in grading-workflow-experiments takes parts from gradememaybe/Gofer-Grader.
Right now I decided not to use their approach of extracting the source code from a notebook and then executing it "by hand" (a for loop over all cells with `exec()` calls). I had some trouble getting that to work nicely when some of your notebooks contain matplotlib figures and some don't. It seems you have to run `pythonw grade-it.py <notebook.ipynb>` instead of `python grade-it.py <notebook.ipynb>` on a Mac to sort out matplotlib backends. It seems simpler to use papermill to execute the notebooks (it gives me fully compliant notebook execution for free).
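For concreteness, here is a minimal sketch of the two execution strategies; the file names and the shared-namespace detail are my assumptions, not taken from either project's actual code:

```python
import nbformat
import papermill as pm

# Strategy 1, the gofer-grader-style approach described above:
# pull the source out of the notebook and exec() each code cell
# "by hand" in a shared namespace. (Hypothetical file name.)
nb = nbformat.read("student-submission.ipynb", as_version=4)
namespace = {}
for cell in nb.cells:
    if cell.cell_type == "code":
        exec(cell.source, namespace)

# Strategy 2, what this PR leans on instead: let papermill execute
# the whole notebook in a real kernel, which sidesteps the matplotlib
# backend trouble and writes out a fully executed copy of the notebook.
pm.execute_notebook(
    "student-submission.ipynb",
    "executed/student-submission.ipynb",
)
```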
I reused the idea of looking at the parsed source of the notebook to detect people trying to redefine the check function though.
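Something along these lines (a sketch under my own assumptions; the real implementation may differ):

```python
import ast

def redefines_check(cell_source: str) -> bool:
    """Return True if a cell re-binds the name `check`.

    A minimal sketch: walk the parsed AST of the cell and look for
    function definitions of, or assignments to, a name called `check`.
    """
    tree = ast.parse(cell_source)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == "check":
            return True
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id == "check":
                    return True
    return False
```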
The main deviation from okpy, and where this starts being incompatible, is that we want to be able to assign points to individual cases in order to weight things differently, while having multiple cases in one cell (to reduce the visual clutter from having lots of cells that are just `check()` calls).
sounds good...maybe it'd be helpful if you wrote up your high-level vision for what a grading system should look like. I'm personally a bit distanced from this problem since I'm not working on any of the grading stuff (just trying to make sure everybody knows about everything else haha). So I can merge this in, though I probably can't give super useful critical feedback on design etc. Is that OK w/ you? :-)
What would be a good place to write that up?
This specific change is motivated by wanting to have several tests in one notebook cell. Each test consists of one or more lines of doctest. If the whole block passes you get N points; if any of it fails you get zero points.
So this would be one notebook cell:
```python
# POINTS 2
>>> student_func(3, 4)
12
# POINTS 4
>>> student_func(3, -4)
-12
>>> student_func(-3, -4)
12
```
which has two tests in it. The first one is worth 2 points and the second one is worth 4.
Before this PR you'd have to make one cell for each of the two tests, and there was no way to specify that they should be worth different amounts of points. Imagine you wanted to grade a plot and wanted to give one point each for the x-axis label, y-axis label, title, legend, and marker type. You'd end up with five cells cluttering up the notebook and you couldn't assign different weights to each one.
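To make the scheme concrete, here's a sketch of how such a cell could be scored; `grade_cell` and the regex-based splitting are my invention for illustration, not the PR's actual code:

```python
import doctest
import re

def grade_cell(cell_source: str, namespace: dict) -> int:
    """Score one checking cell under the scheme described above.

    Splits the cell on `# POINTS <n>` markers, runs each chunk as a
    doctest against `namespace` (which holds the student's definitions,
    e.g. `student_func`), and awards the chunk's full point value only
    if every example in the chunk passes.
    """
    parser = doctest.DocTestParser()
    runner = doctest.DocTestRunner(verbose=False)
    total = 0
    # re.split with a capturing group keeps the point values, giving
    # alternating [points, chunk, points, chunk, ...] after the split.
    chunks = re.split(r"#\s*POINTS\s+(\d+)", cell_source)[1:]
    for points, source in zip(chunks[0::2], chunks[1::2]):
        test = parser.get_doctest(source, namespace, "check", None, 0)
        result = runner.run(test, clear_globs=False)
        if result.failed == 0:
            total += int(points)
    return total
```

Run against the example cell above with `namespace = {"student_func": lambda a, b: a * b}`, this would return 6, since both tests pass.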
The rest of https://github.com/earthlab/grading-workflow-experiments deals with authoring, distributing, collecting and grading student work. It uses nbclean for the authoring.
Ah that's cool - re: writing it up, I was just thinking a blog post or a hackmd or something. Or maybe this should just be docs for nbclean? I dunno
This lets us assign points to each block of tests and have more than one test in a cell. Heavy construction work still happening.
Not sure if this is suitable for nbclean as it is getting more specific, but I didn't have a better place to put it, so I'm parking it here for the moment. Could be useful for the general public, as there isn't anything quite like this out there, I think. This PR is friends with https://github.com/earthlab/grading-workflow-experiments/pull/11; that repo needs a better name and some cleaning up to become a suite of authoring and grading tools.