cssh-rwth / autograde

Test jupyter notebooks in an isolated environment
MIT License

autograde


autograde is a toolbox for testing Jupyter notebooks. Its features include execution of notebooks (optionally isolated via docker/podman) with subsequent unit testing of the final notebook state. An audit mode allows for refining results (e.g. grading plots by hand). Finally, autograde can summarize these results in human- and machine-readable formats.

Setup

Install autograde from PyPI using pip:

pip install jupyter-autograde

Alternatively, autograde can be set up from source by cloning this repository and installing it with poetry:

git clone https://github.com/cssh-rwth/autograde.git && cd autograde
poetry install

If you intend to use autograde in a sandboxed environment, ensure that rootless docker or podman is available on your system. So far, only rootless mode is supported!
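A quick way to see whether either container engine is installed is to look for its executable on the PATH. The following is a minimal sketch using only the Python standard library; `available_container_engines` is a hypothetical helper, not part of autograde, and it only checks that an executable exists. It does not verify that the engine runs in rootless mode.

```python
import shutil
from typing import Callable, List, Optional


def available_container_engines(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the names of container engines whose executable is on PATH."""
    # Presence check only: this does NOT confirm that rootless mode is set up.
    return [name for name in ("docker", "podman") if which(name) is not None]


engines = available_container_engines()
print("found:", ", ".join(engines) if engines else "none")
```

The `which` parameter exists only so the lookup can be swapped out, for example in a test; by default `shutil.which` is used.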

Usage

Once installed, autograde can be invoked via the autograde command. If you are using a virtual environment (which poetry does implicitly), you may have to activate it first. Alternative methods:

To get an overview of all available options, run

autograde [sub command] --help

Testing

autograde comes with some example files located in the demo/ subdirectory that we will use for now to illustrate the workflow. Run

autograde test demo/test.py demo/notebook.ipynb --target /tmp --context demo/context

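What happened? Let's first have a look at the arguments of autograde: demo/test.py is the script containing the test cases, demo/notebook.ipynb is the notebook to be tested, the --target flag sets the directory in which results are stored (/tmp here), and the --context flag names a directory whose contents are made available to the notebook during execution.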
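The same invocation can be assembled from Python, for instance when many notebooks are tested in a loop. This is a minimal sketch: `build_test_command` is a hypothetical helper, not part of autograde, and it uses only the `--target` and `--context` flags that appear in the command above.

```python
import shlex
from typing import List, Optional


def build_test_command(test_script: str, notebook: str,
                       target: Optional[str] = None,
                       context: Optional[str] = None) -> List[str]:
    """Assemble the argument list for an `autograde test` call."""
    cmd = ["autograde", "test", test_script, notebook]
    if target is not None:
        cmd += ["--target", target]
    if context is not None:
        cmd += ["--context", context]
    return cmd


cmd = build_test_command("demo/test.py", "demo/notebook.ipynb",
                         target="/tmp", context="demo/context")
print(shlex.join(cmd))
# To actually run it (requires autograde to be installed):
# import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than one shell string avoids quoting problems when paths contain spaces.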

The output is a compressed archive that is named something like results-Member1Member2Member3-XXXXXXXXXX.zip and which has the following contents:
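Since the result is a regular zip archive, its members can be listed with Python's standard library. This is a minimal sketch; `list_archive` is a hypothetical helper, not part of autograde, and the path in the usage comment is a placeholder.

```python
import zipfile
from pathlib import Path
from typing import List


def list_archive(path: Path) -> List[str]:
    """Return the sorted names of all members stored in a zip archive."""
    with zipfile.ZipFile(path) as archive:
        return sorted(archive.namelist())


# Usage (placeholder path, replace with an actual results archive):
# for name in list_archive(Path("/tmp/results-....zip")):
#     print(name)
```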

Audit Mode

The interactive audit mode allows for manually refining the result files. This is useful for grading parts that cannot be tested automatically, such as plots or text comments.

autograde audit path/to/results

[Screenshots: Overview, Auditing, Report Preview]

Generate Reports

The report sub command creates human-readable HTML reports from test results:

autograde report path/to/result(s)

The report is added to the results archive in place.

Patch Result Archives

Results from multiple test runs can be merged via the patch sub command:

autograde patch path/to/result(s) /path/to/patch/result(s)

Summarize Multiple Results

In a typical scenario, test cases are applied not to a single notebook but to many at a time. Therefore, autograde comes with a summary feature that aggregates results, shows you a score distribution, and includes some very basic fraud detection. To create a summary, simply run:

autograde summary path/to/results

Two new files will appear in the result directory:
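To make the idea of a score distribution and a very basic duplicate check more concrete, here is a purely illustrative sketch. It is not autograde's implementation: the function names, the input dictionaries, and the choice of hashing source text to spot identical submissions are all assumptions made for this example.

```python
import hashlib
from collections import Counter, defaultdict
from typing import Dict, List


def score_distribution(scores: Dict[str, float]) -> Counter:
    """Count how many submissions achieved each score."""
    return Counter(scores.values())


def identical_submissions(sources: Dict[str, str]) -> List[List[str]]:
    """Group submissions whose source text is exactly identical."""
    groups = defaultdict(list)
    for name, source in sources.items():
        digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
        groups[digest].append(name)
    # Only groups with more than one member are suspicious.
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Hypothetical example data, made up for illustration only.
scores = {"team_a": 8.0, "team_b": 10.0, "team_c": 8.0}
sources = {"team_a": "x = 1", "team_b": "x = 2", "team_c": "x = 1"}
print(score_distribution(scores))
print(identical_submissions(sources))
```

Exact hashing only catches verbatim copies; any whitespace or renaming change defeats it, which is why a check like this can only ever be a first filter.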

Snippets

Work with result archives programmatically

Fix score for a test case in all result archives:

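from pathlib import Path

from autograde.backend.local.util import find_archives, traverse_archives

def fix_test(path: Path, faulty_test_id: str, new_score: float):
    """Set a new maximum score for one test case in every archive below `path`."""
    # mode='a' opens each archive for appending so that a patch can be injected
    for archive in traverse_archives(find_archives(path), mode='a'):
        # work on a copy so the stored results are only changed via the patch
        results = archive.results.copy()
        for faulty_test in filter(lambda t: t.id == faulty_test_id, results.unit_test_results):
            faulty_test.score_max = new_score
            archive.inject_patch(results)

fix_test(Path('...'), '...', 13.37)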

Special Test Cases

Ensure a student id occurs at most once:

from collections import Counter

from autograde import NotebookTest

nbt = NotebookTest('demo notebook test')

@nbt.register(target='__TEAM_MEMBERS__', label='check for duplicate student id')
def test_special_variables(team_members):
    id_counts = Counter(member.student_id for member in team_members)
    duplicates = {student_id for student_id, count in id_counts.items() if count > 1}
    assert not duplicates, f'multiple members share same id ({duplicates})'
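The duplicate check itself is plain Python and can be tried without autograde. In this minimal sketch, `Member` is a hypothetical stand-in for the objects passed as `team_members`; only a `student_id` attribute is assumed, and the names and ids are made up.

```python
from collections import Counter, namedtuple

# Hypothetical stand-in for a team member; only `student_id` matters here.
Member = namedtuple("Member", ["name", "student_id"])


def find_duplicate_ids(team_members):
    """Return the set of student ids that occur more than once."""
    id_counts = Counter(member.student_id for member in team_members)
    return {student_id for student_id, count in id_counts.items() if count > 1}


team = [Member("A", "123"), Member("B", "456"), Member("C", "123")]
print(find_duplicate_ids(team))  # → {'123'}
```

Counting first and filtering afterwards reports every duplicated id at once, so the assertion message can name all offenders rather than just the first one found.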