Investigate Fuzzing

For whoever cares: I have atheris set up as a management command on https://github.com/He3lixxx/EvaP/tree/fuzzing, based on the following ideas:

Run on development data in the VM
Pick one of the url patterns from urls.py
Pick one method from [GET, POST, PUT, HEAD, DELETE]
Pick one of the test users from https://github.com/e-valuation/EvaP/wiki/Test-Users.
Pick whether staff mode is enabled (ignored if user is not allowed to enter staff mode)
Fill URL parameters with strings, numbers, or UUIDs, as required by the pattern
Pick a parameter data length (4 bytes) and then pick up to that many bytes as data. This data is used as the GET or POST parameters
Build a request from all that, check that the response code is expected (200, 30X, 40X)
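
For illustration, the steps above could map raw fuzzer bytes to a request description roughly like this. This is a minimal stdlib-only sketch, not the code on the branch: `ByteReader`, `FuzzRequest`, `decode`, `is_expected_status`, the placeholder lists, the big-endian length field and the literal reading of "30X"/"40X" are all assumptions made for the example, and URL parameter filling is left out. In an actual Atheris harness, `atheris.FuzzedDataProvider` would probably take the place of the hand-written reader.

```python
import struct
from dataclasses import dataclass

METHODS = ["GET", "POST", "PUT", "HEAD", "DELETE"]
# Hypothetical placeholders: a real harness would collect these from urls.py
# and from the test-user list.
URL_PATTERNS = ["/placeholder/a/", "/placeholder/b/<int:id>/"]
USERS = ["placeholder_user_1", "placeholder_user_2"]


@dataclass
class FuzzRequest:
    pattern: str
    method: str
    user: str
    staff_mode: bool
    payload: bytes


class ByteReader:
    """Consumes bytes from the front of the fuzzer input."""

    def __init__(self, data: bytes):
        self._data = data
        self._pos = 0

    def take(self, n: int) -> bytes:
        # Returns fewer than n bytes if the input runs out.
        chunk = self._data[self._pos:self._pos + n]
        self._pos += len(chunk)
        return chunk

    def byte(self) -> int:
        chunk = self.take(1)
        return chunk[0] if chunk else 0  # exhausted input reads as 0

    def pick(self, options):
        return options[self.byte() % len(options)]


def decode(data: bytes) -> FuzzRequest:
    reader = ByteReader(data)
    pattern = reader.pick(URL_PATTERNS)
    method = reader.pick(METHODS)
    user = reader.pick(USERS)
    staff_mode = bool(reader.byte() & 1)
    # 4-byte length (big-endian is an assumption), then up to that many bytes.
    (length,) = struct.unpack(">I", reader.take(4).ljust(4, b"\x00"))
    payload = reader.take(length)
    return FuzzRequest(pattern, method, user, staff_mode, payload)


def is_expected_status(code: int) -> bool:
    # Literal reading of "200, 30X, 40X": 200, 300-309, 400-409.
    return code == 200 or code // 10 in (30, 40)
```

Because every choice is taken modulo the number of options, any byte string decodes to some valid request description, so the fuzzer never wastes inputs on decoding errors.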

It reaches around 40% to 45% coverage without any special input, testing around 20 requests per second.

One limit it reaches is that many views begin similarly to this:

from django.shortcuts import get_object_or_404

def my_view(request, some_instance_id: int):
    instance = get_object_or_404(SomeModel, id=some_instance_id)

and since the IDs that would work here are only known to the database, the fuzzer struggles to provide IDs that don't result in 404s.

Currently considering two approaches for that:

Build dictionaries with correct IDs. Problem: Some views are like semester/X/course/Y/evaluation/Z and only the correct combination will work. Even with dictionaries, we'd still have far too many options.
Build a corpus that would actually reach high coverage and that the fuzzer can then mutate. This could be somewhat automated by logging requests and converting each logged request back into the corresponding corpus entry for the fuzzer. This would instantly give high coverage, and the fuzzer could try mutations very effectively, but it's a bit more complicated to achieve.
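
To make the corpus idea a bit more concrete, here is a rough sketch of turning a logged request back into a corpus entry. The byte layout is an assumption invented for this example (one byte each for pattern index, method index, user index and staff flag, then a 4-byte big-endian length and the payload); the real layout has to mirror whatever the harness decodes. `encode_entry` and `write_corpus_entry` are hypothetical helper names. libFuzzer-style fuzzers such as Atheris read a corpus as a directory with one file per input, so each entry is written to its own file.

```python
import hashlib
import struct
from pathlib import Path

METHODS = ["GET", "POST", "PUT", "HEAD", "DELETE"]


def encode_entry(pattern_index: int, method: str, user_index: int,
                 staff_mode: bool, payload: bytes) -> bytes:
    """Serialize one logged request into the assumed fuzzer input layout."""
    # One byte per choice, so each index must be below 256 (assumption).
    header = bytes([pattern_index, METHODS.index(method),
                    user_index, int(staff_mode)])
    return header + struct.pack(">I", len(payload)) + payload


def write_corpus_entry(corpus_dir: Path, entry: bytes) -> Path:
    """Store the entry as its own file, named after its SHA-1 hash."""
    corpus_dir.mkdir(parents=True, exist_ok=True)
    path = corpus_dir / hashlib.sha1(entry).hexdigest()
    path.write_bytes(entry)
    return path
```

Naming files by content hash means a request that is logged twice ends up as a single corpus entry.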
