
Grading in Dodona #2401

Closed · niknetniko closed this issue 3 years ago

niknetniko commented 3 years ago

Thoughts/proposal for a first version:

Terminology

Flow

The intended flow of grading would be like this:

  1. The evaluator creates an evaluation for a series, as is done now.
  2. After selecting the relevant exercises and users, there is a third step in which the evaluator must specify the ScoreTemplates per exercise. This can be very simple (e.g. x/10 for each exercise), but a more detailed tree is also possible. Automating this is difficult, because an exercise has no notion of the result the judge will produce. We could, however, allow a grading suggestion in the exercise config (e.g. if you author an exercise with two functions, you could suggest splitting the grading into two grade items).
  3. When actually doing the evaluation, the evaluator can only mark a submission as evaluated once each leaf in the ScoreTemplate tree is filled in (see the sketch after this list). We'll need to think about where to put this: above or below the submission code, or in a new tab (next to the "code" tab).
  4. When the evaluation is released, the students will be able to see their grades. (Again, the total for a submission could be shown above the code, while the ScoreTemplates are displayed in a new tab.)
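A minimal sketch of how the ScoreTemplate tree and the "every leaf is filled in" check from step 3 could be modelled. All class and field names here are hypothetical illustrations, not Dodona's actual schema:

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ScoreTemplate:
    """One node in a per-exercise grading tree (hypothetical model)."""
    name: str                      # e.g. "Exercise 1" or "function parse()"
    maximum: float                 # maximum score for this node
    children: list[ScoreTemplate] = field(default_factory=list)
    score: Optional[float] = None  # filled in on leaves by the evaluator

    def is_leaf(self) -> bool:
        return not self.children

    def is_complete(self) -> bool:
        """Step 3: a submission may only be marked 'evaluated'
        once every leaf carries a score."""
        if self.is_leaf():
            return self.score is not None
        return all(child.is_complete() for child in self.children)

    def total(self) -> float:
        """Leaf scores roll up to the root automatically."""
        if self.is_leaf():
            return self.score or 0.0
        return sum(child.total() for child in self.children)
```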

Future

Things that are not for this issue, but that can be done later if the basic version is deemed OK:

LTI integration

In LTI, there are some different concepts:

Some problems need investigating:

Penalties

Another useful addition would be the concept of penalties: annotations with an assigned score that is deducted from the total score for an exercise.

The main difference with the scenario where the evaluator fills in the scores is that here, each student starts with the maximum available points, and the annotations are subtracted from that score (e.g. 10/10 for a function, minus a -1 penalty for using break/continue, gives a score of 9/10).

This will need some thought on how both modes can be integrated, and on how penalties behave when the evaluator wishes to override the final score. One possibility is allowing a custom start score (e.g. the student starts at 9/10, from which penalties are deducted). This might feel arbitrary to students, though (e.g. why do I have 7/10 when I only have two penalties of -1 each?). Another option is to disallow mixing both modes: either the student starts at 0 and the evaluator fills in the score, or the student starts at the max score and the evaluator must use penalties to deduct points. A sketch of that option follows.
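A small sketch of how the two grading modes could be kept strictly separate. GradingMode and final_score are made-up names for illustration, not part of Dodona:

```python
from enum import Enum
from typing import Sequence

class GradingMode(Enum):
    ADDITIVE = "additive"        # start at 0; the evaluator fills in scores
    SUBTRACTIVE = "subtractive"  # start at max; penalties are deducted

def final_score(mode: GradingMode, maximum: float,
                filled_in: float = 0.0,
                penalties: Sequence[float] = ()) -> float:
    """Hypothetical helper computing one exercise's score for one student."""
    if mode is GradingMode.ADDITIVE:
        if penalties:
            raise ValueError("penalties cannot be mixed with additive grading")
        return filled_in
    # Subtractive mode: 10/10 minus a -1 "break/continue" penalty gives 9/10.
    return max(0.0, maximum - sum(penalties))

print(final_score(GradingMode.SUBTRACTIVE, maximum=10, penalties=[1]))  # 9.0
```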

pdawyndt commented 3 years ago

First reflections after reading the proposal

niknetniko commented 3 years ago

> why not ScoreItem instead of GradeItem (or Grade instead of Score) to make it more consistent?

This was because with "ScoreItem" and "Score" it isn't obvious to me which is which, but I agree that "GradeItem" doesn't solve this. Perhaps a better name would be "ScoreBlueprint" or "ScoreDistribution" (from the Dutch "puntenverdeling", i.e. distribution of points).

> do we need a tree to start with? Just some ScoreItems and an (automatically computed) total score might be sufficient for a first iteration.

We don't really need the tree, but:

(Of course, this tree structure is internal to Dodona; we don't have to expose it. The UI can simply show some ScoreItems per exercise, with a computed total, as in the sketch below.)
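To make that concrete: the flat view is just a depth-1 instance of the hypothetical ScoreTemplate tree sketched earlier, with the total computed from the leaves. The item names below are invented for illustration:

```python
# A flat list of ScoreItems is a depth-1 tree (hypothetical names),
# reusing the ScoreTemplate class from the earlier sketch:
exercise = ScoreTemplate("Exercise 1", maximum=10, children=[
    ScoreTemplate("correctness", maximum=6),
    ScoreTemplate("code style", maximum=4),
])

exercise.children[0].score = 5.0
exercise.children[1].score = 3.5
print(exercise.total())        # 8.5, computed automatically
print(exercise.is_complete())  # True, since every leaf is filled in
```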

> I guess the scoring system will initially be fixed after creating an evaluation? Afterwards we might consider what the options are to make changes: what should be done with scores that are already assigned?

A "simple" solution would be to unmark all completed submissions as done, meaning they would need to be evaluated again, which is probably OK, since I assume if you change the scoring system, you'll need to update the scores

pdawyndt commented 3 years ago

The latter might result in a lot of work being lost. We might start with a fixed scoring scheme and later allow some changes. There are obvious changes that don't really impact the scores already given:

Rather than "throwing away" scores given before changing the score system, we might keep the scores given, but undo the "completed" status of their review so they need to be reviewed again (with the old scores still in place). Here's some actions that might be considered:

pdawyndt commented 3 years ago

@niknetniko these might be interesting to read through for inspiration (models and terminology):