add support for renaming exercise directories in a repository

pdawyndt commented 5 years ago

@bmesuere I guess this could be done by

adding identifier of exercise to config file upon intial push
bind exercise identifier to exercise repository, so that identifiers that were assigned outside the repository cannot be hijacked

Original issue by @pdawyndt on Thu Sep 15 2016 at 21:46. Closed by @charvp on Thu Aug 16 2018 at 13:30.

chvp commented 5 years ago

As discussed, this should be fixed by placing the exercise ID (or a unique token) in an instance-specific (hidden) config file (e.g. .dodona.json, .naos.json) in the exercise directory (the same directory as the exercise config.json. When the exercise is moved this extra file should be moved with it, which will allow us to detect the moved exercise. If people copy an existing exercise to create new ones, the same config will be detected. This is not a problem as long as the copied exercise wasn't moved in the same commit. If this happens there is no guarantee we will be able to detect which exercise was moved and which was created. If we use some kind of best guess and we get it wrong, people will still be able to fix this by switching the config files around.

There is only one degenerate case: people could copy an exercise directory (to edit and create a new exercise) and remove the exercise they copied from in the same commit. Dodona will be unable to detect that this was not a move and the submissions linked to the removed exercise will be linked to the new exercise.

Original comment by @charvp on Thu Aug 09 2018 at 17:26.

rien commented 5 years ago

To solve the degenerate case: we could keep track of all these hidden config files in a kind of 'index' which contains a list of all the known hidden config files together with their location and hash. This index should be stored in the database (a serialized JSON column in the Repository table would be enough, since this would only be used when a repository is updated).

Original comment by @rien on Thu Aug 09 2018 at 17:58.

chvp commented 5 years ago

How would this solve the degenerate case? We want to allow people to move exercises in their repository, so a move is indistinguishable from a copy+remove.

Original comment by @charvp on Fri Aug 10 2018 at 08:49.

beardhatcode commented 5 years ago

Move can be implemented as copy+remove, so these things are indistinguishable.

A possibility would be to let the user indicate they want to move the submissions. In the repository view you could add a message:

Detected a moved exercise (intervention needed) "Oldname" ⇒ "Newname" "OldPath" ⇒ "NewPath" I looks like you moved an exercise in the repository. You probably want to move the existing submissions to the new exercise. This will also update all cources that refer to the old exercise [transfer submissions] [No, this is a new ex]

Original comment by @beardhatcode on Fri Aug 10 2018 at 09:00.

rien commented 5 years ago

You're right, using an index would not be able to differentiate between a copy & move and a copy & remove, so manuel intervention is needed (like @beardhatcode suggests).

Original comment by @rien on Fri Aug 10 2018 at 10:32.

chvp commented 5 years ago

After a long discussion with @pdawyndt we devised the following protocol for moving, deleting and updating exercises:

If there are unknown IDs in the new version of the repository, we reject the push and send the user an email that they should either remove the ID (if we don't know the path) or restore the old ID (if we do know the path). Initially we check for unknown IDs within the scope of the repository (in the database), but this could be extended to the entire database scope later if we want to support moving exercises between repositories.
If there are duplicate IDs in the new version of the repository, we reject the push and send the user an email that they should remove the IDs (for paths we don't know) or restore the original IDs (for paths we do know).
If there is no ID for an exercise in a location we already know about, we reject the push and inform the user to restore the ID if they didn't intend to remove the ID, or remove the directory/config.json if they intended to delete the exercise. We also inform them that if they wanted to delete the exercise and immediately create a new exercise in the same location, they need to do this in 2 separate pushes.
If a config.json was deleted, the exercise is marked as deleted in the database. When this happens we also clear the path in database. Users can restore exercises by placing the exercise's ID back in a config.json (either in the same location or a different one).
Every other case is treated as a move if the ID is known and the path has changed, as an update if the ID is known and the path hasn't changed or a new exercise if a config.json without an ID is created.

All checks are done and bundled in the same email to the pusher in case any check fails.

We also propose to not use instance-specific config files (and thus placing the ID in the config.json). This would otherwise only lead to clutter in the repositories and create problems if, for example, the dodona database backup is restored onto naos. Instead, naos' capability to detect moved exercises will be crippled. We are of the opinion that this is not a problem since naos is meant as a staging environment, not a production environment. The ID used is not the actual exercise id, but a random string that will be added as a new column to the database. This should discourage users from trying to manually meddle in this protocol when it is not required.

For "rejecting the push" we propose the following system:

When a webhook is received by Dodona, a git fetch is executed in the repository.
The git diff --name-only master origin/master output is used to validate the exercises in the repository.
- If the validation succeeds, the database is updated with all the new information and a git reset --hard origin/master is executed in the repository.
- If the validation fails, an email with the validation errors is sent to the user and the git reset is not executed. The database is also not updated.

For a few cases we would also send warnings to the user, but not reject their push. For example, when a moved exercise is placed in the same location as an exercise that was deleted in the same update.

Original comment by @charvp on Fri Aug 10 2018 at 17:55.

pdawyndt commented 5 years ago

some additions I recall from the discussion:

tokens are immediately generated as unique across all exercises in the database (proposal: 64 characters)
only a slight change in the protocol is needed to allow migration of exercises between repositories (publishing those requires two pushed anyway, where the repo that originally held the exercise needs to be pushed first, follow by a repo that uptakes the exercise)
exercise tokens and "current commit" of the repository can be made visible to repository admins
before an exercise is marked as deleted we could copy the exercise directory to an internal repository of "dangling" exercises; this way, we still hold the latest version of the exercise before the delete as a reference for submissions that depend on the exercise (if any, otherwise we could delete the actual exercise from the database); if the deleted exercise is restored, we could also remove the directory from the internal repository

Original comment by @pdawyndt on Fri Aug 10 2018 at 18:24.

beardhatcode commented 5 years ago

How is a new UUID created for a new exercise?

Original comment by @beardhatcode on Mon Aug 13 2018 at 07:47.

bmesuere commented 5 years ago

This is getting way too complicated and introduces more difficulties and complexities then it solves. The git part of Dodona is already an issue for some users and this only adds to this. The initial proposal is perfectly acceptable with only a very minor problem (copy + remove) where potentially old submissions are added to the new exercise, which could be fixed manually.

If (and only if) this would be a common issue, we could also try to fuzzy match the exercise name instead of only the path. (A side effect of this is that we could create a new exercise where it was moved. But the easy solution for this is to allow exercise merging using the UI).

To improve on the transparency of what happens on each webhook trigger (and something to make debugging easier for us), is a small log of what actually happened when processing. (timestamp, exercises created/modified/deleted/invalidated)

Some actual questions are:

Should we use a different identifier, or is the exercise id good enough?

I think we can simply use the exercise id.
Do we need to support different id's for different instances?

Technically, I think a single one for naos and for dodona is enough. The only problem is that we can't guarantee uniqueness over the two instances. This might not be a problem if we don't allow naos to write to the repository (as is the case now)
Do we use a separate file of the existing one (also the case for the access rights)?

I was originally in favor for a separate hidden config file, but adding it to the current one in an internal hash might be simpler

Original comment by @bmesuere on Mon Aug 13 2018 at 08:23.

chvp commented 5 years ago

I would argue that this new proposal adds less complexity (for the user) to the git part than the initial proposal. Note that this proposal also doesn't solve the copy + remove problem. The new proposal handles other ambiguous cases, like the user deleting the token from an existing exercise, generating a new token themselves in a new (or existing exercise) or using the same token in multiple exercises. If the user performs an ambiguous action we can send an email where each case that we don't accept and the actions that they need to take, depending on what they wanted to do, can be explained very clearly, instead of trying to take a best guess at what they meant to do.

I agree that keeping a log of what happened on each trigger of the webhook is a good way to increase the transparency of what happened.

Original comment by @charvp on Mon Aug 13 2018 at 09:21.

dodona-edu / dodona

add support for renaming exercise directories in a repository #209