codalab / codabench

Codabench is a flexible, easy-to-use and reproducible benchmarking platform. Check our paper at Patterns Cell Press https://hubs.li/Q01fwRWB0

Edit Task and Task Validation #1078

Open ihsaan-ullah opened 1 year ago

ihsaan-ullah commented 1 year ago

A Task in Codabench is considered atomic. A discussion is needed for the following:

  1. What is task validation and how is it done?
  2. Should a task be updated? When and how?

Some discussion below:

Task Validation

A task is considered validated when one of the following happens:

  1. The task runs its own solution successfully (to be verified whether this works)
  2. The task runs a submission successfully (this currently does not work)

After either of the above happens, i.e. the task is validated, the task cannot be updated anymore (see the sketch after this paragraph). If a task needs to be updated even after validation, this should be discussed and planned properly; some suggestions follow in the next section:
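A minimal sketch of the validation rule, assuming a hypothetical `validated_at` field on the task and a hypothetical `on_run_finished` hook (neither is current codabench code):

```python
from django.utils import timezone

def on_run_finished(task, run):
    # Any successful run, whether it came from the task's own solution
    # or from a user submission, validates the task; the timestamp
    # doubles as the "validated" flag.
    if run.status == "FINISHED" and task.validated_at is None:
        task.validated_at = timezone.now()
        task.save(update_fields=["validated_at"])
```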

Task Update

A task can be updated only if it is not validated. How can a task be modified:

More:

Requests:

Several people requested the discussed features. They should be invited to the discussion:

Related issues and PRs:

Didayolo commented 1 year ago

Some remarks:

Task editing/updating

The goal of this feature was primarily convenience. Indeed, most of the organizers who tried Codabench reported this problem: if you just want to update one program or one dataset, you need to create a task from scratch, selecting all programs and data, and then assign it to all the benchmarks/phases that need it. This process takes time and can be the source of many mistakes.

So, I do think we need to be able to edit tasks. Instead of only allowing edits to non-validated tasks, we could say that editing a validated task un-validates it.
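A minimal sketch of that behaviour, assuming a plain boolean flag instead of the current computed `_validated` attribute:

```python
from django.db import models

class Task(models.Model):
    name = models.CharField(max_length=256)
    is_validated = models.BooleanField(default=False)

    def save(self, *args, **kwargs):
        # Editing an existing task clears the flag, except for the save
        # that records a successful validation itself.
        if self.pk is not None and kwargs.get("update_fields") != ["is_validated"]:
            self.is_validated = False
        super().save(*args, **kwargs)
```

Re-validating an edited task would then just require a new successful run, with no extra bookkeeping.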

Task validation

I think it is a cool idea to validate tasks, typically when a submission has been successful.

However, running the solution to validate the task may be problematic. Indeed, to run this validation, you'll need some parameters that are external to the task itself: the queue, the docker image, the time limit, etc. If you need to specify all this to run the validation, then it is complex, and does not guarantee that the task will work in any benchmark configuration anyway.

So, I think tasks should be validated by a submission on a specific benchmark (with its own setup of queue, docker, etc.). OR we could move these settings from the benchmark models to the task models. This could make sense too, and offer interesting possibilities such as different docker images for each task, etc.
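If the settings were moved onto the task, the model could gain fields along these lines (field names and defaults are assumptions, not the current codabench schema):

```python
from django.db import models

class Task(models.Model):
    name = models.CharField(max_length=256)
    # Execution settings that currently live on the benchmark side:
    queue = models.ForeignKey("queues.Queue", null=True, blank=True,
                              on_delete=models.SET_NULL)
    docker_image = models.CharField(max_length=256)
    execution_time_limit = models.PositiveIntegerField(default=600)  # seconds
```

Validation could then be run from the task alone, and each task could carry its own docker image as suggested above.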

ihsaan-ullah commented 1 year ago

We need to check the purpose of solutions. It looks like the only purpose of a solution is to validate a task. If that is the case, and it is decided that task validation should be done with submissions, then keeping solutions would need another purpose.

Solutions, if considered mini starting kits, are still somewhat redundant because users can now add starting kits.

Didayolo commented 1 year ago

Notion of locking / unlocking task?

The idea: if a task is public, or used by benchmarks (or by submissions?), then it would not be editable.
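A sketch of the lock rule as a model property (the `phases` reverse relation and the exact conditions are assumptions for illustration):

```python
from django.db import models

class Task(models.Model):
    is_public = models.BooleanField(default=False)

    @property
    def is_locked(self):
        # Editable only while private and not used by any benchmark phase.
        return self.is_public or self.phases.exists()
```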

Didayolo commented 1 year ago

From Ihsan:

If you remove a task from a competition and it is not used in another competition, you can delete the datasets used in this task. Is this supposed to happen?

After deleting all the datasets/programs, you have an empty task.
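If this is not supposed to happen, one declarative guard would be a protective foreign key, so Django itself refuses to delete a dataset that a task still references (field names are hypothetical):

```python
from django.db import models

class Task(models.Model):
    # PROTECT makes dataset.delete() raise ProtectedError while any task
    # still points at the dataset, instead of leaving an empty task behind.
    input_data = models.ForeignKey("datasets.Data", on_delete=models.PROTECT)
```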

Didayolo commented 1 year ago

In my opinion, we need to keep the edit task feature as it is really convenient in many scenarios. If there is a drastic change in the programs, it is the responsibility of the organizers to re-run the submissions.

A compromise is that we could add a warning to submissions after an edit of the task, shown when the cursor hovers over them, saying something like "Warning: this submission was computed on an older setting":

(Screenshot 2023-10-10 at 17:21:59: warning tooltip on a submission)
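A minimal sketch of the staleness check behind such a warning, assuming the task keeps a `last_edited` timestamp and the submission a `created` one (both names are assumptions):

```python
def submission_warning(submission, task):
    # Flag submissions that ran before the task's most recent edit.
    if task.last_edited and task.last_edited > submission.created:
        return "Warning: this submission was computed on an older setting"
    return None
```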
ihsaan-ullah commented 1 month ago

The way task validation status is returned is weird.

The task serializer always returns False because the task has no field named validated.

https://github.com/codalab/codabench/blob/de55ef3fc3642194d194fb79ab61aad6a3e25d6d/src/apps/api/serializers/tasks.py#L126

The Task model has a field named _validated which is computed in a weird way, and from the comment in the code it looks like it is not complete. Task model: https://github.com/codalab/codabench/blob/de55ef3fc3642194d194fb79ab61aad6a3e25d6d/src/apps/tasks/models.py#L30
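One way to surface the real value without renaming the model attribute would be to point the serializer field at `_validated` explicitly (a sketch, keeping `_validated` as the source of truth):

```python
from rest_framework import serializers
from apps.tasks.models import Task

class TaskSerializer(serializers.ModelSerializer):
    # Map the public name onto the underscored model attribute so the
    # serializer stops returning a hard-coded False.
    validated = serializers.BooleanField(source="_validated", read_only=True)

    class Meta:
        model = Task
        fields = ("id", "name", "validated")
```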

For now, the task validation tests always pass because both the tests and this validation are more or less hard-coded. Both should be fixed.