codalab / codabench

Codabench is a flexible, easy-to-use and reproducible benchmarking platform. Check out our paper in Patterns (Cell Press): https://hubs.li/Q01fwRWB0

Edit Task and Task Validation #1078

Open ihsaan-ullah opened 1 year ago

ihsaan-ullah commented 1 year ago

A Task in Codabench is considered atomic. A discussion is needed for the following:

  1. What is task validation, and how is it done?
  2. Should a task be updatable? When and how?

Some discussion below:

Task Validation

A task is considered validated when one of the following happens:

  1. The task runs its own solution successfully (to be verified whether this works)
  2. The task runs a submission successfully (this currently does not work)

After either of the above happens, i.e. the task is validated, the task cannot be updated anymore. If a validated task still needs to be updated, this should be discussed and planned properly. Some suggestions below:

Task Update

A task can be updated only if it is not validated. How a task can be modified:

More:

Requests:

Several people have requested the features discussed here. They should be invited to the discussion:

Related issues and PRs:

Didayolo commented 1 year ago

Some remarks:

Task editing / updating

The goal of this feature was primarily convenience. Indeed, most of the organizers who tried Codabench reported this problem: if you just want to update one program or one dataset, you need to create a new task from scratch, selecting all programs and data again, and then assign it to all the benchmarks / phases that need it. This process takes time and can be the source of many mistakes.

So, I do think we need to be able to edit tasks. Instead of only being able to edit non-validated tasks, we could say that editing a validated task un-validates it.
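A minimal sketch of what that could look like in the Django model (the field name `is_validated` and the change check are hypothetical, not the actual Codabench schema):

```python
from django.db import models


class Task(models.Model):
    """Simplified hypothetical Task: editing a validated task resets the flag."""
    name = models.CharField(max_length=256)
    is_validated = models.BooleanField(default=False)  # assumed flag, not the real column

    def save(self, *args, **kwargs):
        if self.pk:
            old = Task.objects.get(pk=self.pk)
            # Any edit to a validated task un-validates it instead of being blocked.
            if old.is_validated and old.name != self.name:
                self.is_validated = False
        super().save(*args, **kwargs)
```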

Task validation

I think it is a cool idea to validate tasks, typically when a submission is successful.

However, running the solution to validate the task may be problematic. Indeed, to run this validation, you need some parameters that are external to the task itself: the queue, the Docker image, the time limit, etc. If you need to specify all of this to run the validation, then it is complex, and it does not guarantee that the task will work in every benchmark configuration anyway.

So, I think tasks should be validated by a submission on a specific benchmark (with its own setup of queue, Docker image, etc.). OR we could move these settings from the benchmark models to the task models. This could make sense too, and would offer interesting possibilities such as a different Docker image for each task, etc.
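For the second option, a rough sketch of what moving these settings onto the task could look like (all field names and the queue model path are illustrative assumptions):

```python
from django.db import models


class Task(models.Model):
    name = models.CharField(max_length=256)
    # Execution settings that currently live on the benchmark/phase side;
    # moving them here lets each task ship its own environment.
    docker_image = models.CharField(max_length=256, blank=True)
    execution_time_limit = models.PositiveIntegerField(default=600)  # seconds
    queue = models.ForeignKey(
        'queues.Queue',  # assumed app/model path
        null=True,
        blank=True,
        on_delete=models.SET_NULL,
    )
```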

ihsaan-ullah commented 1 year ago

We need to check the purpose of solutions. It looks like the only purpose of a solution is to validate a task. If that is the case, and it is decided that task validation should be done with submissions, then keeping solutions would require another purpose.

Solutions, if considered mini starting kits, are still kind of useless, because users can now add starting kits.

Didayolo commented 1 year ago

Notion of locking / unlocking tasks?

The idea: if a task is public, or used by benchmarks (or by submissions?), then it would not be editable.
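A sketch of such a check (the property name and the reverse relation names are assumptions):

```python
from django.db import models


class Task(models.Model):
    is_public = models.BooleanField(default=False)

    @property
    def is_locked(self):
        """Locked (not editable) once public or referenced anywhere.

        Assumes reverse relations `phases` (Phase.tasks M2M) and
        `submissions` (Submission.task FK) exist with these names.
        """
        return self.is_public or self.phases.exists() or self.submissions.exists()
```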

Didayolo commented 1 year ago

From Ihsan:

If you remove a task from a competition and it is not used in another competition, you can delete the datasets used in this task. Is this supposed to happen?

After deleting all the datasets/programs, you have an empty task:

(Screenshot: an empty task after all its datasets/programs were deleted)
Didayolo commented 1 year ago

In my opinion, we need to keep the edit task feature as it is really convenient in many scenarios. If there is a drastic change in the programs, it is the responsibility of the organizers to re-run the submissions.

A compromise is that we could add a warning to submissions after an edit of the task, saying something like Warning: this submission was computed on an older setting when the cursor hovers over it:

(Screenshot, 2023-10-10: mock-up of the warning shown on hover)
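One way this could be surfaced to the front end, sketched with assumed field names (`modified_when` on the task, `created_when` on the submission, and the import path), is a computed serializer field the UI can use for the tooltip:

```python
from rest_framework import serializers

from apps.submissions.models import Submission  # assumed import path


class SubmissionSerializer(serializers.ModelSerializer):
    on_older_task_version = serializers.SerializerMethodField()

    class Meta:
        model = Submission
        fields = ['id', 'on_older_task_version']

    def get_on_older_task_version(self, submission):
        # True when the task was edited after this submission ran;
        # the UI would show the hover warning in that case.
        return submission.task.modified_when > submission.created_when
```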
ihsaan-ullah commented 1 month ago

The way task validation status is returned is weird.

The task serializer always returns False because the task has no field named validated.

https://github.com/codalab/codabench/blob/de55ef3fc3642194d194fb79ab61aad6a3e25d6d/src/apps/api/serializers/tasks.py#L126

The Task model has a field named _validated which is computed in a weird way, and from the comment in the code it looks like it is not complete. Task model: https://github.com/codalab/codabench/blob/de55ef3fc3642194d194fb79ab61aad6a3e25d6d/src/apps/tasks/models.py#L30
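One possible fix, sketched here under the assumption that `_validated` (whether a real column or a queryset annotation) is the value we actually want to expose:

```python
from rest_framework import serializers

from apps.tasks.models import Task  # assumed import path


class TaskSerializer(serializers.ModelSerializer):
    validated = serializers.SerializerMethodField()

    class Meta:
        model = Task
        fields = ['id', 'name', 'validated']

    def get_validated(self, task):
        # Read the model's _validated explicitly; default to False when the
        # attribute is missing (e.g. the queryset was not annotated).
        return bool(getattr(task, '_validated', False))
```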

For now, the task validation tests always pass, because both the tests and this validation are somewhat hard-coded. Both should be fixed.
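And a sketch of a test that exercises the real computation rather than a hard-coded value (pytest-django style; the `task_factory` fixture and the serializer import path are assumptions):

```python
import pytest

from apps.api.serializers.tasks import TaskSerializer  # assumed import path


@pytest.mark.django_db
def test_new_task_is_not_validated(task_factory):
    # A freshly created task with no successful run should report False,
    # and the value should come from the model, not a hard-coded default.
    task = task_factory()
    data = TaskSerializer(task).data
    assert data['validated'] is False
```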