codalab / codabench

Codabench is a flexible, easy-to-use, and reproducible benchmarking platform. Check out our paper in Patterns (Cell Press): https://hubs.li/Q01fwRWB0

[Feature] Meta scoring program: organizer provides the code for computation between leaderboard columns #406

Open Didayolo opened 4 years ago

Didayolo commented 4 years ago

Competition organizers can define their own scoring and ingestion programs in a flexible way with Python code.

But in CodaLab v1.5, the computation of the final ranking when there are several sub-scores is constrained to a few predefined options (average rank, maximum, ...).

I think it is crucial to be able to use any function to compute this final ranking, so organizers should be able to provide a kind of "meta scoring program".

The same idea was mentioned here:

A much more flexible way to handle computed columns. This would be a fine solution for a much more advanced "computed column" type system. Organizers could do any kind of computation they'd like! This should be executed even if children have failed.

Also, a similar feature we'd like:

Another related issue:

Currently, there is one column (the primary column) that is used to decide the ranking on the leaderboard. We should add ways to compute the leaderboard ranking across all columns, and have a way to exclude certain columns from the ranking calculation. Make computed columns:

  • Could automatically be generated from columns with the same key (see the sketch after this list)
    • Pro: simple and programmatic
    • Con: less control
  • Could require each column to have a unique key and then group the columns later to be calculated, OR could instead select columns by key + task
    • Pro: more control in selection
    • Con: takes more work from admins
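For illustration, here is a minimal sketch of the first option, grouping columns that share a key (entirely hypothetical; the dict shapes are assumptions made for illustration, not Codabench's actual data model):

    from collections import defaultdict

    # Hypothetical leaderboard columns: one entry per (key, task) pair.
    columns = [
        {"key": "accuracy", "task": "task_1", "value": 0.91},
        {"key": "accuracy", "task": "task_2", "value": 0.84},
        {"key": "f1", "task": "task_1", "value": 0.77},
    ]

    # Group the columns sharing the same key, then average each group
    # to produce the computed column.
    groups = defaultdict(list)
    for col in columns:
        groups[col["key"]].append(col["value"])

    computed = {key: sum(vals) / len(vals) for key, vals in groups.items()}
    print(computed)  # {'accuracy': 0.875, 'f1': 0.77}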

And the following post-it / discussion issue:

ckcollab commented 4 years ago

Something similar to this may have been implemented to support AutoDL (multiple tasks) in v2? There is a way to do weighted-sum scoring across children tasks without the need for a master scoring program; the master phase defines this behavior. Zhen and Jimmy implemented this for AutoDL in Jan/Feb, I believe.

ihsaan-ullah commented 1 year ago

Something similar to this may have been implemented to support AutoDL (multiple tasks) in v2? There is a way to do weighted-sum scoring across children tasks without the need for a master scoring program; the master phase defines this behavior. Zhen and Jimmy implemented this for AutoDL in Jan/Feb, I believe.

@Didayolo do you know something about this?

Didayolo commented 1 year ago

The AutoDL version of CodaLab is version 1.8 (I think the GitHub branch is named google-cloud or something like that). It corresponds to this instance of CodaLab: https://autodl.lri.fr/

In this version, we have a master scoring program, and the phases are used like tasks in Codabench. Indeed, we had average rank there.

But I don't think it is useful to look there: average rank is already implemented in vanilla CodaLab (v1.6). It was invoked using the Avg operation (unclear naming, since Avg suggests a simple score average), as in this example of a competition.yaml file for a CodaLab competition with average rank:

leaderboard:
  columns:
    auc_binary:
      label: Classification (AUC ROC)
      leaderboard: &id001
        label: RESULTS
        rank: 1
      rank: 2
    auc_binary_2:
      label: Feature Selection (AUC ROC)
      leaderboard: *id001
      rank: 3
    ave_score:
      label: < Rank >
      leaderboard: *id001
      rank: 1
      sort: asc
      computed:
        operation: Avg
        fields: auc_binary, auc_binary_2
  leaderboards:
    RESULTS: *id001

The computation is there in the code:

https://github.com/codalab/codalab-competitions/blob/68036d9b06272ce23572d121c85b3c00ee49a21a/codalab/apps/web/models.py#L1251

In Codabench, here is the yaml syntax:

[...]
  columns:
      - title: Average Accuracy
        key: avg_accuracy
        index: 0
        sorting: desc
        computation: avg
        computation_indexes:
          - 1
          - 2
          - 3
          - 4
[...]

But this time, Avg really is the average; average rank is not implemented. We can also edit this using the editor.

Last point: implementations of ranking functions can be found in the Ranky library:

rk.score is the simple average, which calls np.mean. rk.borda is the average rank. The naming follows the names of these functions in the field of Social Choice Theory.
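To see why the distinction matters, here is a minimal sketch using NumPy/SciPy directly rather than Ranky itself (made-up data): with the plain average the raw score magnitudes dominate, while with the average rank only the per-column orderings matter, so the two methods can crown different winners.

    import numpy as np
    from scipy.stats import rankdata

    # One row per submission, one column per leaderboard sub-score.
    scores = np.array([
        [0.99, 0.10],  # A: outstanding on score 1, poor on score 2
        [0.50, 0.51],  # B: decent on both
        [0.49, 0.50],  # C: just behind B on both
    ])

    # rk.score-style aggregation: plain mean of the raw scores.
    avg_score = scores.mean(axis=1)    # A=0.545, B=0.505, C=0.495 -> A wins

    # rk.borda-style aggregation: rank each column (1 = best), then
    # average the per-submission ranks; lower is better.
    ranks = rankdata(-scores, axis=0)  # negate so the best score gets rank 1
    avg_rank = ranks.mean(axis=1)      # A=2.0, B=1.5, C=2.5 -> B wins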

Didayolo commented 1 year ago

Also, just for the record, here are diagrams illustrating the difference between "average" and "average rank", and why it changes how the computed column should be calculated in Codabench:

https://docs.google.com/presentation/d/1_qg2AcBUeBtIlYGq754wMHMkAn3mzIT_JjlLAepjWZI/edit?usp=sharing

ihsaan-ullah commented 1 year ago

Plan for this feature @Didayolo

  1. Add an option in the YAML file to specify meta-scoring columns. Example with explicitly listed columns:

     leaderboards:
       - index: 0
         meta-scoring:
           computation: average rank
           columns:
             - score_1
             - score_2

Example selecting columns by a string pattern (useful when new columns may be added later):

leaderboards:
  - index: 0
    meta-scoring:
      computation: average rank
      column_ends_with: _score

We can have two keys: column_ends_with and column_starts_with.

  2. Store the meta-scoring fields.
  3. Apply meta-scoring when fetching a leaderboard.

Later on, we can add more meta-scoring programs.
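A rough sketch of what steps 2 and 3 could look like (function names, data shapes, and the suffix matching are hypothetical, not Codabench internals):

    import numpy as np
    from scipy.stats import rankdata

    def select_columns(column_keys, ends_with=None, starts_with=None):
        # Resolve column_ends_with / column_starts_with into concrete keys.
        keys = list(column_keys)
        if ends_with is not None:
            keys = [k for k in keys if k.endswith(ends_with)]
        if starts_with is not None:
            keys = [k for k in keys if k.startswith(starts_with)]
        return keys

    def apply_meta_scoring(rows, keys, computation="average rank"):
        # rows: one {column_key: score} dict per submission.
        scores = np.array([[row[k] for k in keys] for row in rows])
        if computation == "average rank":
            # Rank each column (1 = best), then average ranks per row.
            return rankdata(-scores, axis=0).mean(axis=1)
        return scores.mean(axis=1)  # plain average fallback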

Didayolo commented 1 year ago

I think we should keep the interface we already have, which is mentioned above:

  columns:
      - title: Average Accuracy
        key: avg_accuracy
        index: 0
        sorting: desc
        computation: avg
        computation_indexes:
          - 1
          - 2
          - 3
          - 4

and the editor:

[Screenshot: Codabench leaderboard column editor]

Here you specify precisely which columns are used in the computation. Instead of only Avg, we'd just like to have more options.

Maybe having a full meta-scoring program, as originally proposed in this issue, is too much, and we just want to allow the ranking functions of the Ranky package.
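Concretely, the existing computation key could simply dispatch to Ranky functions; a sketch (the avg_rank key name and Ranky's rows-as-candidates orientation are my assumptions):

    import ranky as rk

    # Hypothetical mapping from the YAML `computation` value to a
    # Ranky function; "avg_rank" is an assumed new key, not current
    # Codabench syntax.
    COMPUTATIONS = {
        "avg": rk.score,       # simple average of the raw scores
        "avg_rank": rk.borda,  # average rank (Borda count)
    }

    def compute_column(matrix, computation):
        # matrix: submissions as rows, sub-score columns as judges.
        return COMPUTATIONS[computation](matrix)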

Didayolo commented 1 year ago

There is also the question of the v1 unpacker (#368).

juliojj commented 8 months ago

It is a pity that average ranking is not available in Codabench; it is a very useful feature in CodaLab. It would be great if you could consider including it in the future. Thanks.