[Open] Didayolo opened this issue 4 years ago
Something similar to this may have been implemented to support AutoDL (multiple tasks) in v2? There is a way to do weighted-sum scoring across child tasks without the need for a master scoring program; the master phase defines this behavior. Zhen and Jimmy implemented this for AutoDL in Jan/Feb, I believe.
@Didayolo do you know something about this?
The AutoDL version of CodaLab is version 1.8 (I think the GitHub branch is named google-cloud or something like that). It corresponds to this instance of CodaLab: https://autodl.lri.fr/
In this version, we have a master scoring program, and the phases are used like tasks in Codabench. Indeed, we had average rank there.
But I don't think it is useful to look there: average rank is already implemented in vanilla CodaLab (v1.6). It was invoked using the `Avg` operation (an unclear name, since `Avg` should denote the simple score average), as in this example of a `competition.yaml` file for a CodaLab competition with average rank:
```yaml
leaderboard:
  columns:
    auc_binary:
      label: Classification (AUC ROC)
      leaderboard: &id001
        label: RESULTS
        rank: 1
      rank: 2
    auc_binary_2:
      label: Feature Selection (AUC ROC)
      leaderboard: *id001
      rank: 3
    ave_score:
      label: < Rank >
      leaderboard: *id001
      rank: 1
      sort: asc
      computed:
        operation: Avg
        fields: auc_binary, auc_binary_2
  leaderboards:
    RESULTS: *id001
```
The computation itself can be found in the CodaLab source code.
In Codabench, here is the YAML syntax:

```yaml
[...]
columns:
  - title: Average Accuracy
    key: avg_accuracy
    index: 0
    sorting: desc
    computation: avg
    computation_indexes:
      - 1
      - 2
      - 3
      - 4
[...]
```
But this time, `avg` really means average; average rank is not implemented. We can also edit this using the editor.
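As a minimal sketch of how I read these semantics (my reading, not documented Codabench behavior; `computed_avg` and the toy values are hypothetical):

```python
# Sketch of the (assumed) semantics of `computation: avg`: the computed
# column averages the submission's values in the columns whose index
# appears in computation_indexes.
def computed_avg(row_scores_by_index, computation_indexes):
    values = [row_scores_by_index[i] for i in computation_indexes]
    return sum(values) / len(values)

# Toy values for one submission, keyed by column index.
row = {1: 0.8, 2: 0.6, 3: 0.9, 4: 0.7}
print(computed_avg(row, [1, 2, 3, 4]))  # -> 0.75
```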
Last point: implementations of ranking functions can be found in the Ranky library. `rk.score` is the simple average (it calls `np.mean`), and `rk.borda` is the average rank. The naming follows the names of these functions in the field of Social Choice Theory.
Also, just for the record, here are diagrams illustrating the difference between "average" and "average rank", and why it changes how the computed column should be calculated in Codabench:
https://docs.google.com/presentation/d/1_qg2AcBUeBtIlYGq754wMHMkAn3mzIT_JjlLAepjWZI/edit?usp=sharing
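To make the difference concrete, here is a minimal numpy/scipy sketch (toy numbers, not using Ranky itself) showing that the two aggregations can produce different rankings:

```python
import numpy as np
from scipy.stats import rankdata

# Rows = submissions (A, B, C), columns = sub-scores; higher is better.
scores = np.array([
    [0.90, 0.10],  # A
    [0.80, 0.60],  # B
    [0.70, 0.50],  # C
])

# "Average" (what rk.score does, via np.mean): mean of the raw sub-scores.
avg = scores.mean(axis=1)          # [0.5, 0.7, 0.6] -> order B, C, A

# "Average rank" (what rk.borda computes): rank each column independently,
# then average the ranks; a lower average rank is better.
ranks = np.apply_along_axis(lambda col: rankdata(-col), 0, scores)
avg_rank = ranks.mean(axis=1)      # [2.0, 1.5, 2.5] -> order B, A, C
```

Note that A and C swap places between the two methods: A's very high first sub-score dominates the plain average, but average rank only cares about relative order within each column.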
Here is a proposed syntax:

```yaml
leaderboards:
  - index: 0
    meta-scoring:
      computation: average rank
      columns:
        - score_1
        - score_2
```
Example where the columns are selected by a string pattern (useful when new columns could be added later):
```yaml
leaderboards:
  - index: 0
    meta-scoring:
      computation: average rank
      column_ends_with: _score
```
We could have two keys, `column_ends_with` and `column_starts_with` (a selection sketch follows below). Later on, more meta-scoring programs could be added.
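As a rough illustration of how such pattern-based selection could behave (the `select_columns` helper is hypothetical, not existing Codabench code):

```python
# Hypothetical helper illustrating the proposed column_starts_with /
# column_ends_with keys; a sketch only, not actual Codabench code.
def select_columns(column_keys, starts_with=None, ends_with=None):
    selected = column_keys
    if starts_with is not None:
        selected = [k for k in selected if k.startswith(starts_with)]
    if ends_with is not None:
        selected = [k for k in selected if k.endswith(ends_with)]
    return selected

columns = ["task1_score", "task2_score", "duration"]
print(select_columns(columns, ends_with="_score"))
# -> ['task1_score', 'task2_score']
```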
I think we should keep the interface we already have, which is mentioned above:
```yaml
columns:
  - title: Average Accuracy
    key: avg_accuracy
    index: 0
    sorting: desc
    computation: avg
    computation_indexes:
      - 1
      - 2
      - 3
      - 4
```
and the editor, where you specify precisely which columns are used in the computation. But instead of only `avg`, we would like to have more possibilities.
Maybe having a meta-scoring program, as originally proposed in this issue, is too much, and we just want to allow the ranking functions of the Ranky package.
There is also the question of the v1 unpacker (#368).
It is a pity that average ranking is not available in Codabench; it is a very useful feature in CodaLab. It would be great if you considered including it in the future. Thanks!
Competition organizers can define their own scoring and ingestion programs in a flexible way with Python code. But in CodaLab v1.5, the computation of the final ranking when there are several sub-scores is constrained to a few predefined possibilities (average rank, maximum, ...).
I think it is crucial to be able to use any function to compute this final ranking, so organizers should be able to provide a kind of "meta scoring program".
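As a purely hypothetical sketch (no such interface exists in CodaLab or Codabench; the function name and signature are illustrative only), a meta scoring program could boil down to an organizer-defined function that maps the matrix of sub-scores to the final leaderboard values, for instance using average rank:

```python
import numpy as np
from scipy.stats import rankdata

def meta_score(sub_scores: np.ndarray) -> np.ndarray:
    """Organizer-defined aggregation; here, average rank (lower is better).

    sub_scores has one row per submission and one column per sub-score.
    This is a sketch of the idea, not an actual platform API.
    """
    ranks = np.apply_along_axis(lambda col: rankdata(-col), 0, sub_scores)
    return ranks.mean(axis=1)
```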
The same idea was mentioned here: #445

Also, a similar feature we'd like: #444

Another related issue: #447

And the following post-it / discussion issue: #443