AIris - CjangCjengh - Question about the combination of scores

CjangCjengh commented 8 months ago

Hello, I noticed a graph on the competition's official website regarding the combination of scores. It states that AUROC accounts for 70% of the final score. If it does not violate any confidentiality principles, could you please explain the specific calculation method? For example, if a team has an AUROC of 0.6, does that mean its contribution to the final score is 0.6 * 0.7 = 0.42? Or are there more complex calculation mechanisms involving weighted algorithms and ranking scores?

AISG-Wesley commented 8 months ago

The specific breakdown is as indicated in the table. The top 5 finalists will be selected based on their AUROC scores on the leaderboard. This AUROC score will only constitute 70% of the final score used to determine the winning teams. The remaining 30% will be awarded by the Technical Review Committee, following their assessment of the finalists’ submissions and presentations. Please refer to section 9 of the Challenge Terms and Conditions for more information.
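
For illustration only, one possible reading of the 70/30 split is a plain weighted sum. The thread does not confirm this formula. The sketch below assumes the TRC assessment is expressed on a 0 to 1 scale, and the name `combined_score` and its parameters are hypothetical. Under that assumption, the AUROC of 0.6 from the question would contribute 0.6 * 0.7 = 0.42.

```python
def combined_score(auroc: float, trc: float, w_auroc: float = 0.7) -> float:
    """Hypothetical linear combination of the two components.

    Assumption (not confirmed by the organisers): `trc` is already
    scaled to [0, 1], so both components share the same range.
    """
    if not (0.0 <= auroc <= 1.0 and 0.0 <= trc <= 1.0):
        raise ValueError("auroc and trc must both lie in [0, 1]")
    # 70% of the weight goes to AUROC, the remaining 30% to the TRC score.
    return w_auroc * auroc + (1.0 - w_auroc) * trc


# AUROC-only contribution for the example in the question (TRC set to 0).
auroc_part = combined_score(auroc=0.6, trc=0.0)
print(round(auroc_part, 2))
```

If the TRC score were on a different scale (for example 0 to 5), it would need to be normalised first for the 70/30 weights to mean what they say.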

petergro-hub commented 8 months ago

Oh, so it is the AUROC score and not the ranking? The way I had understood it, it would be ranking * 0.7, where the ranking is like a grade between 1 and 5, and the other contributions would also be graded on a similar scale.

mio7690 commented 8 months ago

I share the same concerns. If the AUROC score is used directly as part of the scoring, it seems that almost anyone who submits could achieve 50% of the points. Meanwhile, the gap between first place and the others might be minimal, so that the first-place team, despite potentially investing significantly more effort, gains only a very small score advantage over the others.
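
The difference between the two readings discussed above can be sketched with made-up numbers. The AUROC values below are invented for illustration and are not leaderboard results; `raw_contribution` and `rank_contribution` are hypothetical helpers, and the linear rank-to-points mapping is only one of many possible schemes. Ties are not handled.

```python
def raw_contribution(aurocs, weight=0.7):
    """AUROC used directly: each team gets weight * AUROC."""
    return [weight * a for a in aurocs]


def rank_contribution(aurocs, weight=0.7):
    """AUROC used only for ranking: best team gets `weight`,
    the worst gets weight / n, linearly spaced in between."""
    n = len(aurocs)
    # Indices of teams sorted from highest to lowest AUROC.
    order = sorted(range(n), key=lambda i: aurocs[i], reverse=True)
    points = [0.0] * n
    for position, team in enumerate(order):
        points[team] = weight * (n - position) / n
    return points


# Invented AUROC values for five finalists (not real results).
finalists = [0.92, 0.91, 0.90, 0.89, 0.88]

raw = raw_contribution(finalists)
ranked = rank_contribution(finalists)

# Spread between the best and worst finalist under each reading.
print("raw gap:   ", round(max(raw) - min(raw), 3))
print("ranked gap:", round(max(ranked) - min(ranked), 3))
```

With closely spaced AUROC values, the raw reading leaves only a small spread for the leaderboard component, so the 30% TRC component could decide the final order, whereas a rank-based reading would spread the finalists much further apart.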

AISG-Wesley commented 8 months ago

Thank you for your feedback. We will not be changing our current stance on the evaluation metrics for several reasons:

Incentivizing the development of systems that are robust, translatable, and innovative is a key component in this open-ended, zero-shot competition. As such, the methodologies of the technical solutions are as important in the selection of the winning team as achieving a high score on the leaderboard.
This challenge seeks to reward innovative and effective solutions over brute-force methods. The TRC component was given a higher weightage to balance out potential resource asymmetry between teams (i.e., teams with access to vast amounts of data may be able to achieve high AUROC scores with a system that is not optimised for the problem).
The five finalist teams will have ample opportunity to showcase their solutions through a technical report and in person at the upcoming ACM Web Conference. We are particularly interested in rewarding teams that can clearly demonstrate why their solution is exceptional and deserving of the top spot.

petergro-hub commented 8 months ago

I think that is definitely up to AISG to decide; my point was more about the question of weighting. What scales will the other 30% use? If AUROC runs from 0 to no. 1 on the AUROC leaderboard, will the evaluation of the report be on 0 to 1, 0 to 5, or 0 to 100?

AISG-Wesley commented 8 months ago

Hi petergro-hub, we will not be releasing detailed rubrics for each of the TRC components. If you are selected as a finalist, your technical report and presentation will be evaluated for Novelty & Innovativeness, Code Quality & Readability, and Reproducibility, and this will contribute 30% of the final score.

AISG-Technology-Team / AISG-Online-Safety-Challenge-Submission-Guide

AIris - CjangCjengh - Question about the combination of scores #40