Identify structure of final leaderboard

Most importantly, we need to decide how to report the significance of differences in AUROC and AUPR between teams.

In past challenges, the performance of a given team was usually compared to that of the top performer. The information provided is thus whether a given team is significantly outperformed by the top performer. All teams are then ranked by their primary metric, with no indication of whether, for example, the teams ranked 10th and 11th are significantly different.
In the Leaderboard Phase of the EHR DREAM Challenge, I came up with an alternative solution that compares each pair of consecutive teams. The result is an iterative algorithm that first ranks teams by the value of the primary metric and the significance of the difference between each pair and, in case of a tie on the primary metric, applies the same significance test to the values of the secondary metric. This leads to a more meaningful ranking: for example, the team ranked 10th may have a lower AUROC than the team ranked 11th, with the difference computed as non-significant, while having a significantly larger AUPR. The issue with this approach is that the way we report the Bayes factor is confusing.
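
To make the discussion concrete, here is a minimal sketch of the consecutive-pair ranking described above. It is not the code used in the Leaderboard Phase. It assumes each team has bootstrap scores computed on the same resamples, that the Bayes factor is estimated as the ratio of resamples won to resamples not won, and that a factor of at least 3 counts as significant. The names `bayes_factor`, `rank_teams`, and `threshold` are hypothetical.

```python
import numpy as np


def bayes_factor(a, b):
    """Assumed estimate: resamples where a beats b, divided by the rest."""
    a, b = np.asarray(a), np.asarray(b)
    wins = int(np.sum(a > b))
    losses = len(a) - wins
    return float("inf") if losses == 0 else wins / losses


def rank_teams(primary, secondary, threshold=3.0):
    """Rank teams from dicts mapping team -> bootstrap scores.

    primary holds e.g. AUROC and secondary e.g. AUPR, one value per resample.
    The threshold of 3 is an assumption, not a value taken from the challenge.
    """
    # Initial order: best mean primary metric first.
    order = sorted(primary, key=lambda t: np.mean(primary[t]), reverse=True)
    for _ in range(len(order)):  # bounded number of passes
        swapped = False
        for i in range(len(order) - 1):
            upper, lower = order[i], order[i + 1]
            # Tie on the primary metric: the upper team is not significantly better.
            tied = bayes_factor(primary[upper], primary[lower]) < threshold
            # Break the tie only if the lower team is significantly
            # better on the secondary metric.
            if tied and bayes_factor(secondary[lower], secondary[upper]) >= threshold:
                order[i], order[i + 1] = lower, upper
                swapped = True
        if not swapped:
            break
    return order
```

The number of passes is capped at the number of teams because pairwise significance is not guaranteed to be transitive, so an uncapped loop could in principle cycle.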

I discussed these approaches with Mike M. and Elias. Both have used Method 1 (comparison against the top performer) in the past and are thus in favor of using it. I also discussed with Elias the option of releasing the full matrix of significance between each pair of teams.
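
As an illustration of what releasing the full matrix could look like, the sketch below builds a boolean matrix of pairwise significance for a single metric. It relies on the same assumptions as any bootstrap-based comparison here: scores computed on shared resamples, a Bayes factor estimated as wins over non-wins, and a hypothetical significance threshold of 3. The function names are illustrative only.

```python
import numpy as np


def bayes_factor(a, b):
    """Assumed estimate: resamples where a beats b, divided by the rest."""
    a, b = np.asarray(a), np.asarray(b)
    wins = int(np.sum(a > b))
    losses = len(a) - wins
    return float("inf") if losses == 0 else wins / losses


def significance_matrix(scores, threshold=3.0):
    """Return (teams, sig), where sig[i, j] is True when teams[i]
    significantly outperforms teams[j] under the assumed threshold."""
    # Order rows and columns by mean score, best team first.
    teams = sorted(scores, key=lambda t: np.mean(scores[t]), reverse=True)
    n = len(teams)
    sig = np.zeros((n, n), dtype=bool)
    for i, row in enumerate(teams):
        for j, col in enumerate(teams):
            if i != j:
                sig[i, j] = bayes_factor(scores[row], scores[col]) >= threshold
    return teams, sig
```

The first row of the matrix corresponds to the comparison against the top performer, and the entries just above the diagonal correspond to the comparisons between consecutive teams, so both reporting styles can be read from the same table.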

The ideal way of reporting results still needs further thought and discussion.
