To determine whether differences in prediction accuracy between the different teams are robust, follow these steps:
Simulate the challenge by drawing 1,000 bootstrap samples (with replacement) from the test set.
In each resample, calculate the RMSE for each ranked submission.
For each team k, calculate a Bayes factor as the ratio of the number of resamples in which team k achieved a lower RMSE than the next-ranked team (k+1) to the number of resamples in which team k+1 achieved the lower RMSE.
Plot the bootstrap RMSEs as box/violin plots and display the Bayes factor for each team.
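The resampling and Bayes factor steps above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the original analysis code: the function names `bootstrap_rmse` and `bayes_factors`, the array layout (one row of predictions per team), the ranking of teams by RMSE on the full test set, and the handling of ties and zero denominators are all hypothetical choices made for the example.

```python
import numpy as np


def bootstrap_rmse(y_true, preds, n_boot=1000, seed=0):
    """Return (rmse, order): rmse[b, k] is the RMSE of the k-th ranked team
    in bootstrap resample b; order maps rank position to the input row index.

    Assumed layout: y_true has shape (n_samples,), preds has shape
    (n_teams, n_samples).
    """
    y_true = np.asarray(y_true, dtype=float)
    preds = np.asarray(preds, dtype=float)
    rng = np.random.default_rng(seed)
    n = y_true.shape[0]

    # Assumption: teams are ranked by RMSE on the full test set (best first).
    full_rmse = np.sqrt(np.mean((preds - y_true) ** 2, axis=1))
    order = np.argsort(full_rmse)
    ranked = preds[order]

    rmse = np.empty((n_boot, ranked.shape[0]))
    for b in range(n_boot):
        # Draw n test-set indices with replacement (one bootstrap resample).
        idx = rng.integers(0, n, size=n)
        rmse[b] = np.sqrt(np.mean((ranked[:, idx] - y_true[idx]) ** 2, axis=1))
    return rmse, order


def bayes_factors(rmse):
    """Ratio of resamples where rank k beats rank k+1 to resamples where
    rank k+1 beats rank k. Ties count toward neither side (an assumption)."""
    wins = (rmse[:, :-1] < rmse[:, 1:]).sum(axis=0)
    losses = (rmse[:, :-1] > rmse[:, 1:]).sum(axis=0)
    # Assumption: report infinity when the lower-ranked team never wins
    # (this also covers the degenerate case where every resample is a tie).
    return np.where(losses > 0, wins / np.maximum(losses, 1), np.inf)


if __name__ == "__main__":
    # Synthetic demonstration data only; not results from any real challenge.
    demo_rng = np.random.default_rng(1)
    y = demo_rng.normal(size=200)
    team_preds = np.stack(
        [y + demo_rng.normal(scale=s, size=200) for s in (0.4, 0.5, 0.6)]
    )
    boot, rank_order = bootstrap_rmse(y, team_preds)
    print("rank order:", rank_order)
    print("Bayes factors:", bayes_factors(boot))
```

The columns of the returned `rmse` array can be passed to matplotlib's `Axes.boxplot` or `Axes.violinplot` to produce the plot described in the final step, with each team's Bayes factor added as an annotation.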