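Currently, the order is determined by the median across target noise levels.
Since there are 4 of them (0.0, 0.001, 0.01, 0.1), the median is the average of the two middle scores, so (assuming scores degrade monotonically with noise) the ordering was effectively based on the average of the results at 0.001 and 0.01, while 0.0 and 0.1 were ignored.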
I think it makes more sense to order by the mean across the 4 noise levels, hence the PR.
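A minimal sketch of why the two orderings can disagree, using made-up per-noise-level scores for two hypothetical methods (`method_a`, `method_b`). With 4 values, the median only looks at the two middle ones, so a method that collapses at the highest noise level is not penalized, whereas the mean accounts for it:

```python
from statistics import mean, median

# Hypothetical scores (e.g. Test R2) at target noise 0.0, 0.001, 0.01, 0.1.
# The numbers are invented purely for illustration.
scores = {
    "method_a": [0.99, 0.95, 0.90, 0.20],  # strong at low noise, collapses at 0.1
    "method_b": [0.97, 0.93, 0.89, 0.70],  # slightly worse at low noise, robust at 0.1
}

# With an even number of values, median() averages the two middle ones,
# so the best and worst noise levels do not affect the ranking at all.
by_median = sorted(scores, key=lambda m: median(scores[m]), reverse=True)

# The mean uses all 4 noise levels.
by_mean = sorted(scores, key=lambda m: mean(scores[m]), reverse=True)
```

Here `by_median` puts `method_a` first (0.925 vs 0.91), while `by_mean` puts `method_b` first (0.8725 vs 0.76).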
Perhaps median() should also be used across the repeated runs, to avoid the "long bars" on Test R2, which I suspect are caused by outliers?
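A quick sketch of the outlier effect, assuming the "long bars" come from a single badly diverged run among the repetitions (the values below are hypothetical):

```python
from statistics import mean, median

# Hypothetical Test R2 over repeated runs (e.g. different seeds) for one
# method at one noise level; one run diverged and produced a large negative R2.
runs = [0.91, 0.93, 0.92, 0.90, -5.0]

run_mean = mean(runs)      # dragged far below the typical run by the outlier
run_median = median(runs)  # middle value of the sorted runs, unaffected
```

`run_mean` comes out negative, while `run_median` stays at 0.91, in line with the four well-behaved runs.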