bkj opened this issue 5 years ago
It's the mean test accuracy, i.e., 0.943175752957662.
Ok, thanks. So it sounds like it's `log(best_mean_test_acc - arch_mean_test_acc)` then? Otherwise it would be possible to have -inf regret?
I was also wondering — do you have a similar plot for validation accuracy that you could share?
Yes, exactly, it's `log(best_mean_test_acc - arch_mean_test_acc)`.
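For concreteness, a tiny illustrative snippet of that definition (the architecture accuracy of 0.939 below is a made-up placeholder; only the 0.943175752957662 incumbent value comes from this thread):

```python
import numpy as np

# Best mean test accuracy over the whole search space, as quoted above.
best_mean_test_acc = 0.943175752957662
# Hypothetical mean test accuracy (over the 3 runs) of some queried architecture.
arch_mean_test_acc = 0.939

test_regret = best_mean_test_acc - arch_mean_test_acc
print(np.log10(test_regret))  # the quantity plotted on the y-axis of Fig. 7
```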
I attached Figure 7, just with the validation regret on the y-axis:
comparison_time_all_mean.pdf
comparison_time_all_mean_valid.pdf
Note that we found some slightly better hyperparameters for SMAC and BOHB, which is why they improved. For comparison, I also added the original Fig. 7 with the updated test regret.
I put code that attempts to reproduce the results of the random search here: https://gist.github.com/bkj/8ae8da3c84bbb0fa06d144a6e7da8570
The results don't look exactly the same as in the paper -- the best regret is around 5.5 * 1e-3 vs. what looks like about 4.1 * 1e-3 in the paper. Any thoughts on where the differences might be coming from?
Roughly, the procedure is (a minimal sketch in code is included after the list):
1) sample a sequence of N random architectures
2) sample a validation accuracy per architecture
3) plot `log10(best_mean_test_acc - arch_mean_test_acc)` for the architecture w/ the best validation accuracy seen so far
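For reference, here is a minimal sketch of that loop. ARCHS and its synthetic accuracies are placeholders standing in for the tabular benchmark (this is not the gist or the benchmark API); the real code would look these values up instead:

```python
import random
import numpy as np

random.seed(0)

# Placeholder for the tabular benchmark: each architecture stores the
# validation/test accuracies of its 3 independent training runs.
ARCHS = [
    {"val_accs": [random.uniform(0.85, 0.94) for _ in range(3)],
     "test_accs": [random.uniform(0.85, 0.94) for _ in range(3)]}
    for _ in range(10_000)
]

# y_best: the best *mean* test accuracy in the whole search space
# (0.943175752957662 in the real benchmark, per the answer above).
BEST_MEAN_TEST_ACC = max(np.mean(a["test_accs"]) for a in ARCHS)

def random_search_log_regret(n_iters=1000):
    """Log10 test regret of the incumbent after each random-search step."""
    best_val = -np.inf
    incumbent = None
    regrets = []
    for _ in range(n_iters):
        arch = random.choice(ARCHS)                   # 1) sample a random architecture
        val_acc = random.choice(arch["val_accs"])     # 2) validation accuracy of a single run
        if val_acc > best_val:
            best_val, incumbent = val_acc, arch
        # 3) regret of the incumbent, computed from its mean test accuracy
        regret = BEST_MEAN_TEST_ACC - np.mean(incumbent["test_accs"])
        regrets.append(np.log10(max(regret, 1e-12)))  # avoid log10(0) at the optimum
    return regrets

print(random_search_log_regret(100)[-1])
```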
Plot of results:
Edit: Perhaps the issue is line 73 -- do you use the mean validation accuracy across the 3 runs for model selection, as opposed to a sample of a single run? Updated the plot above to show the difference.
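To make the distinction concrete, here is a small sketch (the per-run numbers are made up): selecting the incumbent on a single sampled run can pick a different architecture than selecting on the mean across the 3 runs.

```python
import random
import numpy as np

random.seed(0)

# Made-up per-run validation accuracies for two architectures.
arch_a = [0.9400, 0.9450, 0.9405]   # noisy: one unusually good run
arch_b = [0.9430, 0.9432, 0.9431]   # consistent across runs

# (a) selection on a single sampled run: arch_a may win if its lucky run is drawn
pick_single = max([arch_a, arch_b], key=lambda runs: random.choice(runs))

# (b) selection on the mean validation accuracy across the 3 runs: arch_b wins
pick_mean = max([arch_a, arch_b], key=lambda runs: np.mean(runs))

print(pick_single is arch_b, pick_mean is arch_b)
```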
The left plot in Fig. 7 in the paper shows test regret -- can you explain how that's computed exactly? I know it's `log10(y - y_best)` -- but what is `y_best` exactly? Is that the best validation/test accuracy for a single model run, or averaged across the 3 model runs? I think the four possibilities would be:

- best validation accuracy of a single run
- best validation accuracy averaged across the 3 runs
- best test accuracy of a single run
- best test accuracy averaged across the 3 runs

Thanks!