Top-5%/top-64 computation

egracheva commented 2 years ago

Hello,

Thanks for a great paper.

When you compute the top-5%/top-64 score (Tables 4, 11), how many architectures are there in total? Is it 3000 architectures (only the warmup pool) or the size of the entire dataset?

Cheers, Ekaterina

vaenyr commented 2 years ago

Off the top of my head, I would say all models from the search space were included. This is also what the notebook seems to suggest, although I wasn't the one who ran these experiments. Perhaps @mohsaied can verify.

mohsaied commented 2 years ago

Correct. They are the top-64 models in the entire search space. The idea is to quantify the degree to which zero-cost warmup improves the sampled architectures. If you took 64 random models, the expected number of top-5% models would simply be 5% of 64 ≈ 3 models. However, when we use a zero-cost metric like synflow and take the top 64 models in the search space, that number increases significantly, as shown in the tables.

So this comparison shows the best-case scenario of zero-cost warmup. It would be interesting to also try it out with smaller warmup sizes, as you suggested, and that should be fairly straightforward to do. If you end up doing this experiment, we'd love a pull request :)
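For readers following along, the top-5%/top-64 count described above can be sketched roughly as follows. This is only an illustration of the idea, not the code from the repository's notebook: the function name `top_k_in_top_percent`, its parameters, and the synthetic data are all made up for this example.

```python
import numpy as np


def top_k_in_top_percent(scores, accuracies, k=64, percent=5.0):
    """Count how many of the k best-scoring models are in the top `percent`% by accuracy.

    Hypothetical helper for illustration; `scores` holds one zero-cost metric
    value per architecture and `accuracies` the matching benchmark accuracy.
    """
    scores = np.asarray(scores, dtype=float)
    accuracies = np.asarray(accuracies, dtype=float)
    # Accuracy cut-off that separates the top `percent`% of the whole search space.
    threshold = np.percentile(accuracies, 100.0 - percent)
    # Indices of the k architectures with the highest zero-cost score.
    top_k = np.argsort(scores)[-k:]
    # How many of those k architectures clear the accuracy cut-off.
    return int(np.sum(accuracies[top_k] >= threshold))


# Synthetic stand-in for a search space (NOT NAS-Bench-101 data).
rng = np.random.default_rng(0)
acc = rng.uniform(0.0, 1.0, size=10_000)
informative_score = acc + rng.normal(0.0, 0.3, size=acc.size)  # correlated with accuracy
random_score = rng.normal(size=acc.size)                       # carries no information

print("informative metric:", top_k_in_top_percent(informative_score, acc))
print("random metric:     ", top_k_in_top_percent(random_score, acc))
print("random expectation:", 0.05 * 64)  # 3.2 on average, i.e. about 3 models
```

A metric that ranks the space perfectly would score 64 here, while an uninformative one should hover around 3, which is the baseline mentioned above.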

egracheva commented 2 years ago

Thanks for your replies!

I was confused by the fact that these tables are given in the "Warmup" section. Actually, I am still inclined to believe that the numbers are given for a random warmup pool of 3000 architectures.

Some time ago I plotted the synflow metric vs. accuracy, and the numbers in the tables did not seem to fit the shape of the cloud. Now I have double-checked and recomputed the value for the whole set (using the provided nasbench101_correlations notebook). My result with synflow for the top-5%/top-64 is 4 for the whole search space of NAS-Bench-101 (compared to 12 given in the paper).

This is confirmed by the plots below: [screenshot]

Zoom: [screenshot]

My final aim is to compare my zero-cost metric to your results. Even though I have higher overall and top-10% correlations, my top-5%/top-64 for NAS-Bench-101 is also very low. (My guess is that this comes from the nature of the benchmark: probably a suboptimal set of training hyperparameters, or too few epochs.)

I think I can run the test with multiple warmup sizes, as you suggested, sometime soon.
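One possible shape for that warmup-size experiment is sketched below. It rests on an assumption that the thread does not settle: for each warmup size, a random pool of that many architectures is drawn, the 64 best by the zero-cost metric are kept, and they are checked against the top 5% of the whole search space. The name `warmup_top_k_hits` and the synthetic data are hypothetical.

```python
import numpy as np


def warmup_top_k_hits(scores, accuracies, warmup_size, k=64, percent=5.0, seed=0):
    """Top-k hits when only a random warmup pool of `warmup_size` models is scored.

    Hypothetical helper; assumes the top-`percent`% cut-off is taken over the
    whole search space rather than over the sampled pool.
    """
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    accuracies = np.asarray(accuracies, dtype=float)
    # Accuracy cut-off for the top `percent`% of the whole search space.
    threshold = np.percentile(accuracies, 100.0 - percent)
    # Random warmup pool, sampled without replacement.
    pool = rng.choice(scores.size, size=warmup_size, replace=False)
    # The k pool members with the highest zero-cost score.
    best_in_pool = pool[np.argsort(scores[pool])[-k:]]
    return int(np.sum(accuracies[best_in_pool] >= threshold))


# Synthetic stand-in for a search space (NOT NAS-Bench-101 data).
rng = np.random.default_rng(1)
acc = rng.uniform(0.0, 1.0, size=10_000)
score = acc + rng.normal(0.0, 0.3, size=acc.size)

for size in (64, 1000, 3000, acc.size):
    print(size, warmup_top_k_hits(score, acc, warmup_size=size))
```

With `warmup_size` equal to 64 the selection is just a random sample, and with the full search space it reduces to the whole-space count, so the sweep spans the two readings of the tables discussed in this thread.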

SamsungLabs / zero-cost-nas

Top-5%/top-64 computation #11