AwesomeLemon closed this issue 3 years ago
Hi, thank you for your interest.
You're right, we only use evolutionary search to provide a population-level understanding of the performance of sub-networks trained via different KD strategies. The models we found during evolutionary search are not used when comparing with prior NAS work; otherwise it would not be a fair comparison, as we used the validation set for searching.
So in this work, for simplicity, we evaluate the Pareto models found in AttentiveNAS and treat them as our Pareto models. Better results are expected if you use a better strategy to estimate the Pareto front for each supernet. Thanks.
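For readers unfamiliar with the term, "estimating the Pareto front" can be sketched as follows. This is only an illustration, not code from this repository: the function name `pareto_front`, the `(name, flops, accuracy)` tuple layout, and the numbers are all hypothetical. A model is kept if no other model has fewer (or equal) FLOPs and higher accuracy.

```python
from typing import List, Tuple


def pareto_front(models: List[Tuple[str, float, float]]) -> List[str]:
    """Return names of non-dominated models, ordered by increasing FLOPs.

    Each entry is (name, flops, accuracy); lower FLOPs and higher accuracy
    are better. The tuple layout is an assumption made for this sketch.
    """
    # Sort by FLOPs ascending; among equal FLOPs, put higher accuracy first.
    ordered = sorted(models, key=lambda m: (m[1], -m[2]))
    front = []
    best_acc = float("-inf")
    for name, _flops, acc in ordered:
        # A model survives only if it beats every cheaper model on accuracy.
        if acc > best_acc:
            front.append(name)
            best_acc = acc
    return front


# Made-up numbers, for illustration only.
candidates = [
    ("m1", 200.0, 77.0),
    ("m2", 250.0, 76.5),  # dominated by m1: more FLOPs, lower accuracy
    ("m3", 300.0, 78.5),
    ("m4", 300.0, 78.0),  # dominated by m3: same FLOPs, lower accuracy
]
print(pareto_front(candidates))
```

A per-supernet search would produce a candidate list like this from its own evaluated sub-networks, rather than reusing the architectures found for another supernet.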
I understand now, thanks. Out of curiosity: why didn't you use evolutionary search the same way as in AttentiveNAS (i.e., set aside 200K samples to run the search on, and get better results thanks to the search)?
In AttentiveNAS, we found that the supernet generally produces very accurate rank correlation between sub-networks; hence we believe the Pareto models found in AttentiveNAS would also likely perform well in our setting.
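As a rough illustration of what "rank correlation between sub-networks" measures, here is a minimal sketch using Kendall's tau. The thread does not say which correlation metric was used, so the choice of Kendall's tau is an assumption, and the accuracy values below are made up. A value near 1 means two evaluations order the sub-networks almost identically.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-a between two equally long score lists (no tie correction)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        # The sign of the product tells whether the pair is ordered the same way.
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical accuracies of the same four sub-networks under two evaluations.
acc_a = [70.1, 72.3, 75.0, 76.2]
acc_b = [71.0, 73.5, 74.9, 77.8]
print(kendall_tau(acc_a, acc_b))
```

If the ordering is preserved like this, architectures that sit on the Pareto front under one evaluation are likely to remain strong under the other, which is the argument made above.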
I see, thanks again!
Hello,
I like your work, but I'm a bit confused about how the final models a0-a6 were selected. In the paper, in Section 4.2, subsection "Evaluation", you describe an evolutionary search procedure. However, in the subsection "Improvements on SOTA", you write that you chose the a0-a6 architectures to be the same as in the AttentiveNAS paper. Do I understand correctly that the results of the evolutionary search were not used when selecting the final models?
Thanks in advance!