facebookresearch / AlphaNet

AlphaNet: Improved Training of Supernets with Alpha-Divergence

How were the final architectures selected? #2

Closed · AwesomeLemon closed this issue 3 years ago

AwesomeLemon commented 3 years ago

Hello,

I like your work, but I'm a bit confused about how the final models a0-a6 were selected. In the paper, in Section 4.2, subsection "Evaluation", you describe an evolutionary search procedure. However, in the subsection "Improvements on SOTA" you write that you choose the a0-a6 architectures to be the same as in the AttentiveNAS paper. Do I understand correctly that the results of the evolutionary search were not used when selecting the final models?

Thanks in advance!

dilinwang820 commented 3 years ago

Hi, thank you for your interest.

You're right, we only use evolutionary search to provide a population-level understanding of the performance of sub-networks trained via different KD strategies. The models found during evolutionary search are not used when comparing with prior NAS work; otherwise the comparison would not be fair, since we used the validation set for searching.

So in this work, for simplicity, we evaluate the Pareto models found in AttentiveNAS and treat them as our Pareto models. Better results are expected if you use a better strategy to estimate the Pareto front for each supernet. Thanks.
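
For reference, here is a minimal sketch of the kind of evolutionary Pareto search described above. The helpers `sample_config`, `mutate`, and `measure` are hypothetical stand-ins for the repo's actual sub-network sampling and evaluation code, not its API:

```python
import random

# A minimal sketch of an accuracy/FLOPs Pareto search over supernet
# sub-networks -- hypothetical helpers, not the actual AlphaNet/AttentiveNAS API:
#   sample_config() -> a random sub-network configuration
#   mutate(cfg)     -> cfg with one architecture dimension perturbed
#   measure(cfg)    -> (flops, top1_acc) evaluated on the held-out search split

def pareto_front(candidates):
    """Keep candidates that are not dominated in (lower FLOPs, higher accuracy)."""
    return [(m, c) for m, c in candidates
            if not any(o[0] <= m[0] and o[1] >= m[1] and o != m
                       for o, _ in candidates)]

def evolutionary_search(sample_config, mutate, measure,
                        population=64, iterations=20):
    # Seed the population with random sub-networks from the search space.
    pool = [(measure(c), c) for c in (sample_config() for _ in range(population))]
    for _ in range(iterations):
        parents = pareto_front(pool)
        # Half of the next generation mutates Pareto parents, half is fresh samples.
        children = [mutate(c) for _, c in random.choices(parents, k=population // 2)]
        fresh = [sample_config() for _ in range(population - len(children))]
        pool = parents + [(measure(c), c) for c in children + fresh]
    return sorted(pareto_front(pool), key=lambda mc: mc[0][0])  # sorted by FLOPs
```

Because `measure` scores each sub-network on the held-out search/validation split, the architectures found this way are exactly the ones that cannot be reused for the final comparison without leaking validation data, which is the fairness concern mentioned above.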

AwesomeLemon commented 3 years ago

I understand now, thanks. Out of curiosity: why didn't you use evolutionary search the same way as in AttentiveNAS (i.e., hold out a separate 200K samples to run the search on, and use the search results to get better final models)?

dilinwang820 commented 3 years ago

In AttentiveNAS, we found that the supernet generally produces very accurate rank correlations between sub-networks; hence we believe the Pareto models found in AttentiveNAS would also likely perform well in our setting.
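
As a rough illustration of that rank-correlation argument, one could compare the accuracies that two supernets assign to the same set of sub-networks using Kendall's tau. This is a sketch with assumed inputs, not code from the repo:

```python
from scipy.stats import kendalltau

# Sketch of the rank-transfer sanity check implied above (assumed inputs,
# not code from the repo): acc_attentive[i] and acc_alpha[i] are validation
# accuracies of the *same* i-th sub-network config, evaluated under the
# AttentiveNAS supernet and the AlphaNet supernet respectively.
def rank_transfer(acc_attentive, acc_alpha):
    tau, p_value = kendalltau(acc_attentive, acc_alpha)
    # A tau close to 1 means both supernets order sub-networks nearly
    # identically, so Pareto architectures found under one supernet are
    # likely near-Pareto under the other.
    return tau, p_value
```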

AwesomeLemon commented 3 years ago

I see, thanks again!