facebookresearch / AttentiveNAS

code for "AttentiveNAS Improving Neural Architecture Search via Attentive Sampling"

Accuracy Predictor #8

Closed: minghaoBD closed this issue 2 years ago

minghaoBD commented 2 years ago

Hi, thanks for the great work! I have a question about the usage of the accuracy predictor.

Specifically, a predictor is used to estimate the accuracy of sub-networks and rank them during training, as described in your paper. But in the code I couldn't find where the predictor is used; for example, here (https://github.com/facebookresearch/AttentiveNAS/blob/88ad92f82dc343a0e7d681f1fb9a8deeb45be928/train_attentive_nas.py#L291), `criterion(model(input))` is used to score the sub-networks instead of a predicted accuracy.

I am a little confused about this part: is there some important code I missed, or a statement I misunderstood? Looking forward to your reply :)

dilinwang820 commented 2 years ago

Hi @minghaoBD, one of the key ablation dimensions of AttentiveNAS is the question "which subnet performs better without training to convergence?" To answer it, we tried the following two approaches: 1) pre-train an accuracy predictor, or 2) simply use the mini-batch loss on the fly.

Empirically, we found that 2) actually leads to better results, and that is what the current implementation uses.
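For concreteness, here is a minimal sketch of what approach 2) looks like, assuming a supernet that exposes a `sample_active_subnet()` method; the function and argument names are illustrative, not the repo's exact API:

```python
import torch
import torch.nn as nn

def rank_subnets_by_minibatch_loss(supernet, images, targets, num_candidates=5):
    """Sample candidate subnets and rank them by their loss on one mini-batch.

    The mini-batch loss stands in for a pre-trained accuracy predictor:
    a lower loss is taken as a proxy for a better-performing subnet.
    """
    criterion = nn.CrossEntropyLoss()
    scored = []
    with torch.no_grad():  # ranking only, no gradients needed
        for _ in range(num_candidates):
            cfg = supernet.sample_active_subnet()  # hypothetical sampling API
            loss = criterion(supernet(images), targets).item()
            scored.append((loss, cfg))
    scored.sort(key=lambda pair: pair[0])  # best (lowest-loss) subnet first
    return scored
```

A ranking like this is what lets the attentive sampler focus training on the best- or worst-performing candidates at each step, without ever training a separate predictor.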

Additionally, there are a couple of follow-up works on improving supernet training, for example: 1) AlphaNet: Improved Training of Supernet with Alpha-Divergence, and 2) NASViT: Neural Architecture Search for Efficient Vision Transformers with Gradient Conflict aware Supernet Training.

Either of these produces much better performance with a simpler algorithm. I would recommend starting with these newer works instead.

minghaoBD commented 2 years ago

Thanks for sharing your insights and advice! I will try out the follow-up works.