Thanks for your contribution to solving the ranking problem in NAS. I am interested in the overall training pipeline of NAR; relaxing performance prediction into a quality classification problem is an interesting idea.
However, I am confused about how to build a NAR and cannot figure out the underlying procedure for applying the trained NAR to sampling-based search strategies. I have summarized my questions as follows.
1. Does this work focus on improving the ranking quality in NAS so that sampling-based search methods can find optimal architectures? As mentioned in the paper, the NAR helps find promising architectures on NAS-Bench-101/201.
2. Regarding the "Supervised architecture feature extraction" described in Section 3.1, does training the NAR require the ground-truth accuracy of many architectures? If so, studying a novel search space other than the NAS-Bench ones seems to require training many stand-alone models.
3. As described in Section 3.3, does the method remove subnets that do not meet the constraint in every search iteration and pass the rest to the trained ranker? From the results, it is hard to figure out how the optimal network is searched for under specific constraints (e.g., FLOPs, parameters, latency).
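To make the last question concrete, here is a rough sketch of how I currently read the Section 3.3 procedure; `sample_subnets`, `measure_cost`, and `nar_rank` are hypothetical placeholders, not functions from the paper or its code:

```python
def constrained_search(sample_subnets, measure_cost, nar_rank,
                       max_flops, max_params, iterations=10, top_k=5):
    """Reject infeasible subnets first, then let the trained ranker order the rest."""
    best = []
    for _ in range(iterations):
        candidates = sample_subnets()                     # sample a batch of subnets
        feasible = [c for c in candidates
                    if measure_cost(c)["flops"] <= max_flops
                    and measure_cost(c)["params"] <= max_params]
        ranked = nar_rank(feasible)                       # ranker orders the survivors
        best.extend(ranked[:top_k])                       # keep the top-ranked subnets
    return best
```

Is this roughly what happens, or is the constraint handled differently?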
Thanks for your work; it provides a thorough understanding of the hierarchical property of the search space.
Hi, thanks for your attention to this work. Here are my answers; I hope they help.
1. As mentioned in the paper, the NAR is designed to rank cell-based architectures. Specifically, architectures are stacked from cells, and each cell can be encoded as a Directed Acyclic Graph (DAG). Therefore, whether a method is sampling-based (the NAS-Bench series) or gradient-based (like DARTS), its architectures can be ranked by the NAR as long as they can be encoded as DAGs. You may also try different encoding formats for the cells and adjust the input codes of the NAR accordingly.
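For intuition, here is a minimal sketch of one possible DAG encoding in the style of NAS-Bench-101 (an upper-triangular adjacency matrix plus a one-hot operation list); the operation vocabulary and the flattened code are illustrative assumptions, not the exact input format of the NAR:

```python
import numpy as np

# A 7-node cell: input, five intermediate operations, output
ops = ["input", "conv3x3", "conv1x1", "maxpool3x3", "conv3x3", "conv1x1", "output"]

# adjacency[i][j] = 1 means node i feeds node j (only i < j, so the graph is acyclic)
adjacency = np.array([
    [0, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0],
])

# One-hot encode the operations so the cell becomes a fixed-size numeric code
op_vocab = ["input", "conv1x1", "conv3x3", "maxpool3x3", "output"]
op_onehot = np.zeros((len(ops), len(op_vocab)))
for i, op in enumerate(ops):
    op_onehot[i, op_vocab.index(op)] = 1.0

# Concatenate into a single architecture code that a ranker could consume
arch_code = np.concatenate([adjacency.flatten(), op_onehot.flatten()])
print(arch_code.shape)  # (84,) = 49 adjacency entries + 35 operation entries
```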
2. For any predictor, the training cost is unavoidable. However, compared with earlier NAS methods that train every candidate model from scratch, predictor-based NAS only needs to train a small number of models. One more thing: you may try transferring a NAR trained on NAS-Bench-101 to NAS-Bench-201 and then fine-tuning it with a small number of models. This idea could further examine the generalizability of predictor-based NAS.
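Below is a minimal PyTorch sketch of that transfer idea, assuming the two benchmarks share a common encoding dimension; `RankerNet`, the checkpoint name, and the placeholder tensors are hypothetical stand-ins for the actual NAR and data:

```python
import torch
import torch.nn as nn

class RankerNet(nn.Module):
    """Stand-in ranker: feature encoder followed by a quality-tier classifier."""
    def __init__(self, in_dim, num_tiers=5):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
        )
        self.head = nn.Linear(128, num_tiers)

    def forward(self, x):
        return self.head(self.encoder(x))

model = RankerNet(in_dim=84)
# In practice, load the weights pretrained on NAS-Bench-101, e.g.:
# model.load_state_dict(torch.load("nar_pretrained_nasbench101.pt"))

# Reuse the pretrained encoder as-is and only adapt the classifier head
for p in model.encoder.parameters():
    p.requires_grad = False
model.head = nn.Linear(128, 5)

# Fine-tune on a small labelled subset of NAS-Bench-201
codes_201 = torch.randn(100, 84)            # placeholder architecture encodings
tiers_201 = torch.randint(0, 5, (100,))     # placeholder quality-tier labels
optimizer = torch.optim.Adam(model.head.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(20):
    optimizer.zero_grad()
    loss = criterion(model(codes_201), tiers_201)
    loss.backward()
    optimizer.step()
```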