AlbertiPot / nar

Code for the Neural Architecture Ranker and detailed cell-information datasets based on the NAS-Bench series

How to build a Neural Architecture Ranker? #2

Closed mechanicalsea closed 2 years ago

mechanicalsea commented 2 years ago

Hi,

Thanks for your contribution to solving the ranking problem in NAS. I am interested in the overall training pipeline of NAR. Relaxing performance prediction into a quality classification problem is an interesting idea.

I am confused about how to build a NAR and can't figure out the underlying procedure for applying the trained NAR to sampling-based search strategies. I summarize my questions as follows.

  1. Does this work focus on improving NAS ranking so that sampling-based search methods can find optimal architectures? As mentioned in the paper, NAR helps find promising architectures in NAS-Bench-101/201.
  2. As "Supervised architecture feature extraction" mentioned in Section 3.1 of the paper, does the training of NAR require the ground-truth accuracy of various architectures? It seems to have to train many stand-alone models if studying a novel search space except for NAS-Bench search space.
  3. Does it remove subnets that don't meet the constraint in every search iteration, as mentioned in Section 3.3, and feed the rest to the trained ranker? From the results, it is hard to figure out how to search for the optimal network under specific constraints (e.g., FLOPs, parameters, latency).

Thanks for your work, which provides a thorough understanding of the hierarchical property of the search space.

AlbertiPot commented 2 years ago

Hi, thanks for your attention to this work. Here are my answers; I hope they help.

  1. As mentioned in the paper, the NAR is designed to rank cell-based architectures. Specifically, architectures are stacked from cells, and each cell can be encoded as a Directed Acyclic Graph (DAG). So whether a method is sampling-based (the NAS-Bench series) or gradient-based (like DARTS), its architectures can be ranked by the NAR as long as they can be encoded as DAGs (see the encoding in the sketch after this list). You may also try different encoding formats for cells and adjust the input codes of the NAR accordingly.
  2. For any predictor, training costs are unavoidable. However, compared to earlier NAS methods that train each candidate model from scratch, predictor-based NAS only needs to train a small number of models. One more thing: you may try to transfer a NAR trained on NAS-Bench-101 to NAS-Bench-201 and then fine-tune it with a small number of models; this could further probe the generalizability of predictor-based NAS.
  3. Yes, it rejects subnets that do not meet the FLOPs and parameter constraints; see https://github.com/AlbertiPot/nar/blob/eb081f0e1ee16c2b1eb5e6e2afd41254cd7dce28/sampler/arch_sampler.py#L138. First, it samples a batch of candidate models that satisfy the FLOPs and parameter constraints; then the NAR ranks them into 5 tiers and scores them; finally, it selects the one with the highest score in tier 1 (see the sketch below).
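For concreteness, here is a minimal sketch of that search loop, assuming fixed-size NAS-Bench-101-style cells (adjacency matrix plus one-hot operations) and a trained ranker. The names `nar`, `sample_cell`, `flops_of`, and `params_of` are hypothetical placeholders rather than this repo's actual API, and the assumption that the ranker returns a `(tier_logits, scores)` pair is illustrative only.

```python
import numpy as np
import torch

NUM_TIERS = 5  # the NAR splits candidates into 5 quality tiers

def encode_cell(adjacency, ops, op_vocab):
    """Flatten a DAG cell into one vector: adjacency matrix + one-hot ops."""
    one_hot = np.eye(len(op_vocab))[[op_vocab.index(op) for op in ops]]
    return np.concatenate([adjacency.flatten(), one_hot.flatten()]).astype(np.float32)

def search(nar, sample_cell, flops_of, params_of, op_vocab,
           flops_limit, params_limit, batch_size=100):
    # 1) Rejection sampling: keep only candidates that satisfy the
    #    FLOPs and parameter constraints.
    batch = []
    while len(batch) < batch_size:
        adj, ops = sample_cell()
        if flops_of(adj, ops) <= flops_limit and params_of(adj, ops) <= params_limit:
            batch.append((adj, ops))

    # 2) Rank the surviving batch with the trained NAR. We assume the
    #    ranker returns tier logits of shape (batch, NUM_TIERS) and a
    #    scalar score per candidate.
    codes = torch.stack(
        [torch.from_numpy(encode_cell(adj, ops, op_vocab)) for adj, ops in batch]
    )
    with torch.no_grad():
        tier_logits, scores = nar(codes)
    tiers = tier_logits.argmax(dim=1)  # predicted tier per candidate, 0 = tier 1

    # 3) Pick the highest-scoring candidate in tier 1; fall back to the
    #    whole batch if nothing lands in the top tier.
    tier1 = (tiers == 0).nonzero(as_tuple=True)[0]
    pool = tier1 if len(tier1) > 0 else torch.arange(len(batch))
    best = pool[scores[pool].argmax()]
    return batch[int(best)]
```

The actual constraint checks and tier assignment in the repo are more involved; this only shows the control flow of reject → rank into tiers → pick from tier 1.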
mechanicalsea commented 2 years ago

Thanks for your careful responses. I learned a lot.