LA-MCTS is a new MCTS based derivative-free meta-solver. It learns to partition the search space, so that solvers such as Bayesian optimization or evolutionary algorithms can focus on a smaller region to find better solutions with fewer samples.
Please 🌟star🌟 the repo if you like our work, thank you.
Linnan Wang (First Author), Yuandong Tian (Principal Investigator), Yiyang Zhao, Saining Xie, Teng Li and Rodrigo Fonesca.
This release contains our implementation of LA-MCTS and its application to Neural Architecture Search (LaNAS), but it can also be applied to large-scale hyper-parameter optimization, reinforcement learning, scheduling, optimizing computer systems, and many others.
Evaluation on NASBench-101 : Evaluating LaNAS on NASBench-101 on your laptop without training models.
Our Searched Models, LaNet: SoTA results: • 99.03% on CIFAR-10 • 77.7% @ 240MFLOPS on ImageNet.
One/Few-shot LaNAS: Using a supernet to evaluate the model, obtaining results within a few GPU days.
Distributed LaNAS: Distributed framework for LaNAS, usable with hundreds of GPUs.
Training heuristics used: We list all tricks used in ImageNet training to reach SoTA.
Performance with baselines: 1 minute evaluations of LA-MCTS v.s. Bayesian Optimization and Evolutionary Search. In the NeurIPS-2020 black box optimization challenge, the concept of LA-MCTS is used by 3rd (JetBrains) and 8th (KAIST) place teams. Check out the leaderboard here.
Mujoco Experiments: LA-MCTS on Mujoco environment.
Inspired by AlphaGo, we build the very first NAS search algorithm based on Monte Carlo Tree Search (MCTS) in 2017, namely AlphaX. The action space is fixed (layer-by-layer construction) and MCTS is used to steer towards promising search regions. We showed the Convolutional Neural Network designed by AlphaX improve the downstream applications such as detection, style transfer, image captioning, and many others.
Neural Architecture Search using Deep Neural Networks and Monte Carlo Tree Search AAAI-2020, [code] Linnan Wang (Brown), Yiyang Zhao(WPI), Yuu Jinnai(Brown), Yuandong Tian(FAIR), Rodrigo Fonseca(Brown)
On AlphaX, we find that different action space used in MCTS significantly affects the search efficiency, which motivates the idea of learning action space for MCTS on the fly during training. This leads to LaNAS. LaNAS uses a linear classifier at each decision node of MCTS to learn good versus bad actions, and evaluates each leaf node, which now represents a subregion of the search space rather than a single architecture, by a uniform random sampling one architecture and evalute. The first version of LaNAS implemented a distributed system to perform NAS by training every such samples from scratch using 500 GPUs. The second version of LaNAS, called one-shot LaNAS, uses a single one-shot subnetwork to evaluate the quality of samples, trading evaluation efficiency with accuracy. One-shot LaNAS finds a reasonable solution in a few GPU days.
Sample-Efficient Neural Architecture Search by Learning Action Space for Monte Carlo Tree Search TPAMI 2021 Linnan Wang (Brown), Saining Xie (FAIR), Teng Li(FAIR), Rodrigo Fonesca (Brown), Yuandong Tian (FAIR)
Since LaNAS works very well on NAS datasets, e.g. NASBench-101, and the core of the algorithm can be easily generalized to other problems, we extend it to be a generic solver for black-box function optimization. LA-MCTS further improves by using a nonlinear classifier at each decision node in MCTS and use a surrogate (e.g., a function approximator) to evaluate each sample in the leaf node. The surrogate can come from any existing Black-box optimizer (e.g., Bayesian Optimization). The details of LA-MCTS can be found in the following paper.
Learning Search Space Partition for Black-box Optimization using Monte Carlo Tree Search NeurIPS 2020 Linnan Wang (Brown University), Rodrigo Fonesca (Brown University), Yuandong Tian (Facebook AI Research)
To overcome issues of one-shot NAS, we propose few-shot NAS that uses multiple supernets, each covering different regions of the search space specified by the intermediate of the search tree. Extensive experiments show that few-shot NAS significantly improves upon one-shot methods. See the paper below for details.
Few-shot Neural Architecture Search [code] Yiyang Zhao (WPI), Linnan Wang (Brown), Yuandong Tian (FAIR), Rodrigo Fonseca (Brown), Tian Guo (WPI)
LA-MCTS is under CC-BY-NC 4.0 license.