Neural Architecture Search: A Survey

dyweb / papers-notebook

Neural Architecture Search: A Survey #100

Open gaocegege opened 5 years ago

gaocegege commented 5 years ago

https://arxiv.org/abs/1808.05377

at15 commented 5 years ago

.... Aren't you on vacation? .... No holidays from reading papers? .w.

gaocegege commented 5 years ago

=。= I saw it and posted it while I was at it. I'm back now.

geekplux commented 5 years ago

So this is what an expert looks like..

gaocegege commented 5 years ago

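PS: The best thing about this paper is that it lists many references as it introduces each topic. If you want to survey one specific problem, I suggest going back to the original paper and reading all the references it cites there.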

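The paper covers current NAS research from three aspects: the search space, the search strategy, and the Performance Estimation Strategy (I am not sure how the last term should be translated, so I leave it in English). The relationship among the three is shown in the figure: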

screenshot from 2018-08-27 14-10-36

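The search space is the set of candidate neural network architectures. In each round, the search strategy picks one architecture from this set, and the Performance Estimation Strategy is then used to evaluate it.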
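A minimal sketch of how the three components interact, with random search as the search strategy. The search space, the `estimate_performance` scoring function, and all names below are hypothetical stand-ins, not anything taken from the paper; a real estimator would train the architecture and measure it on held-out data.

```python
import random

# Hypothetical search space: every (depth, width) pair is one architecture.
SEARCH_SPACE = [(depth, width) for depth in (2, 4, 8) for width in (16, 32, 64)]

def search_strategy(rng, history):
    # Random search ignores the history; smarter strategies would use it.
    return rng.choice(SEARCH_SPACE)

def estimate_performance(arch):
    # Synthetic score standing in for "train, then measure on validation data".
    depth, width = arch
    return 1.0 - abs(depth - 4) / 10 - abs(width - 32) / 100

def run_nas(budget, seed=0):
    rng = random.Random(seed)
    history = []
    for _ in range(budget):
        arch = search_strategy(rng, history)    # pick a candidate
        score = estimate_performance(arch)      # evaluate it
        history.append((arch, score))           # feed the result back
    return max(history, key=lambda item: item[1])

best_arch, best_score = run_nas(budget=20)
print(best_arch, best_score)
```

The `history` argument is unused by random search, but it marks where Bayesian optimization, evolution, or RL would plug in.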

gaocegege commented 5 years ago

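Starting with the search space: the simplest one is the space of chain-structured neural networks, shown on the left of the figure below. A search space of this kind can be parameterized by: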

(i) the (maximum) number of layers n (possibly unbounded); (ii) the type of operation every layer can execute, e.g., pooling, convolution, or more advanced layer type; (iii) hyperparameters associated with the operation, e.g., number of filters, kernel size and strides for a convolutional layer
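The three parameters above can be sketched as a sampler over a chain-structured space. This is only an illustration: `MAX_LAYERS`, the operation names, and the value ranges in `OPERATIONS` are assumptions made up for the example.

```python
import random

# Hypothetical search space; names and value ranges are illustrative only.
MAX_LAYERS = 6
OPERATIONS = {
    "conv": {"filters": [16, 32, 64], "kernel_size": [1, 3, 5], "stride": [1, 2]},
    "pool": {"kernel_size": [2, 3], "stride": [1, 2]},
}

def sample_chain_architecture(rng):
    """Draw one chain-structured architecture as a list of layer specs."""
    n_layers = rng.randint(1, MAX_LAYERS)            # (i) number of layers
    layers = []
    for _ in range(n_layers):
        op = rng.choice(sorted(OPERATIONS))          # (ii) operation type
        hparams = {name: rng.choice(values)          # (iii) its hyperparameters
                   for name, values in OPERATIONS[op].items()}
        layers.append({"op": op, **hparams})
    return layers

arch = sample_chain_architecture(random.Random(0))
for layer in arch:
    print(layer)
```

Note that the hyperparameters in (iii) are conditional on the choice in (ii), which is why each operation carries its own table of options.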

screenshot from 2018-08-27 14-14-24

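However, a search space defined this way can only yield the simplest network architectures, so researchers have also proposed search-space representations aimed at more complex, multi-branch networks. In this formulation, the input of layer i is a function of the outputs of the preceding i - 1 layers. This is a generalization of the previous representation; the chain structure can also be expressed in this form as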

screenshot from 2018-08-27 14-28-41
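To make the "input of layer i is a function of earlier outputs" idea concrete, here is a small sketch in which the combining function is swappable. The function names are hypothetical, plain Python lists stand in for tensors, and the summation and concatenation variants are illustrative choices for how earlier outputs might be combined.

```python
def chain_input(prev_outputs):
    # Chain structure: layer i only sees the output of layer i - 1.
    return prev_outputs[-1]

def sum_input(prev_outputs):
    # Skip connections by summation: add all earlier outputs element-wise.
    return [sum(values) for values in zip(*prev_outputs)]

def concat_input(prev_outputs):
    # Skip connections by concatenation: join all earlier outputs.
    return [v for out in prev_outputs for v in out]

def run_network(x, layers, combine):
    outputs = [x]                       # the network input counts as output 0
    for layer in layers:
        outputs.append(layer(combine(outputs)))
    return outputs[-1]

def double(values):
    # Toy layer: multiply every element by 2.
    return [2 * v for v in values]

print(run_network([1, 2], [double, double], chain_input))
print(run_network([1, 2], [double, double], sum_input))
print(run_network([1, 2], [double, double], concat_input))
```

Only `combine` changes between the three calls, which is the sense in which the chain structure is a special case of the multi-branch formulation.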

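However, this representation carries no human prior knowledge, for example when we are already confident that a particular sub-structure works well as a building block for networks. This is where the cell-based idea comes in: design the sub-structure first, and then search on top of it.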
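A rough sketch of why searching over a cell shrinks the problem. It assumes a heavily simplified cell (a fixed sequence of operations with no choice of connections) and a hand-fixed macro structure that repeats the cell; `OPS`, `NODES_PER_CELL`, and `NUM_CELLS` are made-up values.

```python
from itertools import product

OPS = ["conv3x3", "conv5x5", "maxpool", "identity"]  # hypothetical operations
NODES_PER_CELL = 3   # operations chosen inside one cell
NUM_CELLS = 8        # hand-designed macro structure: the cell is repeated 8 times

def enumerate_cells():
    # Every way of assigning one operation to each node of the cell.
    return list(product(OPS, repeat=NODES_PER_CELL))

def build_network(cell):
    # The full network is the same cell stacked NUM_CELLS times.
    return [list(cell) for _ in range(NUM_CELLS)]

cells = enumerate_cells()
cell_space_size = len(cells)
layerwise_space_size = len(OPS) ** (NODES_PER_CELL * NUM_CELLS)

print("cell-based search space:", cell_space_size)
print("layer-by-layer search space:", layerwise_space_size)
print("example network depth in cells:", len(build_network(cells[0])))
```

Under these assumptions the search only has to choose among `4**3` cells instead of `4**24` layer-by-layer assignments, at the cost of fixing the macro structure by hand.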

gaocegege commented 5 years ago

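Search strategies for neural architectures include random search, Bayesian optimization, evolutionary methods, RL, gradient-based methods, and so on. Evolutionary algorithms were already being used in 2008 and 2009 to find network architectures and parameter values, but NAS only started to achieve notable results in 2013, with the application of Bayesian optimization.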

Over the past year (2018), RL has been the more widely used approach.

the generation of a neural architecture can be considered to be the agent’s action, with the action space identical to the search space. The agent’s reward is based on an estimate of the performance of the trained architecture on unseen data
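The quoted framing (architecture generation as the action, estimated performance as the reward) can be sketched with a tiny REINFORCE-style controller. This is a toy under loud assumptions: the two-decision search space in `CHOICES` is invented, and `fake_reward` is a synthetic stand-in for the validation performance of a trained architecture.

```python
import math
import random

# Hypothetical search space: one operation choice and one filter-count choice.
CHOICES = [["conv3x3", "conv5x5", "maxpool"], [16, 32, 64]]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fake_reward(arch):
    # Synthetic reward; a real one would come from training and validating arch.
    op, filters = arch
    return (0.5 if op == "conv3x3" else 0.2) + filters / 128.0

def train_controller(steps=3000, lr=0.1, seed=0):
    rng = random.Random(seed)
    logits = [[0.0] * len(options) for options in CHOICES]
    baseline = 0.0
    for _ in range(steps):
        probs = [softmax(l) for l in logits]
        # The action is one sampled index per decision, i.e. one architecture.
        action = [rng.choices(range(len(p)), weights=p)[0] for p in probs]
        reward = fake_reward([CHOICES[d][a] for d, a in enumerate(action)])
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward   # moving-average baseline
        for d, a in enumerate(action):
            for k in range(len(logits[d])):
                # Gradient of log softmax probability of the sampled index.
                grad_log_prob = (1.0 if k == a else 0.0) - probs[d][k]
                logits[d][k] += lr * advantage * grad_log_prob
    best = []
    for d, options in enumerate(CHOICES):
        best_index = max(range(len(options)), key=lambda k: logits[d][k])
        best.append(options[best_index])
    return best

best_arch = train_controller()
print(best_arch)
```

The expensive part in practice is the reward: every sampled action requires training an architecture, which is what the performance estimation strategies try to make cheaper.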

I think gradient-based methods are the direction of the future, because they can greatly reduce the amount of compute required.

gaocegege commented 5 years ago

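On the evaluation side, the simplest approach is the conventional way of evaluating a model: split the data into a training set and a validation set. But evaluating every candidate this way is very wasteful of hardware resources. To reduce resource usage, performance can be estimated based on lower fidelities of the actual performance after full training (also denoted as proxy metrics). This includes shortening the training time, training on a subset of the data, lowering the image resolution for CV models, and so on.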

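Another family of methods builds on learning curve extrapolation: extrapolate from the initial part of the learning curve, then terminate the candidates that are predicted to perform poorly.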
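A minimal sketch of the extrapolate-then-terminate idea. The curve model `acc(t) = a - b / t` is an assumption chosen because it can be fitted with ordinary least squares; it is not the model used in the literature the paper cites, and all function names are hypothetical.

```python
def fit_inverse_curve(accuracies):
    """Least-squares fit of acc(t) = a - b / t for epochs t = 1..len(accuracies)."""
    xs = [1.0 / t for t in range(1, len(accuracies) + 1)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(accuracies) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, accuracies))
    slope = sxy / sxx                 # the fitted slope equals -b
    a = mean_y - slope * mean_x
    return a, -slope

def predict_final(accuracies, final_epoch):
    a, b = fit_inverse_curve(accuracies)
    return a - b / final_epoch

def should_terminate(accuracies, final_epoch, best_so_far):
    # Stop early when even the extrapolated final accuracy cannot beat the best.
    return predict_final(accuracies, final_epoch) < best_so_far

# Synthetic partial learning curve: the first 5 epochs of 0.9 - 0.5 / t.
partial_curve = [0.9 - 0.5 / t for t in range(1, 6)]
print(predict_final(partial_curve, final_epoch=50))
print(should_terminate(partial_curve, final_epoch=50, best_so_far=0.92))
```

At least two observed epochs are needed for the fit, and a real learning curve is noisy, so a practical version would also want some margin or uncertainty estimate before terminating a run.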

Another approach is to initialize the weights of a new architecture from the weights of other architectures.

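The last one is a fairly well-known approach, One-Shot Architecture Search, which the paper also spends a lot of space introducing. The idea is very clever: every architecture in the search space is treated as a subgraph of a supergraph (the one-shot model), and architectures share weights on the edges they have in common. However, this approach can severely underestimate a model's true performance (I don't understand why yet). It is usually used together with the cell-based search space representation.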
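The weight-sharing idea can be sketched with a toy supergraph. Everything here is an assumption for illustration: the supergraph is a chain of three edges, each edge offers the operations in `OPS`, scalars stand in for tensors, and one shared weight is stored per (edge, operation) pair.

```python
OPS = {
    "scale": lambda x, w: w * x,
    "shift": lambda x, w: x + w,
    "identity": lambda x, w: x,
}
NUM_EDGES = 3  # hypothetical supergraph: a chain of 3 edges

def init_shared_weights():
    # One weight per (edge, operation); every sub-architecture reads from here.
    return {(edge, op): 1.0 for edge in range(NUM_EDGES) for op in OPS}

def forward(arch, shared_weights, x):
    # arch picks one operation per edge, i.e. one subgraph of the supergraph.
    for edge, op in enumerate(arch):
        x = OPS[op](x, shared_weights[(edge, op)])
    return x

weights = init_shared_weights()
weights[(0, "scale")] = 2.0

arch_a = ("scale", "shift", "identity")
arch_b = ("scale", "identity", "identity")   # shares edge 0 with arch_a

before = (forward(arch_a, weights, 3.0), forward(arch_b, weights, 3.0))
weights[(0, "scale")] = 3.0                  # e.g. a training step taken for arch_a
after = (forward(arch_a, weights, 3.0), forward(arch_b, weights, 3.0))
print(before, after)
```

Updating the shared weight on edge 0 changes the output of both architectures, so no candidate needs to be trained from scratch; the flip side is that each candidate is scored with weights that were not tuned for it alone.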

gaocegege commented 5 years ago

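NAS is currently most often applied to image classification, a domain where human experience can be used to define a reasonably good search space. But this also means the architectures found are not fundamentally different from the classic ones and bring no large improvement. For this reason, there is now a lot of exploration into applying NAS to problems in other domains, such as language modeling.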

Another direction for future work is NAS methods for multi-task or multi-objective problems. Yet another is how to define a more general and flexible search space.

xieydd commented 5 years ago

@gaocegege Why do gradient-based methods greatly reduce the compute requirement?

gaocegege commented 5 years ago

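@xieydd Because gradient-based methods first relax the discrete search space into a continuous one, and then run a gradient-based algorithm in that continuous space. An optimization problem in a continuous space needs much less compute (it converges faster).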
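A toy sketch of the continuous relaxation: instead of picking one operation, the edge outputs a softmax-weighted mixture of all candidates, and the mixture parameters are trained by gradient descent. The candidate operations in `OPS`, the synthetic data, and the hand-derived gradient are all assumptions for illustration; real methods also train the network weights jointly, which is omitted here.

```python
import math

OPS = {
    "identity": lambda x: x,
    "double": lambda x: 2.0 * x,
    "square": lambda x: x * x,
}
NAMES = list(OPS)
DATA = [(x, 2.0 * x) for x in (1.0, 2.0, 3.0)]  # synthetic targets from "double"

def softmax(alphas):
    m = max(alphas)
    exps = [math.exp(a - m) for a in alphas]
    total = sum(exps)
    return [e / total for e in exps]

def loss_and_grad(alphas):
    probs = softmax(alphas)
    loss = 0.0
    grad = [0.0] * len(alphas)
    for x, y in DATA:
        outs = [OPS[name](x) for name in NAMES]
        mixed = sum(p * o for p, o in zip(probs, outs))   # relaxed "choice"
        err = mixed - y
        loss += err * err
        for k in range(len(alphas)):
            # d(mixed)/d(alpha_k) = p_k * (o_k - mixed) for a softmax mixture.
            grad[k] += 2.0 * err * probs[k] * (outs[k] - mixed)
    return loss, grad

def search(steps=500, lr=0.1):
    alphas = [0.0] * len(NAMES)
    initial_loss, _ = loss_and_grad(alphas)
    for _ in range(steps):
        _, grad = loss_and_grad(alphas)
        alphas = [a - lr * g for a, g in zip(alphas, grad)]
    final_loss, _ = loss_and_grad(alphas)
    # Discretize at the end: keep the operation with the largest parameter.
    best = NAMES[max(range(len(NAMES)), key=lambda k: alphas[k])]
    return best, initial_loss, final_loss

best_op, initial_loss, final_loss = search()
print(best_op, initial_loss, final_loss)
```

Each gradient step uses information about all candidate operations at once, whereas a black-box strategy would have to evaluate them one sampled architecture at a time.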

gaocegege commented 5 years ago

@xieydd Here is a related paper as a bonus: https://github.com/gaocegege/papers-notebook/issues/95

xieydd commented 5 years ago

@gaocegege Thank you very much.