
Automated machine learning on Platform of Artificial Intelligence #99

gaocegege opened 6 years ago

gaocegege commented 6 years ago

https://102.alibaba.com/fund/proposalTopicDetail.htm?id=1116

Alibaba Fund topic

gaocegege commented 6 years ago
1. Hyper-parameter Tuning

In recent years, machine learning models have exploded in complexity and expressibility, at the price of staggering computational cost and a growing number of tuning parameters that are difficult to set by standard optimization techniques. These hyperparameters are inputs to machine learning algorithms that govern how the algorithm's performance generalizes to new, unseen data; examples include those that control model architecture, the amount of regularization, and learning rates. The quality of a predictive model depends critically on its hyperparameter configuration, but it is poorly understood how these hyperparameters interact with each other to affect the quality of the resulting model. Consequently, practitioners often default to brute-force methods like random search and grid search.
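
For concreteness, here is a minimal sketch (not PAI code) contrasting the two brute-force baselines over two hypothetical hyperparameters; `train_and_score` is a placeholder for training a model and returning its validation score.

```python
import itertools
import random

def train_and_score(lr, l2):
    # Placeholder objective; in practice this would train a model on PAI
    # and return its validation accuracy.
    return -((lr - 0.01) ** 2 + (l2 - 0.001) ** 2)

# Grid search: exhaustively evaluate the Cartesian product of values.
lrs = [0.001, 0.01, 0.1]
l2s = [0.0001, 0.001, 0.01]
best_grid = max(itertools.product(lrs, l2s),
                key=lambda cfg: train_and_score(*cfg))

# Random search: sample configurations independently on a log scale;
# it often matches grid search with far fewer trials when only a few
# dimensions actually matter.
random.seed(0)
samples = [(10 ** random.uniform(-4, -1), 10 ** random.uniform(-5, -2))
           for _ in range(9)]
best_random = max(samples, key=lambda cfg: train_and_score(*cfg))

print("grid best:", best_grid)
print("random best:", best_random)
```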

Hyper-parameter optimization consists of two parts: hyperparameter configuration selection and configuration evaluation. For selection, Bayesian optimization methods are dominant; however, their inherently sequential nature and the curse of dimensionality make them hard to apply in big-data scenarios. For evaluation, there is still no good general-purpose early stopping algorithm.
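
As one concrete example of a principled early stopping scheme, below is a minimal successive-halving sketch (the core subroutine of Hyperband): train many configurations on a small budget, keep the best fraction, and repeat with a larger budget. `partial_train` is a hypothetical stand-in for resuming training of one configuration and returning its current validation score.

```python
import random

def partial_train(config, budget):
    # Hypothetical stand-in: resume training `config` for `budget` more
    # epochs and return its current validation score.
    return config["quality"] * (1 - 1 / (1 + budget))

def successive_halving(configs, min_budget=1, eta=3):
    budget = min_budget
    while len(configs) > 1:
        scored = [(partial_train(c, budget), c) for c in configs]
        scored.sort(key=lambda s: s[0], reverse=True)
        # Keep the top 1/eta of configurations, grow the budget by eta.
        configs = [c for _, c in scored[: max(1, len(scored) // eta)]]
        budget *= eta
    return configs[0]

random.seed(0)
pool = [{"id": i, "quality": random.random()} for i in range(27)]
print(successive_halving(pool))
```

Hyperband runs this subroutine several times with different trade-offs between the number of configurations and the per-configuration budget, hedging against the case where promising configurations are slow starters.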

2. Neural Architecture Search

Discovering high-performance neural network architectures has required years of extensive research by human experts through trial and error. The combinatorial explosion of the design space makes handcrafted architectures not only expensive to obtain, but also likely to be suboptimal in performance. Recently, there has been a surge of interest in using algorithms to automate the manual process of architecture design. The goal can be described as finding the optimal architecture in a given search space such that validation accuracy is maximized on the given task. Representative architecture search algorithms can be categorized into evolutionary algorithms and reinforcement learning.

When using evolutionary algorithms (EA), each neural network structure is encoded as a string, and random mutations and recombinations of the strings are performed during the search process. When using reinforcement learning (RL), the agent performs a sequence of actions that specifies the structure of the model; this model is then trained and its validation performance is returned as the reward, which is used to update the controller (typically an RNN). Although both EA and RL methods can learn network structures that outperform manually designed architectures, they require significant computational resources. We need a highly efficient algorithm to accelerate architecture search so that we can use it on daily machine learning jobs on PAI.
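
To make the EA formulation concrete, the sketch below runs a regularized-evolution-style loop over string-encoded architectures: tournament selection picks a parent, a single-token mutation produces a child, and the oldest individual is aged out. The layer vocabulary and the `fitness` function are hypothetical placeholders for decoding and actually training each candidate network.

```python
import random

LAYERS = ["conv3x3", "conv5x5", "maxpool", "identity"]  # toy vocabulary

def random_arch(depth=6):
    return [random.choice(LAYERS) for _ in range(depth)]

def mutate(arch):
    # Replace one randomly chosen layer token.
    child = list(arch)
    child[random.randrange(len(child))] = random.choice(LAYERS)
    return child

def fitness(arch):
    # Placeholder for "decode, train, and return validation accuracy".
    return sum(1 for layer in arch if layer.startswith("conv")) / len(arch)

random.seed(0)
population = [random_arch() for _ in range(20)]
for _ in range(50):
    # Tournament selection: best of 5 random individuals becomes the parent.
    parent = max(random.sample(population, 5), key=fitness)
    population.append(mutate(parent))
    population.pop(0)  # age out the oldest individual (regularization)

best = max(population, key=fitness)
print(best, fitness(best))
```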

3. Transfer Learning

Many machine learning methods work well only under a common assumption: the training and test data are drawn from the same feature space and the same distribution. When the distribution changes, most statistical models need to be rebuilt from scratch using newly collected training data. In many real-world applications, it is expensive or impossible to re-collect the needed training data and rebuild the models, so it would be desirable to reduce that need and effort. In such cases, knowledge transfer, or transfer learning, between task domains is attractive. Transfer learning is classified into three settings: inductive transfer learning, transductive transfer learning, and unsupervised transfer learning. So far, transfer learning techniques have mainly been applied to small-scale applications of limited variety; much exploration remains to be done for large-scale data applications.
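
As a minimal illustration of the inductive setting, the sketch below (assuming PyTorch and torchvision are available) reuses a ResNet pretrained on ImageNet as the source domain, freezes its feature extractor, and retrains only a new head for a hypothetical 10-class target task.

```python
import torch
import torch.nn as nn
from torchvision import models

# Source-domain knowledge: weights pretrained on ImageNet.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze all pretrained parameters: only the new head will learn.
for p in model.parameters():
    p.requires_grad = False

num_target_classes = 10  # hypothetical target task size
model.fc = nn.Linear(model.fc.in_features, num_target_classes)

optimizer = torch.optim.SGD(model.fc.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()

# One illustrative step on dummy data standing in for the target domain.
x = torch.randn(8, 3, 224, 224)
y = torch.randint(0, num_target_classes, (8,))
optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
```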

gaocegege commented 6 years ago

The first one.

gaocegege commented 6 years ago

The second one.