Optuna: A Next-generation Hyperparameter Optimization Framework

gaocegege commented 4 years ago

https://arxiv.org/pdf/1907.10902.pdf

https://github.com/pfnet/optuna/

SIGKDD'19 Applied Data Science Track Papers

gaocegege commented 4 years ago

It bills itself as a next-generation hyperparameter optimization framework. Its main features are:

Define-by-run programming that allows the user to dynamically construct the search space,
Efficient sampling algorithm and pruning algorithm that allows some user-customization,
Easy-to-setup, versatile architecture that can be deployed for tasks of various types, ranging from light-weight experiments conducted via interactive interfaces to heavy-weight distributed computations.

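One thing worth noting: its "distributed" training, as implemented in the code linked below, is built on Python's `multiprocessing` module (specifically a `multiprocessing.pool.ThreadPool`, so the workers are threads inside one process). That makes it closer to concurrency than to parallelism. Whatever one calls it, it cannot get past the resource limits of a single machine.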

https://github.com/pfnet/optuna/blob/451fe257342c50a11940ec380f06f60f94324237/optuna/study.py#L416

    def _optimize_parallel(
            self,
            func,  # type: ObjectiveFuncType
            n_trials,  # type: Optional[int]
            timeout,  # type: Optional[float]
            n_jobs,  # type: int
            catch,  # type: Union[Tuple[()], Tuple[Type[Exception]]]
            callbacks  # type: Optional[List[Callable[[Study, structs.FrozenTrial], None]]]
    ):
        # type: (...) -> None

        self.start_datetime = datetime.datetime.now()

        if n_jobs == -1:
            n_jobs = multiprocessing.cpu_count()

        if n_trials is not None:
            # The number of threads needs not to be larger than trials.
            n_jobs = min(n_jobs, n_trials)

            if n_trials == 0:
                return  # When n_jobs is zero, ThreadPool fails.

        pool = multiprocessing.pool.ThreadPool(n_jobs)  # type: ignore

        # A queue is passed to each thread. When True is received, then the thread continues
        # the evaluation. When False is received, then it quits.
        def func_child_thread(que):
            # type: (Queue) -> None

            while que.get():
                self._run_trial_and_callbacks(func, catch, callbacks)
            self._storage.remove_session()

        que = multiprocessing.Queue(maxsize=n_jobs)  # type: ignore
        for _ in range(n_jobs):
            que.put(True)
        n_enqueued_trials = n_jobs
        imap_ite = pool.imap(func_child_thread, [que] * n_jobs, chunksize=1)

        while True:
            if timeout is not None:
                elapsed_timedelta = datetime.datetime.now() - self.start_datetime
                elapsed_seconds = elapsed_timedelta.total_seconds()
                if elapsed_seconds > timeout:
                    break

            if n_trials is not None:
                if n_enqueued_trials >= n_trials:
                    break

            try:
                que.put_nowait(True)
                n_enqueued_trials += 1
            except queue.Full:
                time.sleep(1)

        for _ in range(n_jobs):
            que.put(False)

        collections.deque(imap_ite, maxlen=0)  # Consume the iterator to wait for all threads.
        pool.terminate()
        que.close()
        que.join_thread()
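
To make the concurrency point concrete, here is a minimal standard-library sketch (not Optuna code; the helper `where_am_i` is a made-up name) showing that workers of a `multiprocessing.pool.ThreadPool` all run inside the parent process:

```python
import os
import threading
from multiprocessing.pool import ThreadPool


def where_am_i(_):
    # Report the process ID and thread name of the worker that ran this call.
    return os.getpid(), threading.current_thread().name


with ThreadPool(4) as pool:
    results = pool.map(where_am_i, range(8))

pids = {pid for pid, _ in results}
thread_names = {name for _, name in results}

# Every worker reports the parent's PID: ThreadPool never leaves the process.
print(pids == {os.getpid()})
print(sorted(thread_names))
```

Since all workers share one process, CPU-bound pure-Python objectives would also contend for the interpreter lock, which is consistent with the single-machine limitation noted above.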

gaocegege commented 4 years ago

Define-by-run

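The idea is that other frameworks define the search space statically, whereas Optuna lets you define it dynamically. The authors say Optuna is to other hyperparameter search frameworks what PyTorch is to TensorFlow. That is a fair point, though whether this scenario is really that useful is a matter of opinion. A design like this is necessarily intrusive to user code; it can never be zero-intrusion. The paper also compares itself with HyperOpt, whose API for this part is actually quite verbose, perhaps because it covers more use cases. Compared against other frameworks such as NNI or skopt, the gap would probably be less dramatic, especially against NNI.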

Screenshot from 2019-10-22 15-42-21

Screenshot from 2019-10-22 15-42-31

gaocegege commented 4 years ago

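A good application of this feature is exploring different learning methods, which can still be useful in traditional machine learning. The example below tries both a random forest and an MLP.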

Screenshot from 2019-10-22 15-50-37
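
A rough, standard-library-only sketch of the same idea follows. `ToyTrial` is a hypothetical stand-in whose method names are only loosely modeled on Optuna's suggest calls, and the scores are placeholders rather than real training results. The point is that the branch taken inside `objective` decides which parameters exist for a given trial:

```python
import random


class ToyTrial:
    """Hypothetical stand-in for a trial object; not Optuna's implementation."""

    def __init__(self, seed):
        self._rng = random.Random(seed)
        self.params = {}

    def suggest_categorical(self, name, choices):
        self.params[name] = self._rng.choice(choices)
        return self.params[name]

    def suggest_int(self, name, low, high):
        self.params[name] = self._rng.randint(low, high)
        return self.params[name]

    def suggest_float(self, name, low, high):
        self.params[name] = self._rng.uniform(low, high)
        return self.params[name]


def objective(trial):
    # The branch taken here decides which parameters exist for this trial.
    model = trial.suggest_categorical("model", ["random_forest", "mlp"])
    if model == "random_forest":
        depth = trial.suggest_int("rf_max_depth", 2, 32)
        return 1.0 / depth  # placeholder score, no real training
    n_layers = trial.suggest_int("mlp_n_layers", 1, 4)
    for i in range(n_layers):
        # The number of per-layer parameters depends on an earlier suggestion.
        trial.suggest_int(f"mlp_units_l{i}", 16, 256)
    lr = trial.suggest_float("mlp_lr", 1e-5, 1e-1)
    return lr * n_layers  # placeholder score


trials = []
for seed in range(20):
    trial = ToyTrial(seed)
    value = objective(trial)
    trials.append((value, trial.params))

best_value, best_params = min(trials, key=lambda t: t[0])
print(best_params)
```

Each trial ends up with a different set of parameter names, which is hard to express when the whole space has to be declared before any user code runs.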

gaocegege commented 4 years ago

I didn't really follow Section 2.2; I can't see what advantage it brings at deployment time.

gaocegege commented 4 years ago

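Section 2.3 takes a hearty swing at the define-and-run style, a.k.a. TensorFlow 1.x static graphs, and claims that dynamic graphs, as represented by PyTorch, are gradually replacing static ones. There is some truth to this: a traditional hyperparameter optimization framework first defines the search space, then samples a set of hyperparameters and evaluates it, as two decoupled phases. Optuna effectively does not separate the two phases.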
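
For contrast, the two decoupled phases of the define-and-run style might look like the sketch below. It is a generic illustration using only the standard library; `SEARCH_SPACE`, `sample`, and `evaluate` are made-up names, not any particular framework's API:

```python
import random

# Phase 1: the whole search space is declared up front, before any code runs.
SEARCH_SPACE = {
    "n_layers": ("int", 1, 4),
    "lr": ("float", 1e-5, 1e-1),
}


def sample(space, rng):
    """Draw one configuration from a statically declared space."""
    config = {}
    for name, (kind, low, high) in space.items():
        if kind == "int":
            config[name] = rng.randint(low, high)
        else:
            config[name] = rng.uniform(low, high)
    return config


def evaluate(config):
    # Phase 2: evaluation only consumes a finished configuration.
    return config["lr"] * config["n_layers"]  # placeholder score


rng = random.Random(0)
configs = [sample(SEARCH_SPACE, rng) for _ in range(10)]
scores = [evaluate(c) for c in configs]
best = configs[scores.index(min(scores))]
print(best)
```

Every configuration has exactly the same keys here. A parameter whose existence depends on another parameter's value (say, one width per layer) needs extra machinery in the space definition, whereas in define-by-run it is an ordinary `if` or `for` in user code.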

gaocegege commented 4 years ago

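Next come the algorithms, covering both the parameter search (sampling) algorithms and the early-stopping (pruning) algorithms. Optuna adopts an API very similar to NNI's: the report API reports metrics, and should-prune decides whether to stop early. The implementation is very simple, but it is intrusive to user code.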

Screenshot from 2019-10-22 16-00-34
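
A toy version of the report / should-prune interaction, assuming a simple median rule swapped in for illustration (not necessarily what Optuna's pruners do). `ToyStudy`, `ToyTrial`, and `train` are hypothetical names, and lower values are treated as better:

```python
import statistics


class ToyStudy:
    """Hypothetical container for the intermediate values of all trials."""

    def __init__(self):
        self.history = []  # one {step: value} dict per trial

    def new_trial(self):
        reports = {}
        self.history.append(reports)
        return ToyTrial(self, reports)


class ToyTrial:
    def __init__(self, study, reports):
        self._study = study
        self._reports = reports
        self._last_step = None

    def report(self, value, step):
        # Record an intermediate metric (lower is better in this sketch).
        self._reports[step] = value
        self._last_step = step

    def should_prune(self):
        # Median rule: prune if worse than the median of other trials at this step.
        step = self._last_step
        others = [r[step] for r in self._study.history
                  if r is not self._reports and step in r]
        if not others:
            return False
        return self._reports[step] > statistics.median(others)


def train(trial, final_loss, n_steps=5):
    """Fake training loop; returns the last step reached and whether it was pruned."""
    for step in range(n_steps):
        loss = final_loss + 1.0 / (step + 1)  # placeholder learning curve
        trial.report(loss, step)
        if trial.should_prune():
            return step, True
    return n_steps - 1, False


study = ToyStudy()
outcomes = [train(study.new_trial(), final_loss) for final_loss in (0.2, 0.1, 0.9)]
print(outcomes)  # → [(4, False), (4, False), (0, True)]
```

The intrusiveness is visible in `train`: the two calls sit inside the user's own training loop, so the loop has to be written (or rewritten) with the framework in mind.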

gaocegege commented 4 years ago

The system architecture is shown in the figure; it is very simple...

gaocegege commented 4 years ago

Screenshot from 2019-10-22 15-48-56

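Finally, a look at how it compares with other frameworks. Strictly speaking, Katib does not yet support framework-level early stopping; it only supports early stopping defined in user code. In terms of distributed execution, though, Katib builds on K8s and is probably second to none: it supports distribution and resource limits/control at every level (parallel trials, parallel training within a single trial), and so on. That said, Optuna is indeed the first of these frameworks with a define-by-run API style.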

Personally, I think the feature is a good one, but whether it is worth the intrusiveness it imposes on user code still needs some thought.

gaocegege commented 4 years ago

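I looked through the code; the implementation is simple and easy to follow. Roughly, there is a database: whenever optimize runs, it creates a Trial in the DB, and when execution reaches trial.suggest_xxx, it creates a value for that parameter in the database. If a value already exists for that parameter, it is simply fetched rather than created.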

By leaning on Python's dynamic, interpreted nature, this gives you define-by-run. The implementation is clever, and simple too.
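
The get-or-create behaviour described above can be sketched with an in-memory SQLite database. This is only an illustration of the idea: the schema and the names `ToyStorage`, `ToyTrial`, and `get_or_create_param` are assumptions, not Optuna's actual storage layer:

```python
import random
import sqlite3


class ToyStorage:
    """Hypothetical storage layer; table and method names are made up."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE trials (id INTEGER PRIMARY KEY)")
        self.conn.execute(
            "CREATE TABLE params (trial_id INTEGER, name TEXT, value REAL, "
            "PRIMARY KEY (trial_id, name))"
        )

    def create_trial(self):
        cur = self.conn.execute("INSERT INTO trials DEFAULT VALUES")
        return cur.lastrowid

    def get_or_create_param(self, trial_id, name, make_value):
        row = self.conn.execute(
            "SELECT value FROM params WHERE trial_id = ? AND name = ?",
            (trial_id, name),
        ).fetchone()
        if row is not None:
            return row[0]  # already suggested in this trial: reuse it
        value = make_value()
        self.conn.execute(
            "INSERT INTO params VALUES (?, ?, ?)", (trial_id, name, value)
        )
        return value


class ToyTrial:
    def __init__(self, storage, rng):
        self._storage = storage
        self._rng = rng
        self.trial_id = storage.create_trial()

    def suggest_float(self, name, low, high):
        # The search space is never declared; it is discovered call by call.
        return self._storage.get_or_create_param(
            self.trial_id, name, lambda: self._rng.uniform(low, high)
        )


storage = ToyStorage()
trial = ToyTrial(storage, random.Random(0))
first = trial.suggest_float("lr", 1e-5, 1e-1)
second = trial.suggest_float("lr", 1e-5, 1e-1)  # fetched, not re-sampled
n_rows = storage.conn.execute("SELECT COUNT(*) FROM params").fetchone()[0]
print(first == second, n_rows)
```

Because a parameter row only appears when the corresponding suggest call is actually executed, the set of parameters for a trial is whatever the user's code happened to ask for on that run.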
