egolearner / paper-note


ModelHub: Lifecycle Management for Deep Learning[4] #16

Open egolearner opened 3 years ago

egolearner commented 3 years ago

A 2017 ICDE paper; it has not been cited much.

https://www.cs.umd.edu/class/spring2016/cmsc396h/downloads/modelhub.pdf

The challenges, as the authors see them:

image

ModelHub consists of three parts:

  1. a model versioning system (DLV) to store and query the models and their versions
  2. a model enumeration and hyper-parameter tuning domain specific language (DQL) to serve as an abstraction layer to help modelers focus on the creation of the models instead of repetitive steps in the lifecycle
  3. a hosted deep learning model sharing system (ModelHub) to publish, discover and reuse models from others

2.2 Data Model

The layer is the basic unit of a model.
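A minimal sketch of what "layer as the basic unit" could look like as a data structure. This is not the paper's actual schema: the class names `Layer` and `ModelVersion`, their fields, and the toy network are all hypothetical, chosen only to show a version as an ordered list of layers with a link to the version it was derived from.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Layer:
    """One layer, the basic unit of a model (all field names are hypothetical)."""
    name: str
    kind: str  # e.g. "conv", "pool", "fc"
    hyperparams: Dict[str, int] = field(default_factory=dict)


@dataclass
class ModelVersion:
    """A model version: an ordered list of layers plus a link to its parent."""
    model_name: str
    layers: List[Layer]
    commit_msg: str = ""
    parent: Optional["ModelVersion"] = None

    def layer_names(self) -> List[str]:
        return [layer.name for layer in self.layers]


v1 = ModelVersion(
    model_name="toy_net",
    layers=[Layer("conv1", "conv", {"kernel": 5}), Layer("fc1", "fc", {"units": 10})],
    commit_msg="initial version",
)

# A second version derived from v1 by inserting a pooling layer.
v2 = ModelVersion(
    model_name="toy_net",
    layers=[v1.layers[0], Layer("pool1", "pool", {"stride": 2}), v1.layers[1]],
    commit_msg="add pooling",
    parent=v1,
)

print(v2.layer_names())
```

Because unchanged layers are shared between `v1` and `v2` here, a version differs from its parent only in the layers that were added or replaced, which is the kind of structure a layer-level versioning system can exploit.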

VCS data model:

2.3 Query Facilities

dlv list [--model_name] [--commit_msg] [--last]

dlv desc [--model_name | --version] [--output]

Shows model metadata, such as the network definition, learnable parameters, execution footprint (memory and runtime), activations of convolution networks, weight matrices, and evaluation results across iterations.

dlv diff [--model_names | --versions] [--output]

Mostly a side-by-side comparison of the desc output for the given models or versions.
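To make the side-by-side idea concrete, here is a rough sketch in plain Python. It does not reproduce the real `dlv diff` output format, which these notes do not show; the function `diff_desc`, the metadata keys, and the numbers are all made up for illustration.

```python
from typing import Any, Dict, List, Tuple


def diff_desc(left: Dict[str, Any], right: Dict[str, Any]) -> List[Tuple[str, Any, Any]]:
    """Return (key, left_value, right_value) for every key whose values differ."""
    rows = []
    for key in sorted(set(left) | set(right)):
        lval, rval = left.get(key), right.get(key)
        if lval != rval:
            rows.append((key, lval, rval))
    return rows


# Made-up metadata for two versions; the numbers are illustrative only.
desc_v1 = {"layers": 5, "learnable_params": 60000, "memory_mb": 120}
desc_v2 = {"layers": 6, "learnable_params": 60000, "memory_mb": 135}

rows = diff_desc(desc_v1, desc_v2)
for key, lval, rval in rows:
    # One row per differing field, left version next to right version.
    print(f"{key:<18}{lval!s:<10}{rval!s:<10}")
```

Only fields that differ are printed, so identical metadata (here `learnable_params`) drops out of the comparison.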

dlv eval [--model_name | --versions] [--config]

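Tests the model on different data, or with modified hyperparameters (can hyperparameters really be changed without retraining the model???)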

image

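DQL supports querying models, extracting part of a model (a slice), modifying a model, and automatically searching hyperparameters. The idea is fairly interesting, but it does not seem very useful. The first two queries have some value for learning purposes, though reading the code directly seems better. Query 3 seems worse than simply editing the original code, since it adds the cost of learning a DSL. Query 4 looks like a SQL-style wrapper around simple AutoML plus automatic evaluation and selection of which models to keep.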
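For comparison, the "simple AutoML plus keep the best models" reading of Query 4 can be sketched in a few lines of plain Python. This is not DQL syntax and not the paper's implementation: `grid_search_keep_top` and `toy_score` are hypothetical names, and `toy_score` is a stand-in for a real train-and-evaluate step.

```python
import heapq
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

Config = Dict[str, float]


def grid_search_keep_top(
    grid: Dict[str, Sequence[float]],
    evaluate: Callable[[Config], float],
    keep: int,
) -> List[Tuple[float, Config]]:
    """Evaluate every combination in the grid and keep the `keep` best scores."""
    names = sorted(grid)
    scored = []
    for values in product(*(grid[name] for name in names)):
        config = dict(zip(names, values))
        scored.append((evaluate(config), config))
    # Higher score is better; compare on the score only.
    return heapq.nlargest(keep, scored, key=lambda pair: pair[0])


def toy_score(config: Config) -> float:
    # Stand-in for training and evaluating a model (hypothetical);
    # it peaks at lr=0.01, momentum=0.9.
    return -abs(config["lr"] - 0.01) - abs(config["momentum"] - 0.9)


grid = {"lr": [0.1, 0.01, 0.001], "momentum": [0.8, 0.9]}
best = grid_search_keep_top(grid, toy_score, keep=2)
for score, config in best:
    print(config, round(score, 4))
```

That the whole loop fits in a short function is consistent with the impression above: what the DSL seems to buy is a declarative wrapper, not new capability.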

Take away