dyweb / papers-notebook

Papers Notebook: reading notes on distributed systems, virtualization, and machine learning

DeepDSL: A Compilation-based Domain-Specific Language for Deep Learning #101

gaocegege opened this issue 5 years ago

gaocegege commented 5 years ago

https://openreview.net/forum?id=Bks8cPcxe

Abstract: In recent years, Deep Learning (DL) has found great success in domains such as multimedia understanding. However, the complex nature of multimedia data makes it difficult to develop DL-based software. The state-of-the-art tools, such as Caffe, TensorFlow, Torch7, and CNTK, while successful in their applicable domains, are programming libraries with fixed user interfaces, internal representations, and execution environments. This makes it difficult to implement portable and customized DL applications.

In this paper, we present DeepDSL, a domain-specific language (DSL) embedded in Scala that compiles deep networks written in DeepDSL to Java source code. DeepDSL provides

(1) intuitive constructs to support compact encoding of deep networks; (2) symbolic gradient derivation of the networks; (3) static analysis for memory consumption and error detection; and (4) DSL-level optimization to improve memory and runtime efficiency.

DeepDSL programs are compiled into compact, efficient, customizable, and portable Java source code, which drives the CUDA and CUDNN interfaces on NVIDIA GPUs via a Java Native Interface (JNI) library. We evaluated DeepDSL with a number of popular DL networks. Our experiments show that the compiled programs have very competitive runtime performance and memory efficiency compared to the existing libraries.

TL;DR: DeepDSL (a DSL embedded in Scala) compiles deep learning networks written in DeepDSL to Java source code, which runs on any GPU-equipped machine with efficiency competitive with existing state-of-the-art tools (e.g. Caffe and TensorFlow).

Keywords: Deep learning, Applications, Optimization

Conflicts: uwm.edu, cs.uml.edu, fresnostate.edu, utc.edu
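To make the compiled-DSL idea more concrete, here is a minimal sketch of what a compact, declarative network definition could look like as a Scala-embedded DSL. The layer types, the `~>` operator, and the `Network` class are hypothetical illustrations, not DeepDSL's actual API; the point is that the whole network is an ordinary Scala value that a compiler pass could inspect, differentiate symbolically, and lower to Java/CUDA code.

```scala
// Hypothetical sketch of a Scala-embedded network DSL (not DeepDSL's real API).
object DslSketch {
  // A "layer" is only a description; nothing is executed when the network is defined.
  sealed trait Layer
  final case class Conv(outChannels: Int, kernel: Int, stride: Int) extends Layer
  final case class Pool(kernel: Int, stride: Int) extends Layer
  final case class Dense(units: Int) extends Layer
  case object Relu extends Layer
  case object Softmax extends Layer

  // A network is just a sequence of layer descriptions, so a compiler can analyze it
  // (e.g. for memory usage) before generating any Java/CUDA code.
  final case class Network(layers: Vector[Layer]) {
    def ~>(next: Layer): Network = Network(layers :+ next)
  }
  val input: Network = Network(Vector.empty)

  // A LeNet-style definition stays compact because the DSL is ordinary Scala.
  val lenet: Network =
    input ~> Conv(20, 5, 1) ~> Pool(2, 2) ~>
             Conv(50, 5, 1) ~> Pool(2, 2) ~>
             Dense(500) ~> Relu ~> Dense(10) ~> Softmax

  def main(args: Array[String]): Unit =
    println(s"LeNet sketch, ${lenet.layers.size} layers: ${lenet.layers.mkString(" -> ")}")
}
```

Because the definition is data rather than eagerly executed library calls, the same value can feed both gradient derivation and code generation, which is roughly the separation the abstract describes.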

gaocegege commented 5 years ago

ICLR'17

gaocegege commented 5 years ago

This paper is quite interesting: it defines a DSL that is compiled directly to Java source code, without depending on any existing framework, and its performance is higher than that of the existing frameworks.

gaocegege commented 5 years ago

After reading it, I think one of the reviewers' comments is very fair, so I won't write separate notes. My main doubt about this approach is how it would support distributed training; my guess is that it doesn't.

https://openreview.net/forum?id=Bks8cPcxe&noteId=rJ02NpS4x

Pros:

The use of Scala is unique among deep learning frameworks, to my knowledge, making this framework interesting for Scala users. The fact that Scala compiles to Java and therefore cross-platform support comes for free is also nice.

The ability to inspect memory information as shown in Figure 3 is interesting and potentially useful for large networks or situations where memory is limited.

DeepDSL compares favorably with existing frameworks in terms of memory use and speed for many common convolutional network architectures.
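On the reviewer's memory-inspection point above: once a network is a plain data description, per-layer activation memory can be estimated before anything runs. The sketch below is a hypothetical illustration of that idea, with invented layer types, simple valid-convolution shape rules, and a float32 assumption; it is not how DeepDSL actually implements its static analysis.

```scala
// Hypothetical static memory estimate over a layer description (not DeepDSL's real analysis).
object MemorySketch {
  sealed trait Layer
  final case class Conv(outChannels: Int, kernel: Int, stride: Int) extends Layer
  final case class Pool(kernel: Int, stride: Int) extends Layer
  final case class Dense(units: Int) extends Layer

  final case class Shape(channels: Int, height: Int, width: Int) {
    def elements: Long = channels.toLong * height * width
  }

  // Propagate a (channels, height, width) shape through one layer description.
  def outputShape(in: Shape, layer: Layer): Shape = layer match {
    case Conv(c, k, s) => Shape(c, (in.height - k) / s + 1, (in.width - k) / s + 1)
    case Pool(k, s)    => Shape(in.channels, (in.height - k) / s + 1, (in.width - k) / s + 1)
    case Dense(u)      => Shape(u, 1, 1)
  }

  // Total activation memory in bytes (float32), including the input batch,
  // computed purely statically from the layer descriptions.
  def activationBytes(input: Shape, layers: Seq[Layer], batch: Int): Long =
    layers.scanLeft(input)(outputShape).map(_.elements * 4L * batch).sum

  def main(args: Array[String]): Unit = {
    val net = Seq(Conv(20, 5, 1), Pool(2, 2), Conv(50, 5, 1), Pool(2, 2), Dense(500), Dense(10))
    val bytes = activationBytes(Shape(1, 28, 28), net, batch = 64)
    println(f"Estimated activation memory: ${bytes / 1024.0 / 1024.0}%.1f MiB")
  }
}
```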

Cons:

There appears to be special privileged handling of parameters, gradients, and updates in the compilation process itself (as in Caffe), rather than having gradients/updates as a normal part of the full user-defined computation graph (as in Theano + TensorFlow). This makes certain applications, such as RNNs (which require parameter sharing) and GANs (which require gradients wrt multiple objectives), impossible to implement in DeepDSL without further extension of the underlying API. (Note: I might be wrong about this -- and please correct me if I am -- but all the examples in the paper are nets trained by gradient descent on a single objective, and do not share parameters or access gradients directly.)

The paper repeatedly refers to line counts from the verbose Protobuf-based low-level representation of networks in Caffe to demonstrate the compactness of its own syntax. This is misleading as Caffe has officially supported a compact network definition style called “NetSpec” for years -- see a ~15 line definition of AlexNet [1]. Given that, Protobuf is essentially an intermediate representation for Caffe (as with TensorFlow), which happens to have a human-readable text format.

DeepDSL is not especially novel when compared with existing frameworks, which is not a problem in and of itself, but some statements misleadingly or incorrectly oversell the novelty of the framework. Some examples:

“This separation between network definition and training is an unique advantage of DeepDSL comparing to other tools.” This separation is not unique -- it’s certainly a feature of Caffe where the network definition is its own file, and can be attained in TensorFlow as well (though it’s not the default workflow there).

“The difference [between our framework and Theano, TensorFlow, etc.] is that we do not model deep networks as ‘networks’ but as abstract ‘functions’.” There is no notion of a “network” in Theano or TensorFlow (not sure about the others) either -- there are only functions, like in DeepDSL. I asked about this statement, and the response didn’t convince me otherwise. The counterexample given was that in TensorFlow the number of input channels needs to be specified separately for each convolution. This is only true using the low-level API and can easily be worked around with higher-level wrappers like TensorFlow Slim -- e.g., see the definition of AlexNet [2]. It may be true that DeepDSL is more “batteries included” for writing compact network definitions than these other frameworks, but the paper’s claims seem to go beyond this.

Overall, the DeepDSL framework seems to have real value in its use of Scala and its memory/speed efficiency as demonstrated by the experiments, but the current version of the paper contains statements that overclaim novelty in ways that are misleading and unfair to existing frameworks. I will consider upgrading my rating if these statements are removed or amended to be more technically precise.

[1] https://github.com/BVLC/caffe/blob/master/examples/pycaffe/caffenet.py#L24
[2] https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/slim/python/slim/nets/alexnet.py#L92
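Regarding the reviewer's first con, the difference between a privileged update pass and gradients as ordinary graph values can be made concrete with a tiny symbolic-differentiation sketch. The `Expr` type and `grad` function below are hypothetical and framework-independent, not any library's real API; the point is that when `grad` returns a plain expression, it can be reused across multiple objectives (the GAN-like case) or composed further, which is not possible if updates exist only inside the compiler.

```scala
// Hypothetical sketch: gradients as ordinary nodes of a user-visible expression graph.
object GradSketch {
  sealed trait Expr
  final case class Param(name: String) extends Expr
  final case class Const(v: Double) extends Expr
  final case class Add(a: Expr, b: Expr) extends Expr
  final case class Mul(a: Expr, b: Expr) extends Expr

  // Symbolic gradient: the result is itself an Expr the user can keep composing.
  def grad(e: Expr, wrt: Param): Expr = e match {
    case p: Param if p == wrt => Const(1.0)
    case _: Param | _: Const  => Const(0.0)
    case Add(a, b)            => Add(grad(a, wrt), grad(b, wrt))
    case Mul(a, b)            => Add(Mul(grad(a, wrt), b), Mul(a, grad(b, wrt)))
  }

  def main(args: Array[String]): Unit = {
    val w = Param("w")
    // Two objectives over the same shared parameter, as in a GAN-like setup:
    val lossG = Mul(w, Const(3.0))
    val lossD = Add(Mul(w, w), Const(1.0))
    // Because gradients are plain expressions, each objective yields its own gradient,
    // and those gradients could be mixed into yet more expressions by the user.
    println(s"dLossG/dw = ${grad(lossG, w)}")
    println(s"dLossD/dw = ${grad(lossD, w)}")
  }
}
```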