apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

[Discussion] Sharing Operators between DL Frameworks #4735

Closed tqchen closed 7 years ago

tqchen commented 7 years ago

See the discussion repo at https://github.com/dmlc/dlpack

This discussion started from https://github.com/dmlc/minpy/issues/129 with @soumith. THC is the tensor library that backs Torch. I am opening this issue in the MXNet repo so more developers can see it.

First of all, it is possible to reuse operator libraries between frameworks.

It is always interesting to see interchangeability happen: for example, scheduling PyTorch operations in MXNet's async engine, or running MXNet's declarative API so that data is shared directly with PyTorch's arrays.

However, there are some engineering obstacles to doing so. I would like to explain what these obstacles are, in the hope that this motivates the community to move forward and make such reuse easier.

Coupled Operator Data Structure Components

An operator can mean many things. Here are some basic components that an operator implementation typically bundles together: the tensor data structure it consumes and produces, the memory allocator, and the scheduling/execution context.

Why does such coupling prevent reuse? There are two reasons:

To resolve this problem, an operator library should be designed so that operators accept user-managed memory resources. When possible, it should not introduce its own allocator or resource management, but instead give hints to the user (for example, cuDNN's workspace-size query removes the need for an internal memory allocator).

From this point of view, cuDNN and cuBLAS are good examples. THC is nice, but it still encapsulates a memory allocator (which is sometimes needed for dynamic operators).
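
To make the "hints instead of an internal allocator" pattern concrete, here is a minimal sketch around cuDNN's convolution forward call (descriptor setup and error checking omitted; a framework would normally serve the workspace from its own memory pool rather than calling cudaMalloc directly):

#include <cudnn.h>
#include <cuda_runtime.h>

void conv_forward_user_managed(cudnnHandle_t handle,
                               cudnnTensorDescriptor_t x_desc, const float* x,
                               cudnnFilterDescriptor_t w_desc, const float* w,
                               cudnnConvolutionDescriptor_t conv_desc,
                               cudnnTensorDescriptor_t y_desc, float* y) {
  const cudnnConvolutionFwdAlgo_t algo = CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM;
  // cuDNN only *reports* how much scratch memory it needs...
  size_t ws_bytes = 0;
  cudnnGetConvolutionForwardWorkspaceSize(handle, x_desc, w_desc, conv_desc, y_desc,
                                          algo, &ws_bytes);
  // ...and the caller decides where that memory comes from.
  void* workspace = nullptr;
  if (ws_bytes > 0) cudaMalloc(&workspace, ws_bytes);
  const float alpha = 1.0f, beta = 0.0f;
  cudnnConvolutionForward(handle, &alpha, x_desc, x, w_desc, w, conv_desc, algo,
                          workspace, ws_bytes, &beta, y_desc, y);
  if (workspace) cudaFree(workspace);
}

Because the library never owns the workspace, the adopting framework stays free to plan and reuse that memory however it likes.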

Lack of Unified Operator Interface

The second obstacle is mainly the lack of a common operator interface. This is a problem with cuDNN and THC that prevents reuse. Take cuDNN for example: each cuDNN API is a C function with its own interface, so to adopt an operator there needs to be one (or more) adapter function per operator.

Consider instead a unified operator interface (the following is a mock design), where each TBlob is a reference to the data fields and shape, and every function is registered to a registry under its name:

using FCompute = std::function<void(
    array_view<TBlob> ins, array_view<TBlob> outs,
    const std::map<std::string, std::string>& kwargs,  // operator attributes
    StreamHandle stream)>;                              // execution stream, e.g. a cudaStream_t

Then it only takes one function to extract and reuse all the operators, and to automatically expose them to the front end. In MXNet, the symbolic counterpart is even generated directly from the same imperative operator when a gradient is provided.
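
As a rough illustration of the single-adapter point, here is a hypothetical, self-contained sketch (not MXNet's or NNVM's actual registry API; array_view is approximated by a plain vector and StreamHandle is a placeholder) of a name-to-FCompute registry:

#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <vector>

struct TBlob {                        // a reference to the data fields and shape
  void* data;
  std::vector<int64_t> shape;
  int dtype;
};

template <typename T>
using array_view = std::vector<T>;    // plain vector standing in for a non-owning view

using StreamHandle = void*;           // placeholder for an execution stream, e.g. a cudaStream_t

using FCompute = std::function<void(
    array_view<TBlob> ins, array_view<TBlob> outs,
    const std::map<std::string, std::string>& kwargs, StreamHandle stream)>;

// name -> implementation; a framework discovers operators by walking this map.
std::map<std::string, FCompute>& OpRegistry() {
  static std::map<std::string, FCompute> registry;
  return registry;
}

// The single adapter a framework needs in order to expose every registered operator.
void InvokeOp(const std::string& name, array_view<TBlob> ins, array_view<TBlob> outs,
              const std::map<std::string, std::string>& kwargs, StreamHandle stream) {
  OpRegistry().at(name)(ins, outs, kwargs, stream);
}

With something like this in place, a framework wraps InvokeOp once (plus its own scheduling and memory planning around it) instead of writing one adapter per operator.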

Problem of One Unified Operator Interface

There is always a flip side to the coin. Assume we go with a unified operator interface; as a matter of fact, that is what MXNet, TensorFlow and Caffe have done. The problem now becomes: what should the interface look like? One trap that framework designers always fall into is thinking that we need one interface to rule them all.

Since one interface rules them all, we want to support all possible operators. What about the ones that need runtime memory allocation? Maybe add a memory allocator to it. What about the ones that are asynchronous? In the end, the interface has to include the memory allocator and the scheduling module in some way, and that reintroduces the "Coupled Operator Data Structure Components" problem: the operator interface becomes deeply coupled with the rest of the framework and is no longer reusable.

A Better Solution: A Few Unified Interfaces

Can we get the best of both worlds: having as few data structures and interfaces as possible, while introducing as little coupling to the allocator and scheduler as possible? I think the answer is yes, and we need to step away from the ideal of one interface that rules all operators.

I can roughly categorize the operators into three types:

  1. Type 1: basic operators whose inputs and outputs are pre-allocated by the caller and which need no additional resources.
  2. Type 2: operators that additionally need a temporary workspace, whose size can be declared up front (as cuDNN does with its workspace queries).
  3. Type 3: dynamic operators, whose output shapes or resource needs depend on the input data, so they require runtime memory allocation or asynchronous execution.

If we design for a general operator interface, the answer will usually look like type 3. However, types 1 and 2 account for 90%+ of the major operators we are using. If we design one operator interface for each type, the problem is solved: frameworks can pull in and interact with each type in their own way. It is much easier to do things like static memory planning if type 1 and type 2 are explicitly introduced. This is one additional layer of wrapping on top of THC and cuDNN that is lacking so far; a sketch of what it could look like follows.
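
For concreteness, here is one hypothetical shape the "one interface per type" idea could take (illustrative names, not an existing MXNet or NNVM API), building on the placeholders from the earlier sketch:

// Reuses the TBlob / array_view / StreamHandle placeholders from the sketch above.
#include <cstddef>
#include <functional>

// Type 1: outputs pre-allocated by the caller, no extra resources needed.
using FComputeBasic = std::function<void(
    array_view<TBlob> ins, array_view<TBlob> outs, StreamHandle stream)>;

// Type 2: like type 1, plus a caller-provided temporary workspace. Exposing the
// size query separately is what makes static memory planning straightforward.
using FWorkspaceSize = std::function<size_t(array_view<TBlob> ins)>;
using FComputeWorkspace = std::function<void(
    array_view<TBlob> ins, array_view<TBlob> outs,
    void* workspace, size_t workspace_bytes, StreamHandle stream)>;

// Type 3 (dynamic or asynchronous operators) would still need a richer context,
// but that complexity no longer leaks into the interfaces for types 1 and 2.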

A registry system like NNVM could come in very handy for registering this information so that it can be pulled out by the libraries.

The Hope

I have always hoped for a minimal standard set of operator interfaces in C++ that can be shared across libraries. I think we have a good idea of what the solution looks like. While most systems tend to become opaque and coupled, I think this kind of transparency can help the community evolve in a healthy way. That being said, it always takes effort to make these things happen. It requires an open discussion on what the interfaces should be, and commitment from framework builders. I would really love to see this happen, and that is why I spent more than an hour writing this.

Unfortunately, most frameworks already have a more or less "sufficient collection of operators", so a unified operator interface will contribute little to each framework's usability in the short term. Naturally it would be given lower priority. That is why commitment is needed to carry this through for the longer-term benefit.

mli commented 7 years ago

how about libop?

i prefer not to use "tensor" because mathematically a tensor has a rich set of properties, while most of the operators we use are just element-wise, so "n-dimensional array" is a better name for the data structure.

Yangqing commented 7 years ago

Following the idea of BLAS we can probably call it BDAS (basic deep-learning algebra subprograms) - sounds like "badass".

piiswrong commented 7 years ago

LOL. In that spirit how about BAsic Neural Artificial Network Algebra Subroutines (BANANAS)

mli commented 7 years ago

Or Deep Learning PACKage, DLPACK, motivated from LAPACK. We are providing more than basic subprograms.

bhack commented 7 years ago

DLPACK is not so bad.

futurely commented 7 years ago

TensorFlow & Keras combined have the largest user base and are growing most rapidly. You should bring those guys on board for this proposal to make the biggest impact.

http://www.timqian.com/star-history/#tensorflow/tensorflow&fchollet/keras&dmlc/mxnet&BVLC/caffe&Microsoft/CNTK&torch/torch7&Theano/Theano

bhack commented 7 years ago

/cc @fchollet

cyberfire commented 7 years ago

Just two points:

  1. Before naming the project, the targets and boundaries of the project should be made clear first: how many issues there are, and which issues are to be resolved at what stage.
  2. Use an agile approach and start from easy targets: ones that can probably be resolved quickly and on which it is easy to get agreement from all participants.

jli05 commented 7 years ago

As a side topic, I personally think that how to allow MXNet to scale out over micro-kernel multi-server OSes and scale down to battery-limited devices is also important.

tqchen commented 7 years ago

created a repo here https://github.com/dmlc/dlpack

tqchen commented 7 years ago

Let us move the discussion to https://github.com/dmlc/dlpack/issues.