NervanaSystems / ngraph-python

Original Python version of Intel® Nervana™ Graph
http://ngraph.nervanasys.com/docs/legacy/
Apache License 2.0

Differences with NNVM Project #3

Open binarybana opened 7 years ago

binarybana commented 7 years ago

Addresses #1.

tqchen commented 7 years ago

I would jump in and advertise NNVM (https://github.com/dmlc/nnvm), which gives you the C API, the front-end, and some of the passes for free.

A graph means a lot of things: layout tuning, execution planning, compilation. Most of these (everything except compilation) can be expressed at a fairly high level, without ever having to restrict the user to a fixed set of operations. NNVM aims to sit at that higher layer by giving users the ability to define passes that use attributes of operators, without yet constraining them to a specific set of operators.

I understand that ngraph sits at a lower level, which has the advantage of enabling compilation, but many of the goals do not have to be that specific; reusing NNVM modules would bring quite a bit of advantage and simplify the lower level.

ghost commented 7 years ago

Thanks for reaching out @tqchen! We do indeed see ngraph and NNVM as sharing similar goals and sitting at similar levels of abstraction (I would not characterize ngraph as 'lower level' than NNVM for instance). However, the primary differences between ngraph and NNVM that we see are:

(1) We view specifying a base set of ops (as opposed to leaving that completely up to each implementation) as important for the platform as a whole. This is similar to LLVM IR: a standardized central definition of computation at an appropriate level of abstraction allows languages as different as Haskell and C to compile down to the same IR, with only a small amount of extra functionality (along with some GC metadata, etc.) needed to support Haskell.

(2) This specification of base ops then allows us to bootstrap the ecosystem with a common set of transformers and their compiler passes that can be reused by many frontends and backends, rather than requiring each to roll their own. This is also similar to LLVM, which has a rich ecosystem of existing passes that work with each other because they all operate on a predefined IR.

We look forward to learning from each other as we explore this design space together.

tqchen commented 7 years ago

By "lower level", what I intended to mean is the difference you pointed out: NNVM is more abstract, without restricting to a primitive set of ops, while ngraph specializes in defining a primitive set of ops.

When we developed NNVM, we explicitly considered the possibility of a primitive set of ops, and eventually concluded that there should be two levels of abstraction for the two purposes of a graph: 1) scheduling, planning, inference, layout tuning, and everything else that is decoupled from compilation; 2) compilation.

The first part could be achieved without restricting to a primitive set of operators, while benefiting all of the existing frameworks, even ones with things like customized operators or Lua operators.

NNVM aims to solve the first set of problems by specifying a base set of attributes of ops, such as a shape-inference function, an execution cost function, and a code generator.
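A minimal sketch of the attribute-based idea described here (the registry, the attribute name `FInferShape`, and the helper names are illustrative, not NNVM's actual API): each op advertises per-pass attributes, and a generic pass works on any op carrying the attribute it needs.

```python
# Hypothetical sketch of NNVM-style attribute registration: instead of
# fixing a primitive op set, each op supplies attributes that generic
# passes consume (names here are illustrative, not NNVM's real API).

OP_ATTRS = {}  # op name -> {attribute name -> value or function}

def register_op_attr(op_name, attr_name, value):
    OP_ATTRS.setdefault(op_name, {})[attr_name] = value

# A framework-defined op only needs to provide the attributes a pass
# asks for, e.g. a shape-inference function.
def infer_shape_matmul(input_shapes):
    (m, k1), (k2, n) = input_shapes
    assert k1 == k2, "inner dimensions must agree"
    return [(m, n)]

register_op_attr("matmul", "FInferShape", infer_shape_matmul)

# A generic shape-inference pass works on any op carrying the attribute,
# without knowing anything else about the op.
def run_shape_inference(op_name, input_shapes):
    return OP_ATTRS[op_name]["FInferShape"](input_shapes)

print(run_shape_inference("matmul", [(2, 3), (3, 4)]))  # [(2, 4)]
```

The point of the design is that a custom or framework-specific op joins the pass simply by registering the attribute; nothing constrains it to a predefined primitive set.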

Restricting everything to the primitive ops constrains what existing frameworks can do in terms of customization, scheduling, and other capabilities. I think making things modular and leaving that choice to the framework is important, especially for frameworks that also have imperative operations.

That being said, there is always a need for compilation, and thus for a primitive set of ops, which could be modeled as one type of operator attribute. Compiling certain subgraphs of primitive ops while running the rest with framework-defined ops would be just as powerful, and possibly more attractive to the frameworks.

So I think there could be an interesting opportunity here. For example, since NNVM is more abstract, ngraph could be built by registering its primitive set of ops with NNVM and defining an NNVM pass to transform higher-level operations into the primitive set used in ngraph. We could keep many optimizations at the NNVM level while benefiting from compilation of primitive ops as in ngraph.

diyessi commented 7 years ago

I think what you are calling the "primitive ops" are the arbitrary-dimension, layout-independent (although layout is currently chosen as the trivial layout given the axes) tensor primitives for supporting autodiff. We define other ops as compositions of these ops so that users do not, in general, need to implement backprop for new ops, and we plan to add abstraction ops for defining ops in terms of other ops. There is also a set of lower-level fixed-dimension ops used by transformers for code generation, along with passes that convert the graph to use these ops; that process will change when we do layout.
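The composition idea can be illustrated with a toy expression graph (this is a sketch of the concept, not ngraph's API; `Node`, `grad`, and the op names are hypothetical): because autodiff is defined once on the primitives, any op built as a composition of them gets its gradient for free.

```python
# Toy sketch: ops composed from differentiable primitives (add, mul)
# inherit backprop from the primitives' rules.

class Node:
    def __init__(self, op, inputs=(), value=None):
        self.op, self.inputs, self.value = op, list(inputs), value

def const(v): return Node("const", value=v)
def add(a, b): return Node("add", [a, b])
def mul(a, b): return Node("mul", [a, b])

def evaluate(n):
    if n.op == "const":
        return n.value
    a, b = (evaluate(i) for i in n.inputs)
    return a + b if n.op == "add" else a * b

def grad(n, wrt):
    # d(n)/d(wrt), derived by recursing on primitive rules only
    if n is wrt:
        return 1.0
    if n.op == "const":
        return 0.0
    a, b = n.inputs
    if n.op == "add":
        return grad(a, wrt) + grad(b, wrt)
    # product rule for mul
    return grad(a, wrt) * evaluate(b) + evaluate(a) * grad(b, wrt)

# "square" is not a primitive: it is a composition, so no backprop
# implementation is needed for it.
def square(x): return mul(x, x)

x = const(3.0)
y = square(x)
print(evaluate(y), grad(y, x))  # 9.0 6.0
```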

tqchen commented 7 years ago

Yes, I think that is what I mean, as in my point in the last post: NNVM is more abstract in the sense that it does not restrict users to confining their operations to the set of primitive ops, instead relying on a common set of attributes (which serve as common ground between systems).

The pros and cons of this approach were discussed in my last post.

tqchen commented 7 years ago

My point is that there are essentially two levels of abstraction: the abstract level (which NNVM targets) and the concrete level (with a set of primitive ops, like ngraph).

Many of the things that can be done at the abstract level could benefit frameworks that want their own definitions of ops, or extensions customized to their front-end language, without taking over the entire execution logic.

The primitive-op approach could complement the abstract level, and provide compilation and optimization for certain subgraphs (instead of taking over the whole graph).

It is hard to tell a framework to simply transform everything into a fixed set of operators, as there is always a need to add bulk operators, customization (via Python), and application-specific operators (e.g., ROIPooling).

ghost commented 7 years ago

I appreciate the attribute-based ops and compiler passes that NNVM is proposing, but doesn't this mean that each implementation must re-implement common functionality, such as fusion semantics and algebraic simplifications like log(exp(x)) == x, for its particular types of ops?

We instead see the ability to add new ops as a supported-but-rare case, and we optimize for the common case while still allowing that flexibility when needed.

tqchen commented 7 years ago

I do not disagree on the set of primitive ops.

Actually, that is what I was proposing in my first post. In NNVM, this can be achieved with an attribute "FPrimitiveOp", which indicates that the operator corresponds to one of the primitive ops. The fusion pass can then be defined on top of this attribute, relieving frameworks of framework-specific implementations of fusion semantics.
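A rough sketch of this proposal (the attribute name FPrimitiveOp comes from the post above; the ops, the table, and the grouping pass are all illustrative): a graph mixes framework-defined and primitive ops, and a generic pass uses the attribute to carve out maximal primitive runs that a compiler like ngraph could take over.

```python
# Hypothetical FPrimitiveOp attribute table: True means the op maps to
# a compiler primitive; False means it is a framework/custom op.
FPRIMITIVE_OP = {"add": True, "mul": True, "exp": True,
                 "roi_pooling": False, "custom_py_op": False}

def fuse_runs(op_sequence):
    """Partition a linear op sequence into (is_primitive, [ops]) groups.

    Primitive runs become candidate fusion regions for compilation;
    non-primitive runs stay with the framework's own executor.
    """
    groups = []
    for op in op_sequence:
        prim = FPRIMITIVE_OP.get(op, False)
        if groups and groups[-1][0] == prim:
            groups[-1][1].append(op)
        else:
            groups.append((prim, [op]))
    return groups

seq = ["add", "mul", "custom_py_op", "exp", "add"]
print(fuse_runs(seq))
# [(True, ['add', 'mul']), (False, ['custom_py_op']), (True, ['exp', 'add'])]
```

This captures the earlier point that compilation can operate on subgraphs of primitives while custom operators keep running under the framework.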

I think that is a way ngraph and NNVM could work together.

tqchen commented 7 years ago

To summarize, a set of primitive ops is good for compilation-related rewriting, which can be expressed via a PrimitiveOp attribute.

For things other than compilation (scheduling, inference, distributed workload partitioning), the attribute-based abstraction lets existing frameworks participate without having to offload all of the workload to the primitives.

By separating compilation, execution, and scheduling, framework builders can pick the parts they want and assemble a framework that is good from their perspective.

diyessi commented 7 years ago

For an operation like "add" we have multiple ops, specialized on argument types (you could think of them as multi-methods of a generic add if you wanted). The backend determines which of these ops can be fused. For example, NumPy has far fewer restrictions on tensor dimensions and layout than a C library, so these attributes are joint properties of the backend and the op. We want the same graph to run optimally on whatever backend is available at the time, provided the user hasn't indicated they want to use backend-specific capabilities (analogous to using inline assembly or system calls in a program).
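The multi-method view of "add" can be sketched as follows (the registry and function names are hypothetical, not ngraph's implementation): a single generic op dispatches to an implementation specialized on the argument types, and the backend can then decide which specializations it knows how to fuse.

```python
# Sketch of multi-method dispatch for a generic "add": specialized
# implementations are registered per argument-type pair.

ADD_METHODS = {}  # (type name, type name) -> implementation

def register_add(arg_types, fn):
    ADD_METHODS[arg_types] = fn

def add(a, b):
    # Dispatch on the runtime types of both arguments.
    return ADD_METHODS[(type(a).__name__, type(b).__name__)](a, b)

register_add(("int", "int"), lambda a, b: a + b)
register_add(("list", "list"),
             lambda a, b: [x + y for x, y in zip(a, b)])  # elementwise

print(add(2, 3))            # 5
print(add([1, 2], [3, 4]))  # [4, 6]
```

A backend's fusion table would then be keyed by these specializations rather than by the generic op alone, matching the point that fusability is a joint property of the backend and the op.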

tqchen commented 7 years ago

@diyessi Again, I understand the advantage of compilation tricks and primitive ops, and they can be achieved with the approach in my last post.

But my point is that there are things that are not compilation, which could be done at the abstract level (with NNVM, and with the set of primitive attributes possibly used in ngraph) more elegantly and more transparently.

diyessi commented 7 years ago

nnvm implements a simple type system outside of C++, with attributes serving a role something like interfaces in Java. We want to be able to define additional pass-specific attributes outside of the op definition, and we might want shallow type inheritance on some of the attributes, but we are balancing ease of use against ease of implementation. I think it would probably be possible to generate language bindings for the ops from the op definitions.
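The "attributes as interfaces" analogy can be sketched like this (the attribute table, the attribute names, and the pass are illustrative, not nnvm's real registry): a pass declares the attributes it requires, and it runs only over ops that "implement" that interface, while pass-specific attributes can be registered separately from the op definition itself.

```python
# Sketch: attributes as Java-like interfaces. An op "implements" an
# interface if it carries all the attributes the interface names.

OP_ATTRS = {
    "conv2d":    {"FInferShape": lambda s: s, "FCost": lambda s: 10},
    "custom_op": {"FInferShape": lambda s: s},  # no cost model
}

def implements(op, required_attrs):
    return all(a in OP_ATTRS.get(op, {}) for a in required_attrs)

# A scheduling pass needs both shape inference and a cost model, and
# simply skips ops that lack the interface.
SCHEDULING_INTERFACE = ("FInferShape", "FCost")
schedulable = [op for op in OP_ATTRS
               if implements(op, SCHEDULING_INTERFACE)]
print(schedulable)  # ['conv2d']
```

Because the attribute table lives outside the op definitions, a new pass can retrofit its own attribute onto existing ops without touching them, which is the extensibility being discussed here.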