KhronosGroup / NNEF-Tools

The NNEF Tools repository contains tools to generate and consume NNEF documents
https://www.khronos.org/nnef

nnvm as a light weight json persistence and internal optimization format #71

Closed jnorwood closed 5 years ago

jnorwood commented 5 years ago

I noticed that tvm uses a lightweight json format as a bridge from mxnet. I believe it is also used as a bridge format to ngraph.

https://docs.tvm.ai/dev/nnvm_json_spec.html

Tvm has also been using nnvm as a simple internal optimization graph format. It permits new ops to be registered and new attributes to be added to the nodes.

I notice that the netron viewer also supports tvm now, with this nnvm format, as well as mxnet.

It appears to me that it would not be too hard to export nnef to this format, which might then be used as a useful internal representation for optimizations. ngraph, tvm and netron would then also be usable with it.
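To make the idea concrete, here is a minimal sketch of such an export, assuming a toy op list as it might come out of an NNEF parser (the op names, attributes, and the `to_nnvm_json` helper are all made up for illustration; the `nodes`/`arg_nodes`/`heads` keys follow the nnvm json spec linked above):

```python
import json

# Hypothetical toy graph: a conv followed by relu, as it might come
# out of an NNEF parser. Names and attributes are illustrative only.
nnef_ops = [
    {"name": "input", "op": None, "inputs": [], "attrs": {}},
    {"name": "conv1", "op": "conv2d", "inputs": ["input"], "attrs": {"channels": "16"}},
    {"name": "relu1", "op": "relu", "inputs": ["conv1"], "attrs": {}},
]

def to_nnvm_json(ops):
    """Emit an nnvm-style graph dict with 'nodes', 'arg_nodes', 'heads'."""
    index = {op["name"]: i for i, op in enumerate(ops)}
    nodes, arg_nodes = [], []
    for i, op in enumerate(ops):
        if op["op"] is None:
            # graph inputs become 'null' nodes, listed in arg_nodes
            arg_nodes.append(i)
            nodes.append({"op": "null", "name": op["name"], "inputs": []})
        else:
            nodes.append({
                "op": op["op"],
                "name": op["name"],
                "attrs": op["attrs"],
                # nnvm encodes each input as [node_id, output_index, version]
                "inputs": [[index[n], 0, 0] for n in op["inputs"]],
            })
    # treat the last node's first output as the graph head
    return {"nodes": nodes, "arg_nodes": arg_nodes,
            "heads": [[len(nodes) - 1, 0, 0]]}

graph = to_nnvm_json(nnef_ops)
print(json.dumps(graph, indent=2))
```

A real converter would of course walk the NNEF graph structure produced by the official parser rather than a hand-written list, but the output shape would be the same.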

gyenesvi commented 5 years ago

Yes, this should be easy to convert to and could be useful. A volunteer to write such a converter would be appreciated. In the long run, tooling around NNEF will only be scalable if it is a community effort. Let me know if you are interested in contributing.

jnorwood commented 5 years ago

I'm looking at what ngraph is doing with the nnvm json. They use https://github.com/nlohmann/json to work with JSON in C++ when reading in the mxnet nnvm json, but it looks like they use their own C++ objects during their graph node optimizations.

It also looks like tvm is just using nnvm for the mxnet bridge. They've developed a dsl named Relay that looks like it is getting their focus now.

Both ngraph and tvm process onnx as input from other frameworks.

gyenesvi commented 5 years ago

The Relay IR looks interesting, and nGraph also has its own IR. And Glow as well. Not sure where this leads.

I have some work in progress to create an NNEF frontend to nGraph through PlaidML, and Glow is also planned. Adding NNEF as a frontend to TVM would also be useful.

jnorwood commented 5 years ago

I looked at PlaidML, since it is already an ngraph backend, but I also see that the tvm comparisons below show that tvm is at a more advanced stage of optimization.

https://tvm.ai/2018/10/03/auto-opt-all.html
https://sampl.cs.washington.edu/tvmconf/slides/Jared-Roesch-Relay.pdf

My guess is that plaidml will be in development for a while, since Intel apparently wants them to support their NNP and fpga chip optimizations ... maybe also for the planned gpus.

Ngraph's cpu backend support seems very fast. I got 384 fps on squeezenet with their demo on a Core i7 NUC box. Ngraph's build took about 30 minutes for me on ubuntu. Their runtime library is something like 300MB, and triple that with debugging symbols. gdb load time for the symbols was about 50 sec.

tvm is very light weight in comparison.

nnef doesn't currently support target memory models that are unique to fpgas, gpus and NUMA cpus. That appears to be something that needs to be specified, maybe similar to the way nnef has added the quant file. Are you doing something like that with plaidML?

jnorwood commented 5 years ago

TVM uses HalideIR as the data structure for arithmetic simplification and low-level lowering. https://github.com/dmlc/tvm/blob/master/README.md

So they convert the incoming mxnet nnvm format to this HalideIR before doing optimizations.

jnorwood commented 5 years ago

I looked more at the nnvm format supported by tvm. They dump to an nnvm-compatible file, but with different ops and attributes than used by mxnet. Their format is described in https://docs.tvm.ai/dev/debugger.html.
There is a conversion map between nnvm ops and mxnet ops in https://github.com/dmlc/tvm/blob/master/nnvm/python/nnvm/frontend/mxnet.py
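As a rough illustration of what that conversion map looks like, here is a hedged sketch of the kind of op-name table the mxnet frontend file maintains (these particular mappings and the `convert_op` helper are illustrative, not copied from the real table, which also rewrites attributes per op):

```python
# Illustrative mxnet -> nnvm op-name table; the real frontend also
# translates each op's attributes, not just its name.
_convert_map = {
    "Activation": "relu",       # assuming act_type == 'relu'
    "Convolution": "conv2d",
    "FullyConnected": "dense",
    "Flatten": "flatten",
}

def convert_op(mxnet_op_name):
    """Map an mxnet op name to its nnvm counterpart, if known."""
    try:
        return _convert_map[mxnet_op_name]
    except KeyError:
        raise NotImplementedError(
            "no nnvm mapping for mxnet op %r" % mxnet_op_name)

print(convert_op("Convolution"))  # conv2d
```

An NNEF-to-nnvm converter would need an analogous table keyed on NNEF operation names.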

gyenesvi commented 5 years ago

I don't quite understand what a memory model would mean in NNEF, and why it would be specified. NNEF is a high-level container format independent of the execution runtime, hence independent of memory models. Can you give an example of where this would be useful?

jnorwood commented 5 years ago

The optimized graphs that target specific hardware can make choices such as thread affinity or tile sizes based on cache sizes or NUMA restrictions. So, you're saying that NNEF is only going to translate graphs at the high level, and would not, for example, be concerned with persisting the tile size choices made by an optimizer.

gyenesvi commented 5 years ago

Yes, NNEF is an exchange format, used to pass information from training frameworks into compiler stacks, so it is typically the input to a compiler stack. Such details would be present in an IR, which is used inside compiler stacks. So NNEF could be translated to the IR, and the optimizers, knowing some details about the HW, could add such info to the IR.
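The division of labor described above can be sketched as follows, assuming a toy IR node represented as a dict and a made-up L1 cache size; everything here (the heuristic, the `tile_size` attribute, the node layout) is illustrative of where such info would live, not of any real compiler's API:

```python
# Illustrative only: a compiler pass that decorates an IR node with a
# hardware-specific scheduling attribute. This is the kind of detail
# that belongs in a compiler IR, not in the NNEF exchange file.
L1_CACHE_BYTES = 32 * 1024  # assumed target cache size

def schedule_matmul(node, dtype_bytes=4):
    """Pick a square tile so three tiles fit in L1 (toy heuristic)."""
    tile = 1
    while 3 * (2 * tile) ** 2 * dtype_bytes <= L1_CACHE_BYTES:
        tile *= 2
    node.setdefault("attrs", {})["tile_size"] = tile
    return node

node = {"op": "matmul", "name": "fc1"}
print(schedule_matmul(node)["attrs"]["tile_size"])  # 32
```

The NNEF document stays the same across targets; only the IR built from it carries target-dependent attributes like this one.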

jnorwood commented 5 years ago

I have been playing around with the nnvm/tvm HalideIR format to see what netron will support. I'm going to attach a sample file that Netron does support. However, it didn't display the func_name attribute, which is part of their spec. I have a feeling that is a bug, since they displayed the other defined attributes.

Also, netron didn't display the graph level attributes. Seems like something that they could add, or maybe I just missed it. tvm.json.zip