Open sangameshnr opened 6 years ago
Also, when I added tf.nn.bias_add after tf.nn.conv2d in the TF code, the output tensors were concatenated in ngraph instead of added, since their axes were different. I had to write a separate function in "ops_bridge.py" to make their axes the same before adding.
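The fix amounts to reshaping the 1-D bias so it has the same rank (and matching axes) as the conv output before the add. A minimal NumPy sketch of the idea, assuming NHWC layout; the function name is hypothetical, not the actual ops_bridge.py code:

```python
import numpy as np

def bias_add_matched_axes(conv_out, bias):
    """Broadcast a 1-D bias across the channel axis of an NHWC tensor.

    Reshaping the bias to the same rank as conv_out gives both operands
    matching axes, so the backend performs an element-wise add rather
    than treating the mismatched shapes as something to concatenate.
    """
    new_shape = (1,) * (conv_out.ndim - 1) + (bias.shape[0],)
    return conv_out + bias.reshape(new_shape)

# Example: batch of 2, 4x4 feature maps, 3 output channels.
out = bias_add_matched_axes(np.zeros((2, 4, 4, 3)), np.array([1.0, 2.0, 3.0]))
```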
We are working on a new version of ngraph (nGraph++) which is being implemented in C++ and the performance issue you are observing will be addressed in the forthcoming release.
Thanks @avijit-nervana , I will look forward to the release.
Could someone please advise whether there's any update on this? Is nGraph++ available, and does it resolve the issue mentioned by sangameshnr? I want to use the TF frontend and create a new backend for nGraph.
@shubhamn94 We do have a C++ implementation of nGraph: https://github.com/NervanaSystems/ngraph.git
The TensorFlow bridge is here: https://github.com/NervanaSystems/ngraph-tf.git
To get started, please clone the nGraph-TensorFlow bridge and run build_ngtf.py to build the TF version of nGraph (all C++).
Please let us know if you have any questions. We will be happy to help you put together a new backend for nGraph.
Sure. Thanks @avijit-nervana . Will let you know.
@shubhamn94 We recently moved the nGraph bridge repository to TensorFlow organization. Here's the new URL: https://github.com/tensorflow/ngraph-bridge
(The nGraph library remains in the same location: https://github.com/NervanaSystems/ngraph.git)
Please update your bookmarks.
Thanks for the update!
Hi,
The performance of the VGG16 network imported to ngraph via the TF frontend appears to be very slow. For perspective, I have 3 implementations of VGG16:
1) Neon-Ng: Neon frontend in ngraph (VGG16 using layers of the neon frontend)
2) TF: TensorFlow code (VGG16 using tf.nn.conv2d, tf.nn.bias_add, tf.nn.relu/max_pool, etc.)
3) TF-Ng: I import the checkpoint files of implementation 2 here.
For a batch size of 64 on a Skylake machine I get the following performance:
1) Neon-Ng: 1561 GFLOP/s
2) TF: 1309 GFLOP/s (using TensorFlow 1.4.0-dev on Intel Python)
3) TF-Ng: 51.82 GFLOP/s
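For anyone reproducing these numbers, GFLOP/s is just the model's total FLOP count divided by wall-clock time. A small sketch of how such a figure can be derived for a single conv layer; the 2·K·K·Cin multiply-add convention and the example layer shape are my assumptions, not taken from the benchmark above:

```python
def conv2d_flops(h_out, w_out, c_out, k, c_in):
    # Each output element needs k*k*c_in multiply-adds; count each as 2 FLOPs.
    return 2 * h_out * w_out * c_out * k * k * c_in

def gflops_per_sec(total_flops, seconds):
    # Throughput = FLOPs executed per second, scaled to giga.
    return total_flops / seconds / 1e9

# First VGG16 conv layer on a 224x224 RGB input: 3x3 kernel, 64 output channels.
layer_flops = conv2d_flops(224, 224, 64, 3, 3)
```

Summing this over all layers (and multiplying by the batch size) gives the numerator for a whole-network GFLOP/s estimate.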
I am trying to investigate why the TF frontend in ngraph is slow. Any inputs would be very helpful.