ai-techsystems / deepC

vendor independent TinyML deep learning library, compiler and inference framework for microcomputers and micro-controllers
https://cainvas.ai-tech.systems/
Apache License 2.0
555 stars 86 forks

ONNX model exported from PyTorch is incorrect for onnx-cpp #131

Closed. thibnoel closed this issue 4 years ago

thibnoel commented 4 years ago

Hello, I am currently trying to generate C code from a trained PyTorch model. I tried to follow the first example provided in the tutorials ("Intermediate codegen and generate binary/bundle for your model"), but the process fails at the first step. Here is how I proceed:

```
graph(%input.1 : Float(4),
      %layers.0.bias : Float(8),
      %layers.1.bias : Float(1),
      %12 : Float(4, 8),
      %13 : Float(8, 1)):
  %6 : Float(8) = onnx::MatMul(%input.1, %12) # /home/tnoel/.local/lib/python3.6/site-packages/torch/nn/functional.py:1612:0
  %7 : Float(8) = onnx::Add(%6, %layers.0.bias)
  %8 : Float(8) = onnx::Tanh(%7) # solo_shoulder_approx_torch_nn.py:49:0
  %10 : Float(1) = onnx::MatMul(%8, %13) # /home/tnoel/.local/lib/python3.6/site-packages/torch/nn/functional.py:1612:0
  %11 : Float(1) = onnx::Add(%10, %layers.1.bias)
  return (%11)
```
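For reference, the exported graph describes a small two-layer perceptron: y = tanh(x·W1 + b1)·W2 + b2, with W1 of shape (4, 8) and W2 of shape (8, 1). The following is a minimal pure-Python sketch of that computation. The function names and the placeholder parameter values are illustrative assumptions, not the trained weights or any deepC API.

```python
import math

def matvec(x, w):
    """Multiply row vector x (length n) by matrix w (n rows, m columns)."""
    n_cols = len(w[0])
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(n_cols)]

def forward(x, w1, b1, w2, b2):
    """Mirror the graph: MatMul -> Add -> Tanh -> MatMul -> Add."""
    h = matvec(x, w1)                        # %6  = MatMul(%input.1, %12)
    h = [a + b for a, b in zip(h, b1)]       # %7  = Add(%6, %layers.0.bias)
    h = [math.tanh(a) for a in h]            # %8  = Tanh(%7)
    y = matvec(h, w2)                        # %10 = MatMul(%8, %13)
    return [a + b for a, b in zip(y, b2)]    # %11 = Add(%10, %layers.1.bias)

# Placeholder parameters with the shapes from the graph (NOT trained values).
w1 = [[0.0] * 8 for _ in range(4)]   # Float(4, 8)
b1 = [0.0] * 8                       # Float(8)
w2 = [[0.0] for _ in range(8)]       # Float(8, 1)
b2 = [0.5]                           # Float(1)

print(forward([1.0, 2.0, 3.0, 4.0], w1, b1, w2, b2))
```

With real weights, the output of such a function can be compared against the PyTorch model to confirm that the export matches the intended network.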

To me, the ONNX model looks well-formed at this point, and the ONNX checker from the Python `onnx` lib does not throw any error when checking it.
- I finally try to run `onnx-cpp testnet.onnx` as shown in the example, but this is where I get the following errors:
```bash
reading onnx model from file  testnet.onnx
Model info:
  ir_vesion :  6 
  doc       : 
INFO (ONNX): writing model parameter 12 to dir .
INFO (ONNX): writing model parameter 13 to dir .
INFO (ONNX): writing model parameter layers.0.bias to dir .
INFO (ONNX): writing model parameter layers.1.bias to dir .
running DNNC graph sanity check.
ERROR (GRAPH): some of graph torch-jit-export's node MatMul_0's
               outputs are not connected to other nodes in the graph.
ERROR (GRAPH): some of graph torch-jit-export's node Add_1's
               outputs are not connected to other nodes in the graph.
ERROR (GRAPH): some of graph torch-jit-export's node MatMul_3's
               outputs are not connected to other nodes in the graph.
ERROR (GRAPH): some of graph torch-jit-export's node Add_4's
               outputs are not connected to other nodes in the graph.
        FAILED. Please check your model.
Writing C++ file  /testnet.cpp
ERROR (CODEGEN): cound not find all nodes for MatMul_0,
                 an instance of MatMul.
                 Please check model's sanity and try again.
ERROR (CODEGEN): cound not find all nodes for Add_1,
                 an instance of Add.
                 Please check model's sanity and try again.
ERROR (CODEGEN): cound not find all nodes for MatMul_3,
                 an instance of MatMul.
                 Please check model's sanity and try again.
ERROR (CODEGEN): cound not find all nodes for Add_4,
                 an instance of Add.
                 Please check model's sanity and try again.
ERROR (CODEGEN): could not open file testnet.cppto write.
INFO (ONNX): model files are ready in dir 
```

The command still creates and populates the files 12, 13, layers.0.bias and layers.1.bias (with numerical values), and also creates the following testnet.cpp file, which does not look correct:

```cpp
#include "operators/MatMul.h"
#include "operators/Add.h"
#include "operators/Tanh.h"
#include "operators/MatMul.h"
#include "operators/Add.h"

using namespace dnnc;

int maint(){
  tensor<float> dnnc_input_dot_1(4);

  Tanh<float,float> Tanh_2("Tanh_2");
  tensor<float> dnnc_Tanh_2_8 = Tanh_2.compute ( dnnc_Add_1_7);

  return 0;
}
```

Would you have any idea how I should modify my model so that it is correctly handled by onnx-cpp? I don't really know where to look for now, and I don't see why the nodes are not correctly parsed from the ONNX model. Thanks in advance!
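The "outputs are not connected to other nodes" messages suggest a check of roughly this kind: every node output should either feed another node or be a graph output. The sketch below is an assumption about what such a check does, not deepC's actual implementation. `find_dangling_outputs` and the tuple-based graph encoding are hypothetical, and the pairing of node names from the log with the lines of the printed graph is assumed from their order.

```python
def find_dangling_outputs(nodes, graph_outputs):
    """Return (node_name, tensor_name) pairs whose tensor nobody consumes.

    nodes: list of (name, inputs, outputs) tuples.
    graph_outputs: names of the tensors returned by the graph.
    """
    consumed = set(graph_outputs)
    for _, inputs, _ in nodes:
        consumed.update(inputs)
    return [
        (name, out)
        for name, _, outputs in nodes
        for out in outputs
        if out not in consumed
    ]

# The graph from the PyTorch export, with the node names seen in the log.
nodes = [
    ("MatMul_0", ["input.1", "12"], ["6"]),
    ("Add_1", ["6", "layers.0.bias"], ["7"]),
    ("Tanh_2", ["7"], ["8"]),
    ("MatMul_3", ["8", "13"], ["10"]),
    ("Add_4", ["10", "layers.1.bias"], ["11"]),
]

print(find_dangling_outputs(nodes, graph_outputs=["11"]))
```

For the graph as printed by the exporter this check finds no dangling outputs, which is consistent with the ONNX checker accepting the model.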

github-actions[bot] commented 4 years ago

Thank you so much for filing the issue. We will look at it and take appropriate action as soon as possible.

srohit0 commented 4 years ago

Thanks @thibnoel for the report.

As a workaround, you can try it on platform https://cainvas.ai-tech.systems/ that hosts deepC.

thibnoel commented 4 years ago

Hello @srohit0, thanks for your quick answer! Your suggestion of converting the model directly on your platform worked for step 1 of the example. However, I feel there is something I still don't get about the overall workflow: I then tried to compile it with the example command you provide, and it fails with the following errors (some of them seem due to bad naming, but the others are related to dnnc, I think):

```bash
jupyter-tnoel@ip-172-31-18-137:~/deepC/test/compiler/mnist$ /usr/bin/clang++-8 -O3 testnet.cpp -I ../../../include/ -isystem ../../../packages/eigen-eigen-323c052e1731/ -o testnet.exe

In file included from testnet.cpp:24:
/home/jupyter-tnoel/testnet.h:22:7: error: expected unqualified-id
float 12[] = {
      ^
/home/jupyter-tnoel/testnet.h:29:7: error: expected unqualified-id
float 13[] = {
      ^
testnet.cpp:49:17: error: no matching constructor for initialization of 'tensor<float>'
  tensor<float> dnnc_12(12, {4, 8}, "12", false);
                ^       ~~~~~~~~~~~~~~~~~~~~~~~
../../../include/core/tensor.h:113:3: note: candidate constructor not viable: no known conversion from 'int' to 'std::vector<DIMENSION>' (aka 'vector<unsigned long>') for 1st argument
  tensor(std::vector<DIMENSION> dimn = std::vector<DIMENSION>(),
  ^
../../../include/core/tensor.h:123:3: note: candidate constructor not viable: requires at most 3 arguments, but 4 were provided
  tensor(T *data, std::vector<DIMENSION> dimn, std::string n = "")
  ^
../../../include/core/tensor.h:139:3: note: candidate constructor not viable: requires single argument 'other', but 4 arguments were provided
  tensor(tensor const &other) : placeHolder<T>(other) {
  ^
testnet.cpp:50:17: error: no matching constructor for initialization of 'tensor<float>'
  tensor<float> dnnc_13(13, {8, 1}, "13", false);
                ^       ~~~~~~~~~~~~~~~~~~~~~~~
../../../include/core/tensor.h:113:3: note: candidate constructor not viable: no known conversion from 'int' to 'std::vector<DIMENSION>' (aka 'vector<unsigned long>') for 1st argument
  tensor(std::vector<DIMENSION> dimn = std::vector<DIMENSION>(),
  ^
../../../include/core/tensor.h:123:3: note: candidate constructor not viable: requires at most 3 arguments, but 4 were provided
  tensor(T *data, std::vector<DIMENSION> dimn, std::string n = "")
  ^
../../../include/core/tensor.h:139:3: note: candidate constructor not viable: requires single argument 'other', but 4 arguments were provided
  tensor(tensor const &other) : placeHolder<T>(other) {
  ^
testnet.cpp:51:17: error: no matching constructor for initialization of 'tensor<float>'
  tensor<float> dnnc_layers_dot_0_dot_bias(layers_dot_0_dot_bias, {8}, "layers_dot_0_dot_bias", false);
                ^                          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../../../include/core/tensor.h:113:3: note: candidate constructor not viable: no known conversion from 'float [8]' to 'std::vector<DIMENSION>' (aka 'vector<unsigned long>') for 1st argument
  tensor(std::vector<DIMENSION> dimn = std::vector<DIMENSION>(),
  ^
../../../include/core/tensor.h:123:3: note: candidate constructor not viable: requires at most 3 arguments, but 4 were provided
  tensor(T *data, std::vector<DIMENSION> dimn, std::string n = "")
  ^
../../../include/core/tensor.h:139:3: note: candidate constructor not viable: requires single argument 'other', but 4 arguments were provided
  tensor(tensor const &other) : placeHolder<T>(other) {
  ^
testnet.cpp:52:17: error: no matching constructor for initialization of 'tensor<float>'
  tensor<float> dnnc_layers_dot_1_dot_bias(layers_dot_1_dot_bias, {1}, "layers_dot_1_dot_bias", false);
                ^                          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../../../include/core/tensor.h:113:3: note: candidate constructor not viable: no known conversion from 'float [1]' to 'std::vector<DIMENSION>' (aka 'vector<unsigned long>') for 1st argument
  tensor(std::vector<DIMENSION> dimn = std::vector<DIMENSION>(),
  ^
../../../include/core/tensor.h:123:3: note: candidate constructor not viable: requires at most 3 arguments, but 4 were provided
  tensor(T *data, std::vector<DIMENSION> dimn, std::string n = "")
  ^
../../../include/core/tensor.h:139:3: note: candidate constructor not viable: requires single argument 'other', but 4 arguments were provided
  tensor(tensor const &other) : placeHolder<T>(other) {
  ^
testnet.cpp:54:9: error: too many template arguments for class template 'MatMul'
  dnnc::MatMul<float, float, float> MatMul_0("MatMul_0");
        ^             ~~~~~~~~~~~~~
../../../include/operators/MatMul.h:30:29: note: template is declared here
template <typename T> class MatMul : public baseOperator<T, T, T> {
~~~~~~~~~~~~~~~~~~~~~       ^
testnet.cpp:58:9: error: too many template arguments for class template 'Add'
  dnnc::Add<float, float, float> Add_1("Add_1");
        ^                 ~~~~~~
../../../include/operators/Add.h:37:7: note: template is declared here
class Add : public baseOperator<To, Ti, Ti> {
      ^
testnet.cpp:62:9: error: too many template arguments for class template 'Tanh'
  dnnc::Tanh<float, float> Tanh_2("Tanh_2");
        ^           ~~~~~~
../../../include/operators/Tanh.h:31:29: note: template is declared here
template <typename T> class Tanh : public baseOperator<T, T, T> {
~~~~~~~~~~~~~~~~~~~~~       ^
testnet.cpp:66:9: error: too many template arguments for class template 'MatMul'
  dnnc::MatMul<float, float, float> MatMul_3("MatMul_3");
        ^             ~~~~~~~~~~~~~
../../../include/operators/MatMul.h:30:29: note: template is declared here
template <typename T> class MatMul : public baseOperator<T, T, T> {
~~~~~~~~~~~~~~~~~~~~~       ^
testnet.cpp:70:9: error: too many template arguments for class template 'Add'
  dnnc::Add<float, float, float> Add_4("Add_4");
        ^                 ~~~~~~
../../../include/operators/Add.h:37:7: note: template is declared here
class Add : public baseOperator<To, Ti, Ti> {
      ^
11 errors generated.
```

Would you please have any leads about how to keep going? Thanks in advance! Best
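The first two compiler errors come from the tensor names `12` and `13` being emitted verbatim as C++ identifiers, which cannot start with a digit. One possible way around this, sketched here as an assumption rather than as deepC's behaviour, is to map tensor names to valid identifiers before code generation, or to rename the tensors in the ONNX model along the same lines. The helper name `to_cpp_identifier` and the `t_` prefix are hypothetical; the `_dot_` replacement mirrors the `layers_dot_0_dot_bias` names visible in the generated code.

```python
import re

def to_cpp_identifier(name, prefix="t_"):
    """Map an ONNX tensor name to a string usable as a C++ identifier."""
    ident = name.replace(".", "_dot_")             # as in layers_dot_0_dot_bias
    ident = re.sub(r"[^0-9A-Za-z_]", "_", ident)   # replace other illegal characters
    if not ident or ident[0].isdigit():
        ident = prefix + ident                     # identifiers cannot start with a digit
    return ident

for name in ["12", "13", "layers.0.bias", "input.1"]:
    print(name, "->", to_cpp_identifier(name))
```

This sketch does not guard against C++ reserved words or against two different tensor names collapsing to the same identifier; a real code generator would need to handle both.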

srohit0 commented 4 years ago

@thibnoel - once the graph has errors, everything from that point is malformed. No point looking at the code, much less compiling it.

thibnoel commented 4 years ago

Hello @srohit0, thanks for the help, I think my graph is now correct but I will take a pause on this as I found a workaround that does not require generating the model through deepC. Anyway I might come back to it, I'm closing this issue in the meantime :) Best