deephealthproject / eddl

European Distributed Deep Learning (EDDL) library. A general-purpose library initially developed to cover deep learning needs in healthcare use cases within the DeepHealth project.
https://deephealthproject.github.io/eddl/
MIT License

Export to ONNX randomly fails #339

Closed thistlillo closed 2 years ago

thistlillo commented 2 years ago

I opened an issue on the PyEDDL GitHub page, but @simleo pointed out that I should have posted it here.

The issue is described here:

https://github.com/deephealthproject/pyeddl/issues/78

salvacarrion commented 2 years ago

Apparently you need to specify the seq_len to export the RNN => save_net_to_onnx_file(Net *net, string path, int seq_len)

I'm testing more things. Once the issue is clear, I'll tell Simone so that he can do the Python binding for you.

salvacarrion commented 2 years ago

Here is the C++ version of your code, in case more debugging is needed.

#include <cstdio>
#include <cstdlib>
#include <iostream>

#include "eddl/apis/eddl.h"

#include "eddl/serialization/onnx/eddl_onnx.h" // Not allowed

using namespace eddl;

int main(int argc, char **argv) {

    int epochs = 1;
    int olength = 20;
    int outvs = 2000;
    int embdim = 32;

    Net* net = download_resnet18(true, {3, 256, 256});
    layer lreshape = getLayer(net, "top");
    layer dense_layer = HeUniform(Dense(lreshape, 20, true, "out_dense"));
    layer cnn_out = Sigmoid(dense_layer, "cnn_out");
    layer concat = Concat({lreshape, cnn_out}, 0, "cnn_concat");

    layer image_in = getLayer(net, "input");

    layer ldecin = Input({outvs});
    layer ldec = ReduceArgMax(ldecin, {0});
    ldec = RandomUniform(
            Embedding(ldec, outvs, 1, embdim, true), -0.05, 0.05
    );

    ldec = Concat({ldec, concat});
    layer l1 = LSTM(ldec, 512, true);
    layer out = Softmax(Dense(l1, outvs), 0, "out_cnn");

    setDecoder(ldecin);
    net = Model({image_in}, {out});

    build(net,
          adam(0.001),              // Optimizer
          {"soft_cross_entropy"},   // Losses
          {"categorical_accuracy"}, // Metrics
          CS_GPU({1}, 1, "full_mem"),                // Computing service
          true                      // Enable parameters initialization
    );
    summary(net);

    Tensor* x_train = Tensor::randn({48, 256, 256, 3});
    Tensor* y_train = Tensor::zeros({48,20, outvs});

    x_train->permute_({0, 3, 1, 2});
    y_train->set_select({":", ":", "0"}, 1.0);

    fit(net, {x_train}, {y_train}, 6, epochs);
    save(net, "img2text.bin", "bin");

     /////    /////    /////    /////    /////    /////    /////    /////    /////    /////
     ///// FIX: add the "seq_len" parameter => here, variable "olength"
    save_net_to_onnx_file(net, "img2text.onnx", olength);
    /////    /////    /////    /////    /////    /////    /////    /////    /////    /////

    cout << "Saved net to onnx file" << endl;

    return 0;
}

=>

2.3661 secs/epoch
[ONNX::Export] Warning: The LSTM layer LSTM1 has mask_zeros=true. This attribute is not supported in ONNX, so the model exported will not have this attribute.
Saved net to onnx file

thistlillo commented 2 years ago

Apparently you need to specify the seq_len to export the RNN => save_net_to_onnx_file(Net *net, string path, int seq_len)

Thanks @salvacarrion: Jon suggested using that parameter, but unfortunately it was missing from the Python API. It has now been added, but I need to wait for the modification to appear on the distribution channels. I got this error for the first time with release 1.3; it never occurred with previous ones.

salvacarrion commented 2 years ago

Do you mind having a call on Skype to make sure your exporting issue is fixed before doing the binding? I didn't understand a couple of things in the other issue.

thistlillo commented 2 years ago

Replied. You also have a message on Skype. Thank you.

simleo commented 2 years ago

I need to wait for the modification to appear on the distribution channels

@thistlillo the Conda packages for pyeddl 1.3.1 were made available two days ago.

thistlillo commented 2 years ago

@simleo thank you, just updated the packages. I will post an update as soon as I launch a new training.

salvacarrion commented 2 years ago

After the meeting, I think we can close this issue

thistlillo commented 2 years ago

I can confirm that, with EDDL version 1.3.1 and the seq_len parameter, the error has not occurred again.
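
For anyone landing here later, the fix comes down to passing the sequence length as the third argument of the export call. Below is a minimal sketch of one way to obtain that value. It assumes the targets are laid out as {batch, seq_len, vocab_size}, as y_train is in the listing above ({48, 20, outvs}), and that seq_len should match the middle dimension. The helper seq_len_from_target_shape is hypothetical and not part of EDDL. The actual export call is shown only as a comment because it needs the library.

```cpp
#include <stdexcept>
#include <vector>

// Hypothetical helper (not part of EDDL): returns the time dimension of a
// target shape assumed to be laid out as {batch, seq_len, vocab_size}.
int seq_len_from_target_shape(const std::vector<int>& shape) {
    if (shape.size() != 3) {
        throw std::invalid_argument("expected a {batch, seq_len, vocab_size} shape");
    }
    return shape[1];
}

// Intended use with the overload quoted in this thread (requires EDDL, so it
// is left as a comment here):
//   int seq_len = seq_len_from_target_shape({48, 20, 2000});
//   save_net_to_onnx_file(net, "img2text.onnx", seq_len);
```

Deriving the value from the target shape, rather than hard-coding it, keeps the exported sequence length in step with the training data if the shape changes.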