deephealthproject / eddl

European Distributed Deep Learning (EDDL) library. A general-purpose library initially developed to cover deep learning needs in healthcare use cases within the DeepHealth project.
https://deephealthproject.github.io/eddl/
MIT License

Question about encoder-decoder architecture in the examples #276

Closed by georgemavrakis-wings 3 years ago

georgemavrakis-wings commented 3 years ago

Good afternoon,

I want to create a model based on seq2seq architecture for UC1 and I saw the example about machine translation, where an encoder-decoder architecture is used (https://github.com/deephealthproject/eddl/blob/master/examples/nn/4_NLP/3_nlp_machine_translation.cpp).

However, I am puzzled by some parts of the architecture, and I was hoping you could explain them to me:

a) The GetStates layer extracts the states of the LSTM layer for all timesteps. Is it possible to extract only the last timestep's state?

b) What is the use of setDecoder()?

c) As far as I can understand, the input to the decoder's ldin layer is y_train shifted by one position, with zeros at position 0. Is that correct?

d) How is the input of the LSTM distributed to the cells? Does each cell take as input the concatenation of ldin and one of the encoder's states?

Thank you in advance.

RParedesPalacios commented 3 years ago

You can check the unrolled model that EDDL builds after the call to fit. In the machine translation example, you can inspect that unrolling by opening the file rmodel.pdf, which is always created automatically.

a) You will see that only the last GetState is connected to the decoder LSTM.

b) setDecoder is necessary to mark where the decoder starts. It allows us to split the net.

c) Yes, completely correct.
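As a toy illustration (plain Python, not the EDDL API; the helper name is made up), preparing the decoder input from a target sequence looks like this:

```python
# Build the decoder input by shifting the target sequence one step
# to the right and placing a zero (start token) at position 0.
def make_decoder_input(y, start_token=0):
    return [start_token] + y[:-1]

y_train = [5, 9, 2, 7]                  # target word indices for one sequence
ldin = make_decoder_input(y_train)
print(ldin)  # [0, 5, 9, 2]
```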

d) It depends on the example. For instance, in the text description example we concatenate using a Concat layer, but this is open to the user. In any case, I am not sure I understand your question; please check the generated rmodel.pdf, it will probably help you a lot.

You are very welcome.

georgemavrakis-wings commented 3 years ago

Good afternoon,

I wish you a happy Easter!

@RParedesPalacios, you are right. I managed to create an initial seq2seq architecture for my model. However, I have two issues in my architecture that I think EDDL does not support:

a) In the decoder part, I want to concatenate the output of my GRU layer with the encoder states and then pass the concatenated result to a dense layer. I have tried the following:

# 1)
concat_layer = eddl.Concat([decoder_l, encoder_states]) # decoder_l: decoder GRU layer, encoder_states: encoder GRU hidden states

# 2)
concat_layer = eddl.Concat([decoder_l, encoder_l]) # decoder_l: decoder GRU layer, encoder_l: encoder GRU layer

Both gave me the error: LConcat must receive at least two layers. I presume that the concatenation cannot see the encoder layers.
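To make the intent concrete, this is the computation I am after, sketched with plain Python lists (shapes only, not EDDL calls):

```python
# Toy sketch of the desired decoder step: concatenate the decoder GRU
# output with the encoder's hidden state before the dense layer.
encoder_state = [0.1, 0.2, 0.3]            # encoder hidden state (size 3)
decoder_out   = [0.7, 0.8]                 # decoder GRU output (size 2)

dense_input = decoder_out + encoder_state  # concatenated vector, size 5
print(len(dense_input))  # 5
```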

b) Is there a way to turn off teacher forcing in the training procedure? For instance, I want every input of the decoder to be the decoder's previous prediction, except for the first input, which I want to be the last timestep of the encoder.

RParedesPalacios commented 3 years ago

Happy Easter to you too.

First of all, take into account that the main focus of EDDL is not recurrent layers. However, we have made a significant effort to support the most common topologies; it is difficult to foresee all the potential cases.

a) I will check whether I can allow encoder layers to be used on the decoder side. Please write here the whole topology you want to build.

b) That is not straightforward, and I am not even sure it is advisable to train without teacher forcing... in any case, I will check as well.

Regards

RParedesPalacios commented 3 years ago

Hi,

a) is solved in the develop branch

b) still needs checking

georgemavrakis-wings commented 3 years ago

Good morning, @RParedesPalacios thank you very much for the effort.

RParedesPalacios commented 3 years ago

Hi, teacher forcing can now be disabled by setting the model (net) flag decoder_teacher_training, for instance in C++:

// Encoder
layer in = Input({1}); //1 word
layer l = in;

layer lE = RandomUniform(Embedding(l, invs, 1,embedding,true),-0.05,0.05); // mask_zeros=true
layer enc = LSTM(lE,128,true);  // mask_zeros=true
layer cps = GetStates(enc);

// Decoder
layer ldin=Input({outvs});
layer ld = ReduceArgMax(ldin,{0});
ld = RandomUniform(Embedding(ld, outvs, 1,embedding),-0.05,0.05);

// input from embedding and
// state from encoder
l = LSTM({ld,cps},128);

layer out = Softmax(Dense(l, outvs));

setDecoder(ldin);

model net = Model({in}, {out});

net->decoder_teacher_training = false;  // <---- HERE
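For clarity, here is a toy sketch in plain Python (not EDDL internals; the decode function and the stand-in predict model are made up) of what the flag changes: with teacher forcing, the decoder is fed the ground-truth previous token at each step; without it, the decoder is fed its own previous prediction.

```python
# Toy sketch of the decoder input loop with and without teacher forcing.
def decoder_inputs(targets, predict, teacher_forcing, start_token=0):
    inputs = []
    prev = start_token
    for gold in targets:
        inputs.append(prev)          # what the decoder sees at this step
        pred = predict(prev)         # the decoder's own prediction
        prev = gold if teacher_forcing else pred
    return inputs

targets = [5, 9, 2]
predict = lambda x: x + 1            # stand-in for the real model

print(decoder_inputs(targets, predict, teacher_forcing=True))   # [0, 5, 9]
print(decoder_inputs(targets, predict, teacher_forcing=False))  # [0, 1, 2]
```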