ekinakyurek / KnetLayers.jl

Useful Layers for Knet
MIT License

Segmentation Fault in Seq2Seq #14

Open kadir-gunel opened 4 years ago

kadir-gunel commented 4 years ago

Hello,

I was playing with the seq2seq.jl script, and I wondered why you use sorting inside the forward function

(m::S2S)(x) = m.output(m.decoder(pad(10, sort(x, dims=2)), m.encoder(x; hy=true).hidden).y)

instead of

(m::S2S)(x, ygold) = m.output(m.decoder(pad(10, ygold), m.encoder(x; hy=true).hidden).y)

Of course, I also changed the second forward function to:

(m::S2S)(x, ygold) = m.loss(m(x, ygold), ygold)

In the end I get a segmentation fault. Why is this happening? I am trying to build a very basic sequence-to-sequence model on different data. Could you please help me resolve this issue?

ekinakyurek commented 4 years ago

Hi Kadir,

(m::S2S)(x) is a forward function with teacher forcing: the function already knows that ygold = sort(x, dims=2) and feeds it to the decoder. This is a common trick during training, but it amounts to cheating at test time. To keep the code simple, I left it this way.
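Teacher forcing here means that the decoder is fed the gold target shifted right by the start token 10, not its own predictions. The following is a minimal plain-Julia sketch (no Knet needed) of the decoder input built by `(m::S2S)(x)`. It reuses the script's `pad` helper on a made-up input:

```julia
# Same helper as in the thread's script: prepend token p as a new first column.
pad(p::Int, x::Array; dims=2) = cat(fill(p, size(x, 1)), x; dims=dims)

x = [3 1 2]                      # one made-up source sequence (batch 1, length 3)
ygold = sort(x, dims=2)          # what the model should output: [1 2 3]
decoder_input = pad(10, ygold)   # start token + gold tokens: [10 1 2 3]
```

At step t the decoder sees the gold token from step t-1, whatever it predicted itself.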

I guess your question is: "Then why don't we pass ygold as an argument?" It is because I didn't want to say out loud that we're cheating.

I guess in your case the two functions you defined have exactly the same signature, (m::S2S)(x, ygold), so the second definition replaces the first one. In other words, you're overwriting (not overloading) the forward function.
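Julia identifies a method by its signature, so a second definition of `(m::S2S)(x, ygold)` replaces the first one. In the modified script, the surviving method is `m.loss(m(x, ygold), ygold)`. That method calls itself and has no base case.

The sketch below reproduces this in plain Julia, with a hypothetical `Toy` struct standing in for `S2S`. In plain Julia the runaway recursion ends in a `StackOverflowError`. It seems plausible, though the thread does not confirm it, that the same recursion showed up as the segfault in the Knet setting.

```julia
struct Toy        # hypothetical stand-in for S2S
    scale::Int
end

# Definition 1: the intended forward pass.
(m::Toy)(x, y) = m.scale * x

# Definition 2 has the SAME signature, so it replaces definition 1.
# Its body calls m(x, y), which now means definition 2 itself: infinite recursion.
(m::Toy)(x, y) = m(x, y) - y

err = try
    Toy(2)(3, 1)
    nothing
catch e
    e
end
println(typeof(err))
```

Giving the two forward functions different signatures (as the original script does with `(m::S2S)(x)`) or different names avoids the overwrite.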

Could you share the modified code so that I can reproduce the error?

I don't get a segfault with the original example.

kadir-gunel commented 4 years ago

Hi Ekin,

Thanks for the feedback.

I was asking myself: "What is his objective? Why has he given the ground truths this way (in the decoder part of the model)?" Now I understand. Thanks. 😃

The original example works perfectly, but after changing the two forward functions I mentioned, I get a segmentation fault:


# Forward functions
(m::S2S)(x, ygold) = m.output(m.decoder(pad(10, ygold) , m.encoder(x; hy=true).hidden).y)
(m::S2S)(x, ygold) = m.loss(m(x, ygold), ygold)

As you explained, the second definition overwrites the first one instead of overloading it, so the program crashes. So I changed the code to:

#new Forward function
(m::S2S)(x, ygold) = m.loss(m.output(m.decoder(pad(10, ygold) , m.encoder(x; hy=true).hidden).y), ygold)

Then I get a dimension mismatch error.

If I change the new forward function to:

#newer Forward function
(m::S2S)(x, ygold) = m.loss(m.output(m.decoder(pad(10, ygold[:, 1:end-1]), m.encoder(x; hy=true).hidden).y), ygold)

Everything seems to work fine.
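The mismatch most likely comes from sequence lengths. `ygold` already ends with the stop token 11, so prepending the start token makes the decoder input one step longer than `ygold`. Since the decoder emits one prediction per input step, there is one prediction too many.

The plain-Julia sketch below shows the length bookkeeping. It reuses the thread's `pad` helpers on a made-up input and assumes the decoder returns one output column per input column:

```julia
# The two padding helpers from the script.
pad(p::Int, x::Array; dims=2) = cat(fill(p, size(x, 1)), x; dims=dims)
pad(x::Array, p::Int; dims=2) = cat(x, fill(p, size(x, 1)); dims=dims)

x = [3 1 2]                              # made-up source, length T = 3
ygold = pad(sort(x, dims=2), 11)         # [1 2 3 11], length T + 1

too_long = pad(10, ygold)                # [10 1 2 3 11]: T + 2 decoder steps
aligned  = pad(10, ygold[:, 1:end-1])    # [10 1 2 3]:    T + 1 decoder steps
```

The working version feeds exactly the same tokens as the original `pad(10, sort(x, dims=2))`, so it is still teacher forcing.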

So the complete new code is:

using Knet
using KnetLayers

struct S2S; encoder; decoder; output; loss; end

# Initialize model
model = S2S(LSTM(input=11,hidden=128,embed=9),
            LSTM(input=11,hidden=128,embed=9),
            Multiply(input=128,output=11),
            CrossEntropyLoss())

# Forward functions
(m::S2S)(x, ygold) = m.loss(m.output(m.decoder(pad(10, ygold[:, 1:end-1]), m.encoder(x; hy=true).hidden).y), ygold)
(m::S2S)(x) = m.output(m.decoder(pad(10, sort(x,dims=2)), m.encoder(x; hy=true).hidden).y)

predict(m::S2S,x) = getindex.(argmax(Array(m(x)), dims=1)[1,:,:], 1)
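`predict` turns the output scores into token ids. Assume the output layout is (vocabulary, batch, time), which is not stated in the thread. Then `argmax(...; dims=1)` returns an array of `CartesianIndex` values, and `getindex.(..., 1)` keeps only the vocabulary coordinate. A small Base-Julia sketch with made-up scores:

```julia
# Made-up scores with layout (vocab, batch, time) = (4, 1, 3).
scores = zeros(4, 1, 3)
scores[2, 1, 1] = 1.0   # step 1 prefers token 2
scores[4, 1, 2] = 1.0   # step 2 prefers token 4
scores[1, 1, 3] = 1.0   # step 3 prefers token 1

idx = argmax(scores, dims=1)          # 1×1×3 array of CartesianIndex{3}
tokens = getindex.(idx[1, :, :], 1)   # batch × time matrix of token ids
```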

# Helper functions for padding
pad(p::Int,x::Array; dims=2) = cat(fill(p,size(x,1)),x;dims=dims)
pad(x::Array,p::Int; dims=2) = cat(x,fill(p,size(x,1));dims=dims)

# Create sorting data: 10 is used as start token and 11 is stop token.
dataxy(x) = (x, pad(sort(x, dims=2), 11))
batchSize, maxLength, dataLength = 64, 15, 2000; # Batch size, maximum sequence length, and number of batches for training
data = [dataxy([rand(1:9) for j=1:batchSize, k=1:rand(1:maxLength)]) for i=1:dataLength]
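Each batch is a matrix with one sequence per row and one time step per column. The target is the row-wise sorted input followed by the stop token 11. Data tokens are 1 to 9, with 10 as start and 11 as stop, which is presumably why both LSTMs use `input=11`. The sketch below runs `dataxy` on a fixed made-up batch instead of random data:

```julia
pad(x::Array, p::Int; dims=2) = cat(x, fill(p, size(x, 1)); dims=dims)
dataxy(x) = (x, pad(sort(x, dims=2), 11))

x = [5 2 9; 1 7 3]     # made-up batch: 2 sequences of length 3
src, tgt = dataxy(x)   # tgt == [2 5 9 11; 1 3 7 11]
```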

# Train your model
progress!(adam(model, repeat(data, 10)))
# Test the model
@show predict(model,[3 2 1 4 5 9 3 5 6 6 1 2 5;])

But at test time we have no ground truths, so what will happen?

So the problem I have with this code is teacher forcing :) I do not want to use it. All I want is, during training, to give only the initial start token to the decoder, let the decoder predict its outputs from its own previous predictions, and then compute a loss value at the end of the sequence. Simple, old school :)

Is there any way to do this?
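What is asked for here is usually called free-running (greedy autoregressive) decoding. Only the start token 10 is given, and each later decoder input is the model's own previous prediction. The following is a minimal plain-Julia sketch, not Knet code. `toy_step` is a hypothetical stand-in for one step of `m.decoder` followed by `m.output`, with a made-up rule, and `free_run` and `seq_nll` are illustrative names. During training the loop runs exactly as many steps as the gold target so that the loss can be computed position by position. At test time it stops at the stop token 11.

```julia
# Hypothetical one-step decoder: (previous token, hidden state) ->
# (scores over the 11 tokens, new hidden state). The rule is made up.
function toy_step(prev::Int, h::Vector{Float64})
    h_new = 0.5 .* h .+ prev                 # made-up state update (ignored by the scores)
    nxt = prev == 10 ? 1 : (prev >= 3 ? 11 : prev + 1)
    scores = zeros(11)
    scores[nxt] = 1.0
    return scores, h_new
end

# Free-running greedy decoding: only the start token comes from outside;
# every later input is the model's own previous prediction.
function free_run(step, h0, nsteps; start=10, stop=11, early_stop=true)
    tokens = Int[]
    all_scores = Vector{Vector{Float64}}()
    prev, h = start, h0
    for _ in 1:nsteps
        scores, h = step(prev, h)
        push!(all_scores, scores)
        prev = argmax(scores)                # feed back the model's own guess
        push!(tokens, prev)
        early_stop && prev == stop && break
    end
    return tokens, all_scores
end

# Softmax cross-entropy averaged over positions (max subtracted for stability).
function seq_nll(all_scores, gold)
    total = 0.0
    for (s, g) in zip(all_scores, gold)
        mx = maximum(s)
        total += mx + log(sum(exp.(s .- mx))) - s[g]
    end
    return total / length(gold)
end

# Test time: decode until the stop token (or at most 20 steps).
tokens, _ = free_run(toy_step, zeros(4), 20)

# Training: run exactly length(gold) steps, then score every position.
gold = [1, 2, 3, 11]
_, scores = free_run(toy_step, zeros(4), length(gold); early_stop=false)
loss = seq_nll(scores, gold)
```

`argmax` is not differentiable. In this setup, gradients reach the parameters through the scores and the hidden state, but not through the choice of the fed-back token. Wiring this into the thread's model would mean running the decoder LSTM one time step at a time and carrying its hidden state between calls. How to do that with KnetLayers' `LSTM` should be checked against its documentation.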

B.R.