Open kadir-gunel opened 4 years ago
Hi Kadir,
(m::S2S)(x) is a forward function with teacher forcing: the function itself knows that ygold = sort(x, dims=2). This is the usual trick people use in training, although it is cheating at test time. I left it this way to keep the code simple.
I guess your question is, "Then why don't we pass ygold as an argument?" It is because I didn't want to say out loud that we are cheating.
I guess that in your case the two functions you defined have exactly the same signature, (m::S2S)(x, ygold), so the compiler no longer knows about the one you defined first. In other words, you are overwriting (not overloading) the forward function.
Could you send me the modified code so that I can reproduce the error?
I don't get a segfault with the original example.
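The overwriting behaviour described above can be checked in plain Julia without Knet. The following is only a minimal sketch: the `Toy` struct and both method bodies are hypothetical stand-ins, not code from seq2seq.jl. It shows that a second definition with an identical signature replaces the first instead of adding a new method.

```julia
# Hypothetical callable struct standing in for S2S (not the real model).
struct Toy
    scale
end

# First definition: returns a scaled copy of x.
(m::Toy)(x, y) = m.scale .* x

# Second definition with the SAME signature: it replaces the first one.
(m::Toy)(x, y) = sum(abs2, x .- y)

t = Toy(2)
n_methods = length(methods(t))   # number of methods callable on a Toy instance
result = t([1, 2], [0, 0])       # dispatches to the second (only) definition
```

If the second body itself called `m(x, y)`, that call would now refer to the second definition, so the method would call itself without end. Giving the two functions different signatures, or different names, avoids this.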
Hi Ekin,
Thanks for the feedback.
I was asking myself, "What is his objective? Why has he given the ground truths this way (in the decoder part of the model)?" Now I understand. Thanks. 😃
The original example works perfectly. But after I changed the two forward functions that I mentioned, I get a segmentation fault:
# Forward functions
(m::S2S)(x, ygold) = m.output(m.decoder(pad(10, ygold) , m.encoder(x; hy=true).hidden).y)
(m::S2S)(x, ygold) = m.loss(m(x, ygold), ygold)
As you clarified, the second definition overwrites the first, so the call m(x, ygold) inside it no longer reaches the function I intended, and the program crashes. So I changed the code to:
# New forward function
(m::S2S)(x, ygold) = m.loss(m.output(m.decoder(pad(10, ygold) , m.encoder(x; hy=true).hidden).y), ygold)
Then I get a dimension mismatch error.
If I change the new forward function to:
# Newer forward function
(m::S2S)(x, ygold) = m.loss(m.output(m.decoder(pad(10, ygold[:, 1:end-1]) , m.encoder(x; hy=true).hidden).y), ygold)
Everything seems to work fine.
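The dimension mismatch can be traced with the padding helpers alone. The sketch below reuses the two `pad` definitions from the thread on a small hypothetical batch and assumes the decoder emits one output per input time step. Under that assumption, `pad(10, ygold)` yields one more step than `ygold` has targets, while `pad(10, ygold[:, 1:end-1])` lines up exactly.

```julia
# Padding helpers copied from the thread.
pad(p::Int, x::Array; dims=2) = cat(fill(p, size(x, 1)), x; dims=dims)
pad(x::Array, p::Int; dims=2) = cat(x, fill(p, size(x, 1)); dims=dims)

# Hypothetical batch: 2 sequences of length 3.
x = [3 1 2; 9 4 7]

# Targets: sorted sequence followed by the stop token 11 (length 4).
ygold = pad(sort(x, dims=2), 11)

# Start token plus ALL targets: length 5, one step longer than ygold.
full_input = pad(10, ygold)

# Start token plus targets without the final stop token: length 4.
shifted_input = pad(10, ygold[:, 1:end-1])
```

With `shifted_input`, the decoder input at step t is the gold token from step t-1, so the output at step t is compared against the gold token at step t.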
So the complete new code is:
using Knet
using KnetLayers
struct S2S; encoder; decoder; output; loss; end
# Initialize model
model = S2S(LSTM(input=11,hidden=128,embed=9),
            LSTM(input=11,hidden=128,embed=9),
            Multiply(input=128,output=11),
            CrossEntropyLoss())
# Forward functions
(m::S2S)(x, ygold) = m.loss(m.output(m.decoder(pad(10, ygold[:, 1:end-1]), m.encoder(x; hy=true).hidden).y), ygold)
(m::S2S)(x) = m.output(m.decoder(pad(10, sort(x,dims=2)), m.encoder(x; hy=true).hidden).y)
predict(m::S2S,x) = getindex.(argmax(Array(m(x)), dims=1)[1,:,:], 1)
# Helper functions for padding
pad(p::Int,x::Array; dims=2) = cat(fill(p,size(x,1)),x;dims=dims)
pad(x::Array,p::Int; dims=2) = cat(x,fill(p,size(x,1));dims=dims)
# Create sorting data: 10 is used as start token and 11 is stop token.
dataxy(x) = (x, pad(sort(x, dims=2), 11))
batchSize, maxLength, dataLength= 64, 15, 2000; # Batch size and maximum sequence length for training
data = [dataxy([rand(1:9) for j=1:batchSize, k=1:rand(1:maxLength)]) for i=1:dataLength]
# Train your model
progress!(adam(model, repeat(data, 10)))
# Test the model
@show predict(model,[3 2 1 4 5 9 3 5 6 6 1 2 5;])
But at test time we have no ground truth. What will happen?
So the problem I have with this code is teacher forcing :). I do not want to use teacher forcing. All I want is, during training, to give only the initial eos tag to the decoder, let the decoder predict its outputs, and then compute a loss value at the end of the sequence. Simple and old school :)
Is there any way to do this?
B.R.
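For reference, decoding without teacher forcing usually means feeding each predicted token back in as the next input. The sketch below illustrates that loop in plain Julia. It is not Knet or KnetLayers code: `greedy_decode`, `toy_step`, and the way the state is passed around are all hypothetical, and the toy step function stands in for a real decoder cell.

```julia
# Greedy free-running decoding: the previous PREDICTION, not the gold token,
# is fed to the next step. `step` is a hypothetical function that maps
# (previous token, state) to (scores over the vocabulary, new state).
function greedy_decode(step, state, start_token, stop_token, maxlen)
    tokens = Int[]
    prev = start_token
    for _ in 1:maxlen
        scores, state = step(prev, state)
        prev = argmax(scores)          # most likely next token
        prev == stop_token && break    # stop once the stop token is predicted
        push!(tokens, prev)
    end
    return tokens
end

# Toy stand-in for a trained decoder: the state is the list of tokens still
# to be emitted, and the scores are one-hot over an 11-token vocabulary.
function toy_step(prev, remaining)
    scores = zeros(11)
    if isempty(remaining)
        scores[11] = 1.0               # nothing left: predict the stop token
        return scores, remaining
    end
    scores[first(remaining)] = 1.0
    return scores, remaining[2:end]
end

decoded = greedy_decode(toy_step, sort([3, 2, 1, 4]), 10, 11, 20)
```

To train this way, one would presumably accumulate a loss on the per-step scores against the gold tokens inside the same loop. How to do that with the RNN layers used in this thread is left open here.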
Hello,
I was playing with the seq2seq.jl script, and I wondered why you used the sorting method inside the forward function
instead of
Of course, I also changed the second forward function to:
In the end I get a segmentation fault. Why is this happening? I am trying to build a very basic sequence-to-sequence model on different data. Could you please help me resolve this issue?