jdgh000 opened 1 month ago
Hi @jdgh000,

I believe the reproducibility issue boils down to the teacher_forcing_prob argument. To test the reproducibility of the EncoderDecoder class against its two components, Encoder and Decoder, we have to set it to 1.0, so the next token is always generated based on the previous real token. Otherwise, the EncoderDecoder class will sometimes randomly choose the previously generated token as the starting point for the prediction. Once we set the probability to 1.0, the outputs should match.

First, let's use the EncoderDecoder class (notice I changed the seed just to illustrate my point with a different set of outputs):
# create encoder
torch.manual_seed(23)
encoder = Encoder(n_features=2, hidden_dim=2)
# create decoder
torch.manual_seed(23)
decoder = Decoder(n_features=2, hidden_dim=2)
encdec = EncoderDecoder(encoder, decoder, input_len=2, target_len=2, teacher_forcing_prob=1.0)
encdec.train() # teacher forcing can only happen in training mode
encdec(full_seq)
The output:
tensor([[[ 0.2269, -0.6615],
[ 0.2650, -0.6666]]], grad_fn=<CopySlices>)
Now, if we do it manually:
torch.manual_seed(23) # matching seed
encoder = Encoder(n_features=2, hidden_dim=2)
hidden_seq = encoder(source_seq) # output is N, L, F
hidden_final = hidden_seq[:, -1:] # takes last hidden state
print(hidden_seq)
torch.manual_seed(23) # matching seed
decoder = Decoder(n_features=2, hidden_dim=2)
decoder.init_hidden(hidden_seq)
# Generation loop
inputs = source_seq[:, -1:]
# HERE: so it's always teacher forcing and we guarantee reproducibility
teacher_forcing_prob = 1.0
target_len = 2
for i in range(target_len):
    print(f'Hidden: {decoder.hidden}')
    out = decoder(inputs)
    print(f'Output: {out}\n')
    # If it is teacher forcing
    if torch.rand(1) <= teacher_forcing_prob:
        # Takes the actual element
        inputs = target_seq[:, i:i+1]
    else:
        # Otherwise uses the last predicted output
        inputs = out
The output is:
tensor([[[ 0.3565, -0.0588],
[ 0.1863, -0.1898]]], grad_fn=<TransposeBackward1>)
Hidden: tensor([[[ 0.1863, -0.1898]]], grad_fn=<PermuteBackward0>)
Output: tensor([[[ 0.2269, -0.6615]]], grad_fn=<ViewBackward0>)
Hidden: tensor([[[ 0.1462, -0.2929]]], grad_fn=<StackBackward0>)
Output: tensor([[[ 0.2650, -0.6666]]], grad_fn=<ViewBackward0>)
Notice that the two printed outputs (tensor([[[ 0.2269, -0.6615]]], grad_fn=<ViewBackward0>) and tensor([[[ 0.2650, -0.6666]]], grad_fn=<ViewBackward0>)) now match the overall output produced by the encdec object.
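If you want to verify the match programmatically instead of by eye, a quick check along these lines should work (a minimal sketch; out_encdec and manual_outs are illustrative names for the tensor returned by encdec(full_seq) and a list collecting the two out tensors from the loop above):
manual = torch.cat(manual_outs, dim=1)     # (N, target_len, n_features)
print(torch.allclose(out_encdec, manual))  # should print True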
I hope it helps.
Best, Daniel
OK, so the teacher forcing prob should be set to 1.0? I was not sure about that, as I thought this part came before teacher forcing was introduced. I may be wrong on this; I will try it and see if it works. Thanks.
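For reference, torch.rand(1) draws from [0, 1), so once the probability is set to 1.0 the teacher-forcing branch in the loop above is always taken. A tiny sanity check (just illustrating the comparison, not part of the book's code):
draws = torch.rand(1000)     # samples are always in [0, 1)
print((draws <= 1.0).all())  # tensor(True): the teacher-forcing branch is always taken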
Here is the partial output of my code; the source is at:
https://gitlab.com/codelabs8265339/codelab-gpu/-/blob/dev-gpu-sbs/ml/tf/tf-from-scratch/3/code-exercises/ch9/ch9-p247-encdec.py?ref_type=heads
Basically, I copied it line by line from around p. 247. In the previous example, which used the separate Encoder and Decoder classes directly (without the EncoderDecoder class), the decoder's output was not matching (although the encoder's was). I fixed that (the outputs now match) by resetting the seed to 21 before instantiating both the encoder and the decoder:
I tried the same trick before instantiating the EncoderDecoder class, but this time it did not work:
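For reference, combining the seed trick with the teacher_forcing_prob suggestion above would look roughly like this (a sketch only, assuming the book's Encoder, Decoder, and EncoderDecoder classes and the same full_seq tensor; 21 is the seed mentioned above):
torch.manual_seed(21)
encoder = Encoder(n_features=2, hidden_dim=2)
torch.manual_seed(21)
decoder = Decoder(n_features=2, hidden_dim=2)
# teacher_forcing_prob=1.0 so the prediction always starts from the real previous token
encdec = EncoderDecoder(encoder, decoder, input_len=2, target_len=2, teacher_forcing_prob=1.0)
encdec.train()  # teacher forcing only happens in training mode
print(encdec(full_seq))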