dvgodoy / PyTorchStepByStep

Official repository of my book: "Deep Learning with PyTorch Step-by-Step: A Beginner's Guide"
https://pytorchstepbystep.com
MIT License

class EncoderDecoder last element of output of decoder does not match #51

Open jdgh000 opened 1 month ago

jdgh000 commented 1 month ago

Here is the partial output of my code:

python3 ch9-p247-encdec.py
printFnc: func:  <function namestr at 0x7f0359f5adc0>
printFnc: func:  <function Linear.__init__ at 0x7f0359f79160>
--------------------------------
['outputs'] :  <class 'torch.Tensor'>
torch.Size([1, 2, 2])
tensor([[[-0.2339,  0.4702],
         [-0.2634,  0.4622]]], grad_fn=<CopySlices>)
--------------------------------
--------------------------------
['outputs_from_src_seq'] :  <class 'torch.Tensor'>
torch.Size([1, 2, 2])
tensor([[[-0.2339,  0.4702],
         [-0.2634,  0.4622]]], grad_fn=<CopySlices>)

The code source is at: https://gitlab.com/codelabs8265339/codelab-gpu/-/blob/dev-gpu-sbs/ml/tf/tf-from-scratch/3/code-exercises/ch9/ch9-p247-encdec.py?ref_type=heads. Basically, I copied it line by line from around p. 247.

In the previous example, which used the Encoder and Decoder classes directly (without the EncoderDecoder class), the decoder's output did not match (although the encoder's did). I fixed that (the outputs matched) by resetting the seed to 21 before instantiating both the encoder and the decoder:


torch.manual_seed(21)

encoder = Encoder(n_features=2, hidden_dim=2)

hidden_seq = encoder(source_seq) # output is N, L, F
printTensor(hidden_seq, globals(), "full")
hidden_final = hidden_seq[:, -1:]   # takes last hidden state
printTensor(hidden_final, globals(), "full")

torch.manual_seed(21)

decoder = Decoder(n_features=2, hidden_dim=2)
decoder.init_hidden(hidden_seq)

I tried the same trick before instantiating the EncoderDecoder class, but this time it did not work:

# create encoder

torch.manual_seed(21)
encoder = Encoder(n_features=2, hidden_dim=2)

# create decoder

torch.manual_seed(21)
decoder = Decoder(n_features=2, hidden_dim=2)

torch.manual_seed(21)
encdec = EncoderDecoder(encoder, decoder, input_len=2, target_len=2, teacher_forcing_prob=0.5)

dvgodoy commented 1 month ago

Hi @jdgh000 ,

I believe the reproducibility issue boils down to the teacher_forcing_prob argument. To test the reproducibility of the EncoderDecoder class against its two components, Encoder and Decoder, we have to set it to 1.0 so the next token is always generated based on the previous real token. Otherwise, the EncoderDecoder class will sometimes randomly choose the previously generated token as the starting point for the next prediction.
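
Just to illustrate the point, here is a minimal sketch (not the actual class code, only the kind of random draw that happens inside the generation loop, as also shown in the manual loop further below). With teacher_forcing_prob below 1.0, this draw consumes the RNG state and can send either the real token or the previous prediction back into the decoder, so two runs only match if the same branch is taken at every step:

import torch

torch.manual_seed(23)
teacher_forcing_prob = 0.5  # any value below 1.0 makes the branch random

for i in range(2):
    # this draw consumes the global RNG state and decides which input
    # the decoder sees at the next step
    use_real_token = torch.rand(1) <= teacher_forcing_prob
    print(i, bool(use_real_token))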

Once we set the probability to 1.0, the outputs should match. First, let's use the EncoderDecoder class (notice I changed the seed just to illustrate my point with a different set of outputs):

# create encoder
torch.manual_seed(23)
encoder = Encoder(n_features=2, hidden_dim=2)
# create decoder
torch.manual_seed(23)
decoder = Decoder(n_features=2, hidden_dim=2)

encdec = EncoderDecoder(encoder, decoder, input_len=2, target_len=2, teacher_forcing_prob=1.0)
encdec.train() # teacher forcing can only happen in training mode
encdec(full_seq)

The output:

tensor([[[ 0.2269, -0.6615],
         [ 0.2650, -0.6666]]], grad_fn=<CopySlices>)

Now, if we do it manually:

torch.manual_seed(23) # matching seed
encoder = Encoder(n_features=2, hidden_dim=2)
hidden_seq = encoder(source_seq) # output is N, L, F
hidden_final = hidden_seq[:, -1:]   # takes last hidden state
print(hidden_seq)

torch.manual_seed(23)  # matching seed
decoder = Decoder(n_features=2, hidden_dim=2)
decoder.init_hidden(hidden_seq)

# Generation loop
inputs = source_seq[:, -1:]

# HERE: so it's always teacher forcing and we guarantee reproducibility
teacher_forcing_prob = 1.0

target_len = 2
for i in range(target_len):
    print(f'Hidden: {decoder.hidden}')
    out = decoder(inputs)
    print(f'Output: {out}\n')
    # If it is teacher forcing
    if torch.rand(1) <= teacher_forcing_prob:
        # Takes the actual element
        inputs = target_seq[:, i:i+1]
    else:
        # Otherwise uses the last predicted output
        inputs = out

The output is:

tensor([[[ 0.3565, -0.0588],
         [ 0.1863, -0.1898]]], grad_fn=<TransposeBackward1>)
Hidden: tensor([[[ 0.1863, -0.1898]]], grad_fn=<PermuteBackward0>)
Output: tensor([[[ 0.2269, -0.6615]]], grad_fn=<ViewBackward0>)

Hidden: tensor([[[ 0.1462, -0.2929]]], grad_fn=<StackBackward0>)
Output: tensor([[[ 0.2650, -0.6666]]], grad_fn=<ViewBackward0>)

Notice that the two printed outputs (tensor([[[ 0.2269, -0.6615]]], grad_fn=<ViewBackward0>) and tensor([[[ 0.2650, -0.6666]]], grad_fn=<ViewBackward0>)) do match the overall output produced by the encdec object now.
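
If you want to check this programmatically, a quick comparison could look like the sketch below. The names out_encdec, out_step1, and out_step2 are used only for illustration here: out_encdec would hold the result of encdec(full_seq), and out_step1 / out_step2 the two outputs printed inside the loop above.

# hypothetical names: out_encdec = encdec(full_seq),
# out_step1 / out_step2 = the two per-step decoder outputs
manual = torch.cat([out_step1, out_step2], dim=1)  # shape (1, 2, 2)
print(torch.allclose(out_encdec, manual))          # should print True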

I hope it helps.

Best, Daniel

jdgh000 commented 1 month ago

OK, so teacher_forcing_prob needs to be set to 1.0? I was not sure about that, as I thought this example came before teacher forcing was introduced. I may be wrong about this; I will try it and see if it works. Thanks.