IBM / pytorch-seq2seq

An open source framework for seq2seq models in PyTorch.
https://ibm.github.io/pytorch-seq2seq/public/index.html
Apache License 2.0

Teacher forcing per timestep? #195

Open ghost opened 4 years ago

ghost commented 4 years ago

Hi,

I don't understand why teacher forcing is being decided for the whole sequence. The definition of teacher forcing says that at each timestep, either the model's predicted token or the ground-truth token from the previous timestep should be fed in. The implementation here, on the other hand, first decides whether to generate the whole sequence with teacher forcing, and then decodes the entire sequence with teacher forcing set to True or False, which I believe is not correct.

I would really appreciate feedback on this issue. Thanks!
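
For reference, here is a minimal sketch of what a per-timestep teacher forcing decision could look like inside a decoder loop. This is not the library's code; `decoder_step` is a hypothetical callable standing in for one RNN decoder step, and the exact interface is an assumption:

```python
import random
import torch

def decode(decoder_step, bos_token, target_seq, teacher_forcing_ratio=0.5):
    """Decode one sequence, choosing teacher forcing independently at each timestep.

    `decoder_step` is an illustrative callable mapping
    (input_token, hidden) -> (logits, hidden); it stands in for a single
    decoder RNN step. `target_seq` holds the ground-truth token ids.
    """
    hidden = None
    input_token = bos_token
    outputs = []
    for t in range(target_seq.size(0)):
        logits, hidden = decoder_step(input_token, hidden)
        outputs.append(logits)
        # Per-timestep decision: feed the ground-truth token or the model's
        # own prediction. The per-sequence variant would instead draw this
        # random number once, before the loop, and reuse it for every step.
        if random.random() < teacher_forcing_ratio:
            input_token = target_seq[t]            # ground truth
        else:
            input_token = logits.argmax(dim=-1)    # model prediction
    return torch.stack(outputs)
```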

Sushentsev commented 3 years ago


Yeah, I am working with RNNs and also noticed this problem; the same per-sequence decision appears in the PyTorch tutorial (https://github.com/pytorch/tutorials/blob/master/intermediate_source/seq2seq_translation_tutorial.py#L558). I think it is a mistake.