ghost opened this issue 4 years ago
Hi,
I don't understand why teacher forcing is applied to the whole sequence. By definition, teacher forcing means that at each timestep the decoder is fed either the ground-truth token or its own prediction from the previous timestep. The implementation here, however, first decides once whether to use teacher forcing and then decodes the entire sequence with that single choice, which I believe is not correct.
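For reference, here is a minimal sketch of what I mean by moving the decision inside the loop. This is not the repository's code: it uses a toy GRU decoder and dummy targets just to illustrate the per-timestep coin flip.

```python
import random
import torch
import torch.nn as nn

# Toy setup (assumed values, not from the repo): tiny vocabulary and decoder.
vocab_size, hidden_size, target_length = 10, 16, 5
SOS_token = 0

embedding = nn.Embedding(vocab_size, hidden_size)
gru = nn.GRU(hidden_size, hidden_size)
out_proj = nn.Linear(hidden_size, vocab_size)
criterion = nn.NLLLoss()

target_tensor = torch.randint(1, vocab_size, (target_length, 1))  # dummy targets
hidden = torch.zeros(1, 1, hidden_size)
decoder_input = torch.tensor([[SOS_token]])
teacher_forcing_ratio = 0.5
loss = 0.0

for t in range(target_length):
    emb = embedding(decoder_input)                      # (1, 1, hidden)
    output, hidden = gru(emb, hidden)
    logits = torch.log_softmax(out_proj(output[0]), dim=1)
    loss = loss + criterion(logits, target_tensor[t])

    # Decide teacher forcing *per timestep*, not once for the whole sequence.
    if random.random() < teacher_forcing_ratio:
        decoder_input = target_tensor[t].unsqueeze(0)   # feed the ground-truth token
    else:
        topi = logits.argmax(dim=1)
        decoder_input = topi.detach().unsqueeze(0)      # feed the model's own prediction
```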
I'd really appreciate feedback on this issue. Thanks!
Yeah, I'm working with RNNs and ran into the same problem in the PyTorch tutorial example (https://github.com/pytorch/tutorials/blob/master/intermediate_source/seq2seq_translation_tutorial.py#L558). I think it is a mistake.