ghost opened this issue 4 years ago
Hi,
I don't understand why teacher forcing is applied to the whole sequence. By definition, teacher forcing means that at each timestep the decoder is fed either the ground-truth token or its own prediction from the previous timestep. The implementation here, however, first decides once whether to use teacher forcing and then decodes the entire sequence with that single choice, which I believe is not correct.
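For reference, here is a minimal sketch of what I mean by moving the decision inside the loop. This is not the repository's code: it uses a toy GRU decoder and dummy targets just to illustrate the per-timestep coin flip.

```python
import random
import torch
import torch.nn as nn

# Toy setup (assumed values, not from the repo): tiny vocabulary and decoder.
vocab_size, hidden_size, target_length = 10, 16, 5
SOS_token = 0

embedding = nn.Embedding(vocab_size, hidden_size)
gru = nn.GRU(hidden_size, hidden_size)
out_proj = nn.Linear(hidden_size, vocab_size)
criterion = nn.NLLLoss()

target_tensor = torch.randint(1, vocab_size, (target_length, 1))  # dummy targets
hidden = torch.zeros(1, 1, hidden_size)
decoder_input = torch.tensor([[SOS_token]])
teacher_forcing_ratio = 0.5
loss = 0.0

for t in range(target_length):
    emb = embedding(decoder_input)                      # (1, 1, hidden)
    output, hidden = gru(emb, hidden)
    logits = torch.log_softmax(out_proj(output[0]), dim=1)
    loss = loss + criterion(logits, target_tensor[t])

    # Decide teacher forcing *per timestep*, not once for the whole sequence.
    if random.random() < teacher_forcing_ratio:
        decoder_input = target_tensor[t].unsqueeze(0)   # feed the ground-truth token
    else:
        topi = logits.argmax(dim=1)
        decoder_input = topi.detach().unsqueeze(0)      # feed the model's own prediction
```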
I'd really appreciate feedback on this issue. Thanks!
Yeah, I'm working with RNNs and ran into the same problem in the PyTorch tutorial example (https://github.com/pytorch/tutorials/blob/master/intermediate_source/seq2seq_translation_tutorial.py#L558). I think it is a mistake.