Closed anhvth closed 2 years ago
Hi @anhvth, thanks for your feedback. The teacher predicts representations from the masked indices in the input (the indices that are masked for src are not masked for trg and vice versa) so the mask must be the inverse of the one in the student.
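To make the "inverse mask" relationship concrete, here is a minimal, framework-free sketch. It is not the repository's code: `invert_mask`, `apply_mask`, the `[MASK]` placeholder, and the toy tokens are all hypothetical names chosen for illustration, and it assumes both passes see sequences of the same length. It only shows that every position masked in one pass is visible in the other.

```python
def invert_mask(mask):
    """Return the complement of a boolean mask (True means 'masked')."""
    return [not m for m in mask]


def apply_mask(tokens, mask, mask_token="[MASK]"):
    """Replace every masked position with a placeholder token (hypothetical helper)."""
    return [mask_token if m else t for t, m in zip(tokens, mask)]


# Toy sequence of five positions; names are illustrative only.
tokens = ["t0", "t1", "t2", "t3", "t4"]

# Mask used in the student pass.
student_mask = [False, True, True, False, False]

# Mask used in the teacher pass: the exact complement of the student mask.
teacher_mask = invert_mask(student_mask)

student_input = apply_mask(tokens, student_mask)
teacher_input = apply_mask(tokens, teacher_mask)

print(student_input)
print(teacher_input)
```

With this construction, no position is masked in both passes and no position is visible in both, which is the property described above.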
In the teacher forward pass the `mask_time_indices` is the inverse of the one in the student, is this correct? I think the `mask` in the teacher forward pass should be `None` since the teacher expects the full version of the input data.