Rudrabha / Lip2Wav

This is the repository containing the code for our CVPR 2020 paper, "Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis".

Teacher forcing on TIMIT and GRID dataset #29

Open · hjzzju opened this issue 3 years ago

hjzzju commented 3 years ago

Hi, I want to know how to set teacher forcing for the GRID and TCDTIMIT datasets. Should it be the same as for the Lip2Wav dataset, with the teacher forcing decay starting from 29,000 steps?

Rudrabha commented 3 years ago

You can decay it earlier. Start from 1,000 steps or something similar. You may decay over 10,000 steps and then let it train without teacher forcing for some time.
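
For illustration, a schedule along these lines would be expressed through the teacher forcing hyperparameters in synthesizer/hparams.py. The field names below are assumptions based on the upstream Tacotron-2 conventions this repo follows; verify them against the actual hparams file:

```python
# Assumed Tacotron-2 style teacher-forcing hparams (check names in synthesizer/hparams.py):
tacotron_teacher_forcing_mode = "scheduled"    # decay the ratio instead of holding it constant
tacotron_teacher_forcing_init_ratio = 1.0      # begin fully teacher-forced
tacotron_teacher_forcing_final_ratio = 0.0     # end fully free-running
tacotron_teacher_forcing_start_decay = 1000    # start decaying around step 1,000, per the advice above
tacotron_teacher_forcing_decay_steps = 10000   # complete the decay within 10,000 steps
```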

Domhnall-Liopa commented 2 years ago

Hi,

With tacotron_teacher_forcing_mode="constant" during training, the teacher forcing ratio is never decayed and always stays at 1. Then, in synthesizer/models/helpers.py, the following code selects between the ground truth and the output of the previous time-step:

```python
next_inputs = tf.cond(
    # Draw u ~ Uniform(0, 1); teacher-force whenever u < self._ratio.
    tf.less(tf.random_uniform([], minval=0, maxval=1, dtype=tf.float32), self._ratio),
    lambda: self._targets[:, time, :],         # ground-truth frame (teacher forcing)
    lambda: outputs[:, -self._output_dim:])    # model's previous prediction (free running)
```

Since the ratio is always 1 and never decayed, the decoder is fed the ground truth of the previous time-step for the entire training run. Is this expected? Should there be a switch at some point so that the model's own outputs from the previous time-step are fed in during training?
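
For context, if the mode were switched to "scheduled", the helper's self._ratio would decay over training, so the tf.cond above would increasingly pick the model's own outputs. A minimal sketch of such a schedule, using a linear decay with hypothetical parameter names (the repo's actual scheduled mode may use a different curve, e.g. cosine decay):

```python
def scheduled_ratio(step, start_decay=1000, decay_steps=10000,
                    init_ratio=1.0, final_ratio=0.0):
    """Hypothetical teacher-forcing schedule: hold init_ratio until
    start_decay, decay linearly over decay_steps, then hold final_ratio."""
    if step < start_decay:
        return init_ratio
    progress = min((step - start_decay) / decay_steps, 1.0)
    return init_ratio + progress * (final_ratio - init_ratio)

# Example: ratio is 1.0 at step 500, 0.5 midway through the decay,
# and 0.0 once the decay window has passed.
for step in (500, 6000, 12000):
    print(step, scheduled_ratio(step))
```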