georgesterpu / avsr-tf1

Audio-Visual Speech Recognition using Sequence to Sequence Models
GNU General Public License v3.0

How to pad features and labels to the same length in one batch? #7

Closed kobenaxie closed 5 years ago

kobenaxie commented 5 years ago

Hi, I cannot find the code for padding. Could you point it out, please? Also, what is the meaning of [MASK, END] in the unit dict, and where are they used?

georgesterpu commented 5 years ago

We rely on the tf.data.Dataset.padded_batch method to do the padding.

padded_batch is used here, in the function that creates the Dataset and the Iterator
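To illustrate what this padding amounts to, here is a minimal pure-Python sketch of the behaviour, not the repository's code and not the TensorFlow API itself. The helper name `pad_batch` and the toy label ids are hypothetical. It assumes zero as the padding value, which is the default that padded_batch uses for numeric types, and pads every sequence to the longest one in the batch. With padded_batch, each component (e.g. features and labels) is padded independently in this way.

```python
def pad_batch(sequences, pad_value=0):
    """Pad variable-length sequences to the longest length in the batch.

    Hypothetical helper mimicking the effect of padded_batch on one
    component of the dataset; returns the padded rows and the true lengths.
    """
    max_len = max(len(seq) for seq in sequences)
    padded = [list(seq) + [pad_value] * (max_len - len(seq)) for seq in sequences]
    lengths = [len(seq) for seq in sequences]
    return padded, lengths


# Toy batch of label id sequences with different lengths (made-up values).
labels = [[5, 2, 9], [7], [3, 3]]
padded_labels, label_lengths = pad_batch(labels)
print(padded_labels)   # rows now share the length of the longest sequence
print(label_lengths)   # original lengths, useful for masking the padding later
```

Keeping the original lengths alongside the padded batch is a common choice, since downstream losses and decoders typically need them to ignore the padded positions.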

MASK and END are special tokens that I occasionally found in the predicted ids. I cannot remember exactly why they appeared there, but my best guess is the beam search decoder. We kept them in the unit dictionary just for convenience, so that the reverse lookup to graphemes would be error-free. In any case, these symbols are removed here before the prediction is written to a file.
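As a rough sketch of that clean-up step, the snippet below filters the special symbols out of a predicted unit sequence before joining it into text. The function name `strip_special_tokens` and the example prediction are hypothetical, and it assumes the symbols are simply dropped wherever they occur; the repository's actual code may differ, for example by truncating at the first END.

```python
# Assumed spelling of the special symbols, taken from the discussion above.
SPECIAL_TOKENS = {"MASK", "END"}


def strip_special_tokens(units, special_tokens=SPECIAL_TOKENS):
    """Return the predicted units with all special symbols removed."""
    return [u for u in units if u not in special_tokens]


# Made-up prediction after the reverse lookup from ids to graphemes.
predicted = ["h", "e", "l", "l", "o", "END", "MASK", "MASK"]
cleaned = "".join(strip_special_tokens(predicted))
print(cleaned)
```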