NVIDIA / OpenSeq2Seq

Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP
https://nvidia.github.io/OpenSeq2Seq
Apache License 2.0
1.54k stars, 369 forks

[Question] Parameter "sequence_length" of tf.nn.ctc_loss. #393

Closed by RichardsonLiao 5 years ago

RichardsonLiao commented 5 years ago

Hi, while reading the code I found a parameter called “pad_to” that pads the input so that its length is divisible by an integer (default is 8). https://github.com/NVIDIA/OpenSeq2Seq/blob/e1f56a5c4acd9a82c2156f1d39eca36d049cc6f1/open_seq2seq/data/speech2text/speech_utils.py#L254-L255

Then I am confused about the calculation of “tf.nn.ctc_loss”. https://github.com/NVIDIA/OpenSeq2Seq/blob/0a872ffdf1519047f42d69661293bc33dba4077c/open_seq2seq/losses/ctc_loss.py#L77-L82

Is the parameter “src_length” the original length of each input (before padding), or the length after padding? Thanks for answering!

blisc commented 5 years ago

In the current implementation, it is the length of the spectrogram after the pad_to operation.
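To make the difference between the two lengths concrete, here is a minimal sketch of padding the time axis up to a multiple of `pad_to`. It is not the OpenSeq2Seq implementation: the helper names `padded_length` and `pad_features`, the zero pad value, and the 203-frame example are all assumptions made for illustration.

```python
def padded_length(num_frames, pad_to=8):
    """Smallest multiple of pad_to that is >= num_frames (hypothetical helper)."""
    if pad_to <= 0:
        # Assumption: a non-positive pad_to means "no padding".
        return num_frames
    return ((num_frames + pad_to - 1) // pad_to) * pad_to


def pad_features(features, pad_to=8, pad_value=0.0):
    """Append constant frames along the time axis (hypothetical helper).

    `features` is a list of frames, each frame a list of floats.
    """
    target = padded_length(len(features), pad_to)
    num_features = len(features[0]) if features else 0
    padding = [[pad_value] * num_features for _ in range(target - len(features))]
    return features + padding


# A spectrogram with 203 frames of 3 features each (made-up sizes).
original = [[1.0, 2.0, 3.0] for _ in range(203)]
padded = pad_features(original, pad_to=8)

original_length = len(original)  # length before padding
padded_length_value = len(padded)  # length after padding

print(original_length, padded_length_value)
```

Under these assumptions, the value the answer above describes is the second one (the length after padding), not the original frame count.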