picheny-nyu closed this issue 3 years ago
Are you using external LM shallow fusion for decoding? Shallow fusion tends to cause this kind of problem. See if the deletion errors are reduced without shallow fusion.
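For context, shallow fusion just adds a weighted external-LM log-probability to the acoustic/decoder score at each decoding step. The sketch below is a hypothetical illustration of that scoring rule (not Espresso's actual implementation); it shows how a strong LM term can override the acoustic score when the two disagree:

```python
import math

def shallow_fusion_score(am_log_probs, lm_log_probs, lm_weight=0.5):
    """Combine per-token acoustic/decoder and external-LM log-probabilities:
        score(y) = log p_AM(y | x) + lm_weight * log p_LM(y)
    If the LM assigns low probability to continuing a sequence longer than
    those it was trained on, the fused score can favor stopping early,
    which shows up as deletions on long utterances.
    """
    return [a + lm_weight * l for a, l in zip(am_log_probs, lm_log_probs)]

# Toy example with three candidate tokens (hypothetical numbers).
am = [math.log(0.6), math.log(0.3), math.log(0.1)]
lm = [math.log(0.2), math.log(0.5), math.log(0.3)]
fused = shallow_fusion_score(am, lm, lm_weight=0.5)
best = max(range(3), key=lambda i: fused[i])
```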
Anyway, I think the length mismatch between training and decoding is the cause. There are several works in the literature trying to mitigate this, e.g.:
https://arxiv.org/pdf/1911.02242.pdf
https://arxiv.org/pdf/1910.11455.pdf
OK, I will turn it off. Thanks for the pointers. Do you have any plans to implement these? Or if you point me to the appropriate modules and give me some high-level instructions, maybe I will try myself. I assume the second paper (on streaming RNN-Ts) is less relevant?
In order to implement the first paper, I think you would need to modify espresso/data/asr_dataset.py (or add a new dataset class) to chop utterances into overlapping segments, and then modify espresso/speech_recognize.py to merge the hypotheses from all the segments within a long utterance.
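The two steps described above (overlapping segmentation, then merging per-segment hypotheses) might look roughly like this. This is a standalone sketch with hypothetical helper names, not code from espresso; the merge here naively drops the longest duplicated word sequence at each segment boundary:

```python
def chop_into_segments(n_frames, seg_len, overlap):
    """Return (start, end) frame ranges covering an utterance, each at most
    seg_len frames long (a length the model saw in training), with
    consecutive ranges overlapping by `overlap` frames."""
    step = seg_len - overlap
    assert step > 0, "overlap must be smaller than segment length"
    segments, start = [], 0
    while start < n_frames:
        end = min(start + seg_len, n_frames)
        segments.append((start, end))
        if end == n_frames:
            break
        start += step
    return segments

def merge_hyps(hyps):
    """Merge per-segment word hypotheses: at each boundary, drop the longest
    prefix of the new segment that duplicates a suffix of the output so far."""
    merged = []
    for hyp in hyps:
        k = 0
        for j in range(min(len(merged), len(hyp)), 0, -1):
            if merged[-j:] == hyp[:j]:
                k = j
                break
        merged.extend(hyp[k:])
    return merged

segments = chop_into_segments(100, seg_len=40, overlap=10)
words = merge_hyps([["a", "b", "c"], ["b", "c", "d"]])
```

A real implementation would also need to map frame ranges back to timestamps and handle disagreements in the overlap region (e.g. by preferring the hypothesis whose words lie nearer the center of its segment).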
Thanks. How about the attention aspect (forcing monotonic attention)?
Maybe you can get some reference from https://github.com/freewym/espresso/tree/master/examples/simultaneous_translation/modules
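To illustrate the idea behind those modules: hard monotonic attention constrains the decoder so the attended encoder position can only move forward, which prevents the alignment from jumping back and looping on repeated words. The following is a minimal sketch of one decoding step, not Espresso's actual monotonic-attention implementation:

```python
import numpy as np

def monotonic_attend(energies, prev_pos):
    """One step of hard monotonic attention: pick the highest-energy
    encoder position at or after the previously attended position."""
    masked = energies.copy()
    masked[:prev_pos] = -np.inf  # forbid moving backwards in the alignment
    return int(np.argmax(masked))

energies = np.array([0.9, 0.1, 0.8, 0.2])
pos = monotonic_attend(energies, prev_pos=2)  # index 0 is masked out
```

The soft/trainable variants in that directory replace the hard argmax with expected attention under a stepwise selection probability, but the forward-only constraint is the same.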
OK, I turned off shallow fusion, but it still stops decoding after about 8-10 seconds on the longer utterances. WER is about 60%. Note that Kaldi decoding with a TDNN hybrid model on this corpus gives about 24%. Are there any other parameters to work with before I have to resort to more extreme measures?
Here is a typical attention plot for a long utterance, in case it suggests something.
OK, then I think there is no obvious way to avoid this issue without specially designed algorithms.
I am trying to use espresso to decode the MALACH Corpus. One of the characteristics of MALACH is that the training utterances are all short (< 8 seconds on the whole), but the test data contains a significant number of long utterances (> 20 seconds). I am observing that on these long utterances it produces decent output for the first 5-6 seconds, deteriorates rapidly thereafter, puts out some repeated words, and then stops decoding, resulting in many deletions. This is for a transformer model based on the WSJ recipe. MALACH has about 160 hours of training data. I would welcome some suggestions/help here - it almost looks like some parameter setting would fix things.
Thanks Michael