@hhadian: the `iter` flag lets you control how many epochs training continues over, but we don't have the ability to condition stopping on WER (this should be pretty easy to add via a simple modification to `Train.cpp`).
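For illustration, here's a minimal patience-based early-stopping sketch in C++. The struct and names are hypothetical, not wav2letter's actual `Train.cpp` code, but a check like this could be wired into its epoch loop:

```cpp
// Hypothetical early-stopping sketch -- not wav2letter's actual Train.cpp code.
// Stop training when the dev error hasn't improved by `minDelta` for
// `patience` consecutive epochs.
#include <limits>

struct EarlyStopper {
  double bestErr = std::numeric_limits<double>::infinity();
  int badEpochs = 0;
  int patience;    // how many stagnant epochs to tolerate
  double minDelta; // minimum improvement that counts as progress

  EarlyStopper(int patience, double minDelta)
      : patience(patience), minDelta(minDelta) {}

  // Call once per epoch with the dev WER/TER; returns true when training
  // should stop.
  bool shouldStop(double devErr) {
    if (devErr < bestErr - minDelta) {
      bestErr = devErr;
      badEpochs = 0;
    } else {
      ++badEpochs;
    }
    return badEpochs >= patience;
  }
};

// Inside the epoch loop, after evaluating on the dev set:
//   if (stopper.shouldStop(devWer)) { break; }
```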
Thanks for the answers. Do you mean there is no stopping criterion? How can I achieve the reported WER (3.5) on WSJ? I can see the TERs; I'm just not sure what TER means.
@hhadian — TER is the "token error rate" (similar to LER/letter error rate). The acoustic model's emissions are a probability distribution over a set of tokens for a given frame: the AM doesn't emit words.
To turn those per-frame emissions into words (and compute WER/word error rate), the w2l decoder combines them with a lexicon and language-model scores and performs a beam search over candidate word sequences, producing final transcripts from the acoustic model's emissions. Take a look at the docs for more.
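To make the token/word distinction concrete, here is a toy greedy (best-path) decode, assuming CTC-style emissions. All names are illustrative; this is not the w2l beam-search decoder, which would instead search over word candidates using the lexicon and LM:

```cpp
// Toy greedy (best-path) decode: per frame, take the argmax token, collapse
// repeats, and drop blanks. TER is computed over output like this; the real
// w2l decoder instead beam-searches with a lexicon and LM to produce words.
#include <algorithm>
#include <string>
#include <vector>

// emissions[t][k] = score of token k at frame t; tokens[k] = its symbol
// (e.g. a letter), with blankIdx the CTC blank. Names are illustrative.
std::string greedyDecode(const std::vector<std::vector<float>>& emissions,
                         const std::vector<char>& tokens, int blankIdx) {
  std::string out;
  int prev = -1;
  for (const auto& frame : emissions) {
    int best = static_cast<int>(
        std::max_element(frame.begin(), frame.end()) - frame.begin());
    if (best != prev && best != blankIdx) {
      out += tokens[best]; // new, non-blank token: emit it
    }
    prev = best;
  }
  return out; // a token string like "thecat", not yet words
}
```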
OK, cool. I guess you missed my other question. It's a bit strange not to have any stopping criteria. When should the training be stopped?
> When should the training be stopped?
There's no right answer to this; it's still an open research question. In general, with a good model, convergence is fairly easy to recognize from a per-epoch dev-LER plot: it becomes obvious when the model has stopped improving (there will typically be some oscillation around a minimum, but the LER isn't dropping much over time).
I recently trained a baseline model with the LibriSpeech architecture from the open-source tutorial on another dataset, and it produced a curve like the following (dev-set LER per epoch):
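For reference, the dev-LER behind such a plot is just an edit distance normalized by reference length and averaged over the dev set. A minimal sketch (illustrative, not w2l's implementation):

```cpp
// Minimal Levenshtein edit-distance sketch for computing LER
// (edits / reference length). Illustrative only.
#include <algorithm>
#include <string>
#include <vector>

double letterErrorRate(const std::string& ref, const std::string& hyp) {
  const size_t m = ref.size(), n = hyp.size();
  // Two-row dynamic programming over edit operations.
  std::vector<size_t> prev(n + 1), cur(n + 1);
  for (size_t j = 0; j <= n; ++j) prev[j] = j;
  for (size_t i = 1; i <= m; ++i) {
    cur[0] = i;
    for (size_t j = 1; j <= n; ++j) {
      size_t sub = prev[j - 1] + (ref[i - 1] != hyp[j - 1]); // substitution
      cur[j] = std::min({sub, prev[j] + 1, cur[j - 1] + 1}); // del/ins
    }
    std::swap(prev, cur);
  }
  return m ? static_cast<double>(prev[n]) / m : 0.0;
}

// Average this over the dev set after each epoch and plot it; training has
// converged when the curve flattens out.
```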
This isn't really an issue; I just want to ask some questions to make sure everything is good. Please let me know if I should ask this on some other forum.
I am training WSJ using a single Tesla K80 GPU with the default configs. I didn't see an option related to the number of GPUs in the configs.
I also did not see an option to set the number of epochs. So far it has trained for almost 36 hours. Here are the last few lines of `001_perf`:

My questions:
Thanks in advance