Open jtkim-kaist opened 4 years ago

It seems that your mwer loss implementation needs a prior beam search to produce the inputs for the mwer_loss function. We can get 'seq_logprobs' during beam search for each hypothesis; however, your implementation seems to re-compute 'seq_logprobs' using logprob with tokens found from the prior beam search. Is there any reason for this re-computation? Also, your weighted_relative_edit_error includes the information about the ground truth, not the n-best list only.

Sorry for the late reply.
"It seems that your mwer loss implementation needs prior beam search for inputs for mwer_loss function." --- Yes, I implemented the 2nd approach described in the paper: https://arxiv.org/pdf/1712.01818.pdf
The paper describes two possible approximations which ensure tractability: (1) approximating the expectation with samples, or (2) restricting the summation to an N-best list, as is commonly done during sequence training for ASR.
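To make the N-best approximation concrete, here is a minimal sketch of the loss over an N-best list: renormalize the hypothesis probabilities over the list, then weight each hypothesis's word errors relative to the list's mean error. This is only an illustration of the technique from the paper, not the repository's actual code; the function name `mwer_loss_nbest` and the array shapes are my own assumptions.

```python
import numpy as np

def mwer_loss_nbest(seq_logprobs, word_errors):
    """N-best approximation of the expected word errors (illustrative sketch).

    seq_logprobs: log P(y_i | x) for each of the N hypotheses, shape (N,)
    word_errors:  number of word errors of each hypothesis, shape (N,)
    """
    # Renormalize the hypothesis probabilities over the N-best list only,
    # shifting by the max log-prob for numerical stability.
    probs = np.exp(seq_logprobs - np.max(seq_logprobs))
    probs = probs / probs.sum()
    # Subtract the mean error over the N-best list ("relative" word error,
    # a variance-reduction trick used in the MWER paper).
    relative_errors = word_errors - word_errors.mean()
    return float(np.sum(probs * relative_errors))
```

Minimizing this pushes probability mass toward hypotheses with below-average word errors and away from those with above-average errors.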
"We can get 'seq_logprobs' during beam search for each hypothesis, however, your implementation seems to re-compute this 'seq_logprobs' using logprob with some tokens found from prior beam search. Is there any reason for this re-computation?"
--- Of course you can calculate seq_logprobs outside the loss function. In my case, I don't run beam search to get seq_logprobs outside the loss function, so it is NOT a 're-computation' for me.
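Computing seq_logprobs inside the loss amounts to gathering the current model's per-token log-probabilities at the saved hypothesis tokens and summing over time. A hedged sketch, assuming a hypothetical helper `sequence_logprobs` and (N, T, V)-shaped model outputs (these names and shapes are not from the repository):

```python
import numpy as np

def sequence_logprobs(token_logprobs, hyp_tokens, lengths):
    """Sum per-token log-probs along each saved hypothesis (illustrative sketch).

    token_logprobs: (N, T, V) log-probabilities from the current model
    hyp_tokens:     (N, T) token ids of the saved hypotheses
    lengths:        (N,) valid lengths, used to mask out padding
    """
    N, T, V = token_logprobs.shape
    # Advanced indexing: pick log P(hyp_tokens[n, t]) at every (n, t).
    gathered = token_logprobs[np.arange(N)[:, None], np.arange(T)[None, :], hyp_tokens]
    # Zero out positions past each hypothesis's length before summing.
    mask = np.arange(T)[None, :] < lengths[:, None]
    return (gathered * mask).sum(axis=1)
```

Because the scores come from the current model rather than the (stale) beam-search scores, the loss stays differentiable with respect to the model's parameters as they are updated.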
Let me mention that there are 2 modes of pipelines to apply mWER training:

1). Online Mode: run beam search during mWER training to refresh the top-n hypotheses and their seq_probs.

2). Offline Mode: apply mWER fine-tune training on saved hypotheses. We won't do beam search to refresh the top-n hypotheses during mWER training. The hypotheses remain unchanged, and we just adjust their relative weights, i.e., renormalized_seq_probs.

Since beam search is very computationally expensive, what I have done in practice is the Offline Mode.
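The Offline Mode's "adjust their relative weight" step can be sketched as a softmax over the saved N-best list. The helper name `renormalized_seq_probs` is my own label for the quantity mentioned above, not necessarily the repository's function:

```python
import numpy as np

def renormalized_seq_probs(seq_logprobs):
    """Softmax over the saved N-best list (illustrative sketch).

    In the Offline Mode the hypotheses themselves are fixed; only these
    relative weights change as the model parameters are updated, because
    seq_logprobs is re-evaluated under the current model.
    """
    shifted = seq_logprobs - np.max(seq_logprobs)  # numerical stability
    probs = np.exp(shifted)
    return probs / probs.sum()
```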
"Also, your weighted_relative_edit_error includes the information about ground truth, not n-best only." --- You can't calculate WER without the ground truth. You may argue that we only need the number of word errors instead of the ground truth itself. As the comments say, the ground truth is only used to calculate the CE loss.
N: the number of candidate sequences (i.e., hypothesis sequences) plus 1. The last sequence is treated as the ground truth and is used to compute the CE loss.
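The layout described above can be sketched as follows: split off the last row as the ground truth, and count word errors for each hypothesis with an ordinary Levenshtein edit distance. Both helper names are hypothetical, and the (N, T) batch layout is only an assumption from the comment above:

```python
import numpy as np

def split_candidates(sequences):
    """Split an (N, T) batch where the last row is the ground truth.

    Assumed layout: N = number of hypotheses + 1; the final sequence is
    the reference used for the CE loss, the rest feed the mWER loss.
    """
    return sequences[:-1], sequences[-1]

def word_errors(hyp, ref):
    """Levenshtein edit distance between two token sequences (one DP row)."""
    d = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        prev, d[0] = d[0], i
        for j, r in enumerate(ref, 1):
            # prev holds the diagonal cell d[i-1][j-1] before overwriting.
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (h != r))
    return int(d[-1])
```

With this convention, the word-error counts needed by weighted_relative_edit_error come from comparing each of the first N-1 rows against the last row.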