Open Mddct opened 2 years ago
Why does MWER use stop gradient? Is it just a regularization?
Maybe it is for variance reduction.
I found that TF CTC beam search loses the gradients.
Beam search is only used to find candidate paths, so no gradient is required through it. Gradients are still pushed back to the logits and weights, because the probability P of each path, which is computed from the logits, is an input to the MWER loss. The N-best paths from CTC beam search can actually be generated offline to speed up training.
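To make the point above concrete, here is a minimal sketch in plain Python. It assumes the common MWER formulation (path probabilities renormalized over the N-best list, with the mean error count subtracted as a constant baseline), which this thread does not spell out. The names `mwer_loss`, `log_probs`, and `word_errors` are hypothetical. The N-best list and its error counts are plain constants here, as if they had been produced offline; only the path log-probabilities receive a gradient.

```python
import math

def mwer_loss(log_probs, word_errors):
    """Expected relative word errors over a fixed N-best list (sketch).

    log_probs:   log P(y_i | x) for each hypothesis; in training these are
                 the only quantities that depend on the logits.
    word_errors: word error counts for each hypothesis, treated as
                 constants (no gradient flows through beam search).
    Returns the loss and its gradient with respect to each log-probability.
    """
    # Renormalize over the N-best list with a numerically stable softmax.
    m = max(log_probs)
    exps = [math.exp(lp - m) for lp in log_probs]
    z = sum(exps)
    posteriors = [e / z for e in exps]

    # Assumed baseline: subtract the mean error count (a constant).
    mean_err = sum(word_errors) / len(word_errors)
    rel = [w - mean_err for w in word_errors]

    loss = sum(p * r for p, r in zip(posteriors, rel))

    # Analytic softmax gradient: d loss / d log_probs[i] = p_i * (r_i - loss).
    grads = [p * (r - loss) for p, r in zip(posteriors, rel)]
    return loss, grads

# Two equally likely hypotheses, one with 0 errors and one with 2.
loss, grads = mwer_loss([0.0, 0.0], [0, 2])
```

In this example the hypothesis with fewer errors gets a negative gradient, so a gradient descent step raises its log-probability, even though nothing was differentiated through the search itself.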