Why does MWER use stop gradient?

TeaPoly / CTC-OptimizedLoss

Computes the MWER (minimum WER) Loss with CTC beam search. Knowledge distillation for CTC loss.
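For reference, here is a common N-best formulation of the MWER loss. The symbols are ours, and we assume the repo follows this form: $y_1, \dots, y_N$ are the N-best hypotheses from CTC beam search, $P(y_i \mid x)$ is the CTC probability of $y_i$ (summed over all alignments), and $W(y_i, y^*)$ is the number of word errors against the reference $y^*$.

$$
\mathcal{L}_{\mathrm{MWER}}(x, y^*) = \sum_{i=1}^{N} \hat{P}(y_i \mid x)\,\bigl(W(y_i, y^*) - \overline{W}\bigr),
\qquad
\hat{P}(y_i \mid x) = \frac{P(y_i \mid x)}{\sum_{j=1}^{N} P(y_j \mid x)},
\qquad
\overline{W} = \frac{1}{N}\sum_{j=1}^{N} W(y_j, y^*)
$$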


Why does MWER use stop gradient? #2

Open. Mddct opened this issue 2 years ago

Mddct commented 2 years ago

Why does MWER use stop gradient? Is it just for regularization?

Mddct commented 2 years ago


Maybe it's for variance reduction?
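One way to probe the variance-reduction guess: this minimal pure-Python sketch computes the N-best MWER objective with a mean-WER baseline held constant, which is what wrapping the baseline in a stop-gradient (e.g. `tf.stop_gradient`) achieves. It assumes the stop gradient in question is applied to that baseline, which the thread does not confirm, and `mwer_loss_and_grad` is a hypothetical helper, not the repo's code. The scores and error counts are made up.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of log-scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def mwer_loss_and_grad(log_probs, word_errors, baseline):
    """Hypothetical helper: N-best MWER loss and its gradient w.r.t. log_probs.

    `baseline` is a plain float, so no gradient flows through it; this plays
    the role of a stop-gradient on the baseline.
    """
    p_hat = softmax(log_probs)  # renormalize over the N-best list
    centered = [w - baseline for w in word_errors]
    loss = sum(p * c for p, c in zip(p_hat, centered))
    # d/ds_k sum_i p_i * c_i = p_k * (c_k - sum_i p_i * c_i)
    grad = [p * (c - loss) for p, c in zip(p_hat, centered)]
    return loss, grad


log_probs = [-1.2, -2.0, -3.5]  # made-up N-best scores log P(y_i | x)
word_errors = [1.0, 0.0, 3.0]   # made-up word error counts per hypothesis
mean_err = sum(word_errors) / len(word_errors)

loss_b, grad_b = mwer_loss_and_grad(log_probs, word_errors, mean_err)
loss_0, grad_0 = mwer_loss_and_grad(log_probs, word_errors, 0.0)
print(loss_0 - loss_b)  # equals mean_err up to rounding: only the loss shifts
print(max(abs(a - b) for a, b in zip(grad_b, grad_0)))  # ~0: same gradient
```

Because the renormalized weights sum to one, a constant baseline cancels out of the exact N-best gradient and only offsets the reported loss value. The textbook variance-reduction argument for baselines applies when the expectation is estimated from samples rather than computed exactly over a fixed list.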

leixiaoning commented 2 years ago

I find that TensorFlow's CTC beam search loses the gradients.

TeaPoly commented 1 year ago


Beam search is only used to find candidate paths, so no gradient is needed through the beam search itself. Gradients still flow back to the logits because the MWER loss takes as input the probabilities P of those paths, and P is computed from the logits. In fact, the N-best paths from CTC beam search can be generated offline to speed up training.
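To illustrate how gradients reach the logits through P even though beam search is never differentiated, here is a minimal pure-Python sketch. It assumes the N-best label sequences are already fixed (for example, decoded offline) and scores each one with the CTC forward algorithm over `log_softmax(logits)`, so the MWER loss becomes an ordinary differentiable function of the logits. `ctc_log_prob`, `mwer_loss`, and `numeric_grad` are hypothetical helpers, not the repo's API, and a central finite difference stands in for autodiff.

```python
import math


def logsumexp(values):
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_softmax(row):
    lse = logsumexp(row)
    return [v - lse for v in row]


def ctc_log_prob(logits, labels, blank=0):
    """log P(labels | x), summed over all CTC alignments (forward algorithm)."""
    log_probs = [log_softmax(row) for row in logits]  # shape [T][V]
    ext = [blank]  # labels with blanks interleaved: _ l1 _ l2 _ ...
    for lab in labels:
        ext += [lab, blank]
    S = len(ext)
    alpha = [-math.inf] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]
    for t in range(1, len(log_probs)):
        new = [-math.inf] * S
        for s in range(S):
            cands = [alpha[s]]  # stay on the same symbol
            if s >= 1:
                cands.append(alpha[s - 1])  # advance by one
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(alpha[s - 2])  # skip a blank between distinct labels
            new[s] = logsumexp(cands) + log_probs[t][ext[s]]
        alpha = new
    # Valid paths end on the last label or on the trailing blank.
    return logsumexp(alpha[-2:]) if S > 1 else alpha[-1]


def mwer_loss(logits, nbest, word_errors):
    """MWER over a fixed N-best list; beam search is not part of the graph."""
    scores = [ctc_log_prob(logits, hyp) for hyp in nbest]  # log P(y_i | x)
    z = logsumexp(scores)
    p_hat = [math.exp(s - z) for s in scores]  # renormalized over the N-best
    mean_err = sum(word_errors) / len(word_errors)  # constant baseline
    return sum(p * (w - mean_err) for p, w in zip(p_hat, word_errors))


def numeric_grad(logits, nbest, word_errors, t, k, eps=1e-6):
    """Central finite difference of mwer_loss w.r.t. logits[t][k]."""
    up = [row[:] for row in logits]
    down = [row[:] for row in logits]
    up[t][k] += eps
    down[t][k] -= eps
    return (mwer_loss(up, nbest, word_errors)
            - mwer_loss(down, nbest, word_errors)) / (2 * eps)


logits = [[0.0, 0.0, 0.0] for _ in range(3)]  # T=3 frames, vocab {blank, 1, 2}
nbest = [[1], [2], [1, 2]]  # pretend these came from an offline beam search
word_errors = [0.0, 1.0, 1.0]
print(numeric_grad(logits, nbest, word_errors, t=0, k=1))  # about -0.0311
```

The gradient path runs only through `ctc_log_prob`, so the N-best lists can come from anywhere, including a file written by an earlier decoding pass. The negative value above means that raising the logit of token 1 at the first frame lowers the loss, because it shifts probability mass toward the zero-error hypothesis `[1]`.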