I trained a transformer model with TA on AISHELL-1, with an encoder window of left 15, right 15 and a decoder window of left 15, right 2. I got better accuracy on the training data. But when decoding with prefix_recognize, the WER on the test set is 9.3! That is worse than chunk32, which gets a WER of 6.3.
Comparing the chunk and TA training logs, the accuracy with TA was better than with chunk, so I suspect the decoding algorithm is wrong. After removing hat_att, which acts like a cache, TA got a WER of 6.5 with ctc_weight 0.5, but with a terrible RTF, maybe 8-10.
Could you modify the algorithm to fix TA decoding so that it gets better WER and RTF?
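
The RTF of 8-10 above is only a rough estimate. A minimal sketch of how it could be measured more precisely is below, taking RTF as total decoding time divided by total audio duration. The names `real_time_factor`, `decode_fn`, and the `(audio, duration)` pairs are hypothetical and not part of the toolkit; `decode_fn` stands in for whatever call runs prefix_recognize on one utterance.

```python
import time


def real_time_factor(decode_fn, utterances, clock=time.perf_counter):
    """Return total decoding time divided by total audio duration.

    decode_fn  -- hypothetical callable that decodes one utterance
    utterances -- iterable of (audio, duration_in_seconds) pairs
    clock      -- timer function; replaceable for testing
    """
    total_decode = 0.0
    total_audio = 0.0
    for audio, duration in utterances:
        start = clock()
        decode_fn(audio)  # only the decoding call is timed
        total_decode += clock() - start
        total_audio += duration
    if total_audio <= 0:
        raise ValueError("total audio duration must be positive")
    return total_decode / total_audio
```

An RTF above 1 means decoding is slower than real time, so a value of 8-10 means each second of audio takes 8 to 10 seconds to decode. Timing with and without hat_att on the same utterances would show how much of the slowdown comes from removing the cache.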