aanchan opened this issue 1 year ago
sentence-level: https://colab.research.google.com/drive/1fWZmiD5_8960gb-7h94B8mxZbPsJ_y08
- Before training (test set): Word Error Rate (WER) 66.26%, Character Error Rate (CER) 46.43%
- After training: WER 61.93%, CER 42.05%

word-level: https://colab.research.google.com/drive/12nDcUrxZ76qoV1AhdJvbl-HK7-pXQ2RP
- Before training: WER 38.08%, CER 27.04%
- After training: WER 18.45%, CER 13.03%
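For reference, a minimal sketch of how WER and CER like the numbers above can be computed, assuming the `jiwer` library; the notebooks' actual evaluation code may differ, and the reference/hypothesis strings here are made up for illustration:

```python
import jiwer

# Made-up reference transcripts and model hypotheses for illustration
references = ["the quick brown fox", "she sells sea shells"]
hypotheses = ["the quik brown fox", "she sells see shells"]

print(f"WER: {jiwer.wer(references, hypotheses):.2%}")  # word error rate
print(f"CER: {jiwer.cer(references, hypotheses):.2%}")  # character error rate
```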
WWW: As a researcher wanting to understand the impact of synthetic data used for training a spelling corrector, I would like to train a sequence-to-sequence model based on BART on the artificially generated data from the Tatoeba corpus.
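As an illustration of what "artificially generated data" could look like, here is a minimal character-noise sketch; the corruption scheme, probabilities, and function name are assumptions, not the repository's actual generation method:

```python
import random
import string

def corrupt(sentence, p=0.1, seed=None):
    """Inject random character deletions, substitutions, and insertions
    to simulate spelling errors (hypothetical noise scheme)."""
    rng = random.Random(seed)
    out = []
    for ch in sentence:
        r = rng.random()
        if r < p / 3:
            continue  # delete this character
        elif r < 2 * p / 3:
            out.append(rng.choice(string.ascii_lowercase))  # substitute
        elif r < p:
            out.append(ch)
            out.append(rng.choice(string.ascii_lowercase))  # insert after
        else:
            out.append(ch)  # keep unchanged
    return "".join(out)

clean = "the weather is lovely today"
print(clean, "->", corrupt(clean, p=0.15, seed=0))
```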
AC: A training notebook or script to train a seq2seq model. There is already code in the repository for training a seq2seq model based on BART on the TORGO transcripts.
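A minimal sketch of fine-tuning BART as a seq2seq spelling corrector with Hugging Face `transformers`; the checkpoint, toy data pairs, and hyperparameters are placeholders, not the repository's existing TORGO training code:

```python
import torch
from transformers import BartForConditionalGeneration, BartTokenizerFast

tokenizer = BartTokenizerFast.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

# Toy (noisy, clean) pairs standing in for the generated Tatoeba data
pairs = [
    ("teh cat sat on teh mat", "the cat sat on the mat"),
    ("i liek to raed boks", "i like to read books"),
]

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for epoch in range(3):
    for noisy, clean in pairs:
        batch = tokenizer(noisy, return_tensors="pt", truncation=True)
        labels = tokenizer(text_target=clean, return_tensors="pt",
                           truncation=True).input_ids
        loss = model(**batch, labels=labels).loss  # token-level cross-entropy
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

# Decode a correction for an unseen noisy sentence
model.eval()
enc = tokenizer("this sentnce has erors", return_tensors="pt")
ids = model.generate(**enc, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(ids[0], skip_special_tokens=True))
```

In practice the loop above would be replaced by batched `DataLoader` iteration or `Seq2SeqTrainer`, but the shape of the task is the same: noisy text in, corrected text out.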