Open weiyengs opened 3 years ago
@weiyengs I don't know whether the considerations on Chinese for DeepSpeech might help you?
wav2vec fine-tuning is already done on letters by default. You would just use Japanese characters as if they were letters/words and measure character error rate instead of word error rate.
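To make the scoring part concrete, here is a minimal sketch of character error rate (CER): edit distance over characters divided by the reference length. This is just an illustration, not fairseq's own evaluation code, and the example sentences are made up.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (single-row DP)."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, start=1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,        # deletion
                        dp[j - 1] + 1,    # insertion
                        prev + (r != h))  # substitution or match
            prev = cur
    return dp[-1]

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits per reference character."""
    ref = list(reference.replace(" ", ""))
    hyp = list(hypothesis.replace(" ", ""))
    return edit_distance(ref, hyp) / max(len(ref), 1)

# One substituted character out of nine -> CER of about 0.11
print(cer("今日は良い天気です", "今日は良い天地です"))
```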
How do I adapt wav2vec for languages like Japanese, where words are not separated by spaces?
Places that I'm guessing require changes: 1) Do I have to tokenize the training data and separate the tokens by spaces? (See the sketch below for what I mean.)
Are there any other places that I should take note of? Thanks!
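To make 1) concrete, here is a rough sketch of the preprocessing I have in mind: splitting each transcript into space-separated characters so that every character plays the role of a "letter". The file handling and the exact label format the fine-tuning script expects (e.g. fairseq's .ltr files) are my assumptions, so adjust to your setup.

```python
import sys
import unicodedata

def to_char_tokens(line: str) -> str:
    """Normalize a transcript and split it into space-separated characters."""
    text = unicodedata.normalize("NFKC", line.strip())
    text = text.replace(" ", "")  # drop any existing spaces
    return " ".join(list(text))

if __name__ == "__main__":
    # Usage: python char_tokenize.py < transcripts.txt > transcripts.chars
    for line in sys.stdin:
        if line.strip():
            print(to_char_tokens(line))
```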