facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

How to pre-train a multilingual HuBERT? #5020

Open tarudesu opened 1 year ago

tarudesu commented 1 year ago

@wnhsu I am curious about how to pre-train a multilingual HuBERT like this one. Also, can I simply continue pre-training HuBERT on another language by loading the original HuBERT checkpoint and resuming the pre-training phase on a dataset in that other language?

Could someone explain this to me, please? Thank you in advance!
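
To make the question concrete, here is roughly what I have in mind, pieced together from the pre-training instructions in examples/hubert. The checkpoint name, the data paths, and the restore/reset options are my own assumptions, not a confirmed recipe:

```bash
# Sketch: continue HuBERT pre-training from the released base checkpoint on a
# new language. Assumes k-means labels for the new data already exist (see the
# examples/hubert/simple_kmeans scripts) and that task.label_dir contains a
# matching dict.km.txt. All paths are placeholders.
fairseq-hydra-train \
  --config-dir /path/to/fairseq/examples/hubert/config/pretrain \
  --config-name hubert_base_librispeech \
  task.data=/path/to/new_lang/tsv \
  task.label_dir=/path/to/new_lang/labels \
  task.labels='["km"]' \
  model.label_rate=100 \
  checkpoint.restore_file=/path/to/hubert_base_ls960.pt \
  checkpoint.reset_optimizer=true \
  checkpoint.reset_lr_scheduler=true \
  checkpoint.reset_dataloader=true \
  checkpoint.reset_meters=true
```

What I am unsure about is the labels: the number of k-means clusters would have to match the dictionary the checkpoint was trained with, otherwise I would expect loading to fail on the label projection layer.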

asadullah797 commented 6 months ago

I am also interested in running the same kind of experiment. I would like to continue pre-training a wav2vec 2.0 model on a new language/dataset rather than pre-training from scratch. Please let me know if you find an answer.
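
In case it helps, the closest I can piece together from the wav2vec 2.0 README is to restore the released checkpoint and reset the training state. The restore/reset options are my guess at repurposing fairseq's resume mechanism, not a documented recipe:

```bash
# Sketch: continue wav2vec 2.0 pre-training from a released checkpoint on a
# new dataset. task.data points at a manifest directory created with
# examples/wav2vec/wav2vec_manifest.py; all paths are placeholders.
fairseq-hydra-train \
  --config-dir /path/to/fairseq/examples/wav2vec/config/pretraining \
  --config-name wav2vec2_base_librispeech \
  task.data=/path/to/new_lang/manifest \
  checkpoint.restore_file=/path/to/wav2vec_small.pt \
  checkpoint.reset_optimizer=true \
  checkpoint.reset_lr_scheduler=true \
  checkpoint.reset_dataloader=true \
  checkpoint.reset_meters=true
```

checkpoint.finetune_from_model=/path/to/wav2vec_small.pt might be a cleaner alternative, since it loads only the model weights and resets the optimizer, dataloader, and meters in one step.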

tarudesu commented 6 months ago

I gave up on this a while ago, @asadullah797, but I am still looking for a simple technical approach too.

asadullah797 commented 6 months ago

I am also looking for similar options and have been searching for whether someone posted this question before. I think I have found something similar here: https://github.com/mailong25/self-supervised-speech-recognition. Steps 1.1 and 1.2 there seem to do the trick; I am going to give it a try.
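
For HuBERT specifically, the part that makes plain resuming non-trivial is that the new language needs its own k-means labels. Here is a sketch of that step using the scripts in examples/hubert/simple_kmeans; the feature layer, cluster count, and single-shard settings are my assumptions, loosely mirroring the second-iteration setup described in the HuBERT README:

```bash
# Sketch: generate k-means labels for the new language. Features are dumped
# from an intermediate transformer layer of the released checkpoint (layer 6
# and 500 clusters assumed here), then clustered and converted to labels.
cd examples/hubert/simple_kmeans

# 1. Dump features for the "train" split (1 shard, rank 0).
python dump_hubert_feature.py /path/to/tsv train /path/to/hubert_base_ls960.pt 6 1 0 /path/to/feats

# 2. Fit k-means with 500 clusters on 10% of the features.
python learn_kmeans.py /path/to/feats train 1 /path/to/km_model 500 --percent 0.1

# 3. Assign cluster labels to every frame. With a single shard, the output
#    train_0_1.km can simply be renamed train.km.
python dump_km_label.py /path/to/feats train /path/to/km_model 1 0 /path/to/labels

# 4. A dummy dictionary with one line per cluster is also needed.
for c in $(seq 0 499); do echo "$c 1"; done > /path/to/labels/dict.km.txt
```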