facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

My question is: I want to fine-tune the pretext task of the wav2vec 2.0 base model (not train it from scratch). Any help would be appreciated. #4955

Open shakeel608 opened 1 year ago

shakeel608 commented 1 year ago

I want to fine-tune the pretext task of the wav2vec 2.0 base model (not train it from scratch). Any help would be appreciated.

cuongducle commented 1 year ago

So, what steps are you facing problems with?

shakeel608 commented 1 year ago

If we use an off-the-shelf wav2vec 2.0 pre-trained model, it is straightforward to fine-tune it on a downstream task. What I want instead is to fine-tune the pretext task itself, without a classification head, on another dataset. In other words, I want to continue training the base model.

From the repo, all I can find is "Train a wav2vec 2.0 base model":

"This configuration was used for the base model trained on the Librispeech dataset in the wav2vec 2.0 paper"

i.e. how to train from scratch.

I couldn't find any steps on the GitHub repo explaining how to do this.

Any directions would be appreciated.
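
For context, the from-scratch recipe that README snippet points to looks roughly like this (a sketch based on fairseq's examples/wav2vec README; the paths, audio extension, and validation split below are placeholders):

```
# 1) Build tsv manifests of the unlabeled audio with the script shipped in fairseq
python examples/wav2vec/wav2vec_manifest.py /path/to/waves \
    --dest /path/to/manifest --ext flac --valid-percent 0.01

# 2) Launch base-model pretraining with the stock LibriSpeech config
fairseq-hydra-train \
    task.data=/path/to/manifest \
    --config-dir /path/to/fairseq/examples/wav2vec/config/pretraining \
    --config-name wav2vec2_base_librispeech
```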

cuongducle commented 1 year ago

You can use this for reference: https://github.com/mailong25/self-supervised-speech-recognition. It is a nice wrapper around wav2vec 2.0. I think fairseq lacks the kind of data-preparation examples that ESPnet provides.
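
If you would rather stay inside fairseq, a minimal sketch of continuing the pretext-task training from the released base checkpoint (e.g. wav2vec_small.pt, the no-finetuning checkpoint) could combine the same pretraining config with fairseq's standard Hydra checkpoint overrides. Whether the released checkpoint restores cleanly this way is not confirmed in this thread, so treat it as a starting point: the reset flags drop the old optimizer, LR-scheduler, and dataloader state so training restarts on the new manifest, and the max_update/lr values are illustrative only.

```
# Continue pretraining from an existing checkpoint instead of a random init.
# restore_file / reset_* are fairseq checkpoint-config options; values are illustrative.
fairseq-hydra-train \
    task.data=/path/to/new/manifest \
    checkpoint.restore_file=/path/to/wav2vec_small.pt \
    checkpoint.reset_optimizer=True \
    checkpoint.reset_lr_scheduler=True \
    checkpoint.reset_dataloader=True \
    checkpoint.reset_meters=True \
    optimization.max_update=100000 \
    optimization.lr='[0.00005]' \
    --config-dir /path/to/fairseq/examples/wav2vec/config/pretraining \
    --config-name wav2vec2_base_librispeech
```

Whichever route you take, make sure the new audio is 16 kHz mono, matching the data the base model was originally pretrained on.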