flashlight / wav2letter

Facebook AI Research's Automatic Speech Recognition Toolkit
https://github.com/facebookresearch/wav2letter/wiki

Training with wav2vec #755

Closed mnpham0417 closed 4 years ago

mnpham0417 commented 4 years ago

Hi,

Are there instructions on how to train AM with wav2vec features as input?

Thank you.

padentomasello commented 4 years ago

Hi @mnpham0417, thanks for the question. wav2vec is trained in PyTorch / fairseq (https://github.com/pytorch/fairseq/tree/master/examples/wav2vec). Using wav2vec features as input to AM training in wav2letter is not straightforward, and unfortunately it is not something we have done before.

If you are committed to using wav2vec features within our framework, you could dump all the features to disk and then write a custom dataset to load from these, but it would require a little bit of engineering work to do so.
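A minimal sketch of the dump-and-reload idea, in Python and using only the standard library. Everything here is an assumption made for illustration: `fake_extractor` is a hypothetical stand-in for a forward pass through a pretrained wav2vec model, and the on-disk layout (one raw float32 file per utterance plus a `manifest.jsonl`) is invented for this example, not a format that wav2letter reads. An actual wav2letter dataset would have to be written in C++ against that framework's own interfaces; the reader class below only shows what such a loader would need to do.

```python
import json
import os
import tempfile
from array import array


def fake_extractor(samples, dim=4, hop=160):
    """HYPOTHETICAL stand-in for a wav2vec forward pass.

    Emits one `dim`-sized frame per `hop` input samples so the rest of
    the pipeline can be exercised without a real model.
    """
    frames = []
    for start in range(0, len(samples) - hop + 1, hop):
        mean = sum(samples[start:start + hop]) / hop
        frames.append([mean * (k + 1) for k in range(dim)])
    return frames


def dump_features(utterances, out_dir, extractor=fake_extractor):
    """Write one raw float32 file per utterance plus a JSON-lines manifest.

    `utterances` is an iterable of (utterance_id, samples, transcript).
    Values are stored in native byte order (an assumption of this sketch).
    """
    os.makedirs(out_dir, exist_ok=True)
    manifest_path = os.path.join(out_dir, "manifest.jsonl")
    with open(manifest_path, "w", encoding="utf-8") as manifest:
        for utt_id, samples, transcript in utterances:
            frames = extractor(samples)
            dim = len(frames[0]) if frames else 0
            # Flatten the (frames x dim) matrix row by row.
            flat = array("f", (v for frame in frames for v in frame))
            feat_file = utt_id + ".f32"
            with open(os.path.join(out_dir, feat_file), "wb") as f:
                flat.tofile(f)
            entry = {
                "id": utt_id,
                "features": feat_file,
                "frames": len(frames),
                "dim": dim,
                "transcript": transcript,
            }
            manifest.write(json.dumps(entry) + "\n")
    return manifest_path


class DumpedFeatureDataset:
    """Minimal map-style dataset that reads the dump back from disk."""

    def __init__(self, manifest_path):
        self.root = os.path.dirname(manifest_path)
        with open(manifest_path, encoding="utf-8") as manifest:
            self.entries = [json.loads(line) for line in manifest if line.strip()]

    def __len__(self):
        return len(self.entries)

    def __getitem__(self, index):
        entry = self.entries[index]
        dim = entry["dim"]
        flat = array("f")
        with open(os.path.join(self.root, entry["features"]), "rb") as f:
            flat.fromfile(f, entry["frames"] * dim)
        # Rebuild the (frames x dim) matrix from the flat buffer.
        frames = [flat[i:i + dim].tolist() for i in range(0, len(flat), dim)] if dim else []
        return frames, entry["transcript"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out_dir:
        manifest = dump_features([("utt1", [0.5] * 320, "hello world")], out_dir)
        frames, transcript = DumpedFeatureDataset(manifest)[0]
        print(len(frames), "frames for transcript:", transcript)
```

Storing the frame count and feature dimension in the manifest lets the loader reshape the flat buffer without a per-file header, and lets a training pipeline filter or sort by length without opening the feature files.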

padentomasello commented 4 years ago

Hi @mnpham0417, will close for now. Please let me know if you have other questions, or if you decide to move forward and would like some guidance.

Juanvulcano commented 3 years ago

@padentomasello I would love to help add this feature. Can we figure out how to proceed? It looks like the author of the repository below attempted this and reported good results. I tried to replicate them without luck, but I'm still interested because I want to build speech-to-text models for low-resource languages.

https://github.com/mailong25/vietnamese-speech-recognition

pcbua commented 3 years ago

> you could dump all the features to disk

This could be very useful, but I can't spot the necessary functions in the API. Can anyone provide an example of a feature dump?