facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Release of wav2vec 2.0 #2268

Closed sandeep-badrinath closed 4 years ago

sandeep-badrinath commented 4 years ago

I just read the recent paper on wav2vec 2.0, and it looks very interesting. Thanks for this contribution. Could someone please let me know when the code will be released?

alexeib commented 4 years ago

I'll update this when we release the code + models, hopefully in the next few weeks.

youssefavx commented 4 years ago

Exciting news! Thanks for the incredible work on this.

shamanez commented 4 years ago

Amazing work. :) Thanks a lot.

brainopia commented 4 years ago

How many hours of audio was the model (the one that's going to be released) trained on?

Is it the same as in the paper (57,000 hours)?

huseinzol05 commented 4 years ago

Thanks!

alexeib commented 4 years ago

The libri-light setup is 53k hours after segmenting and filtering out silences. We will release all of the models in the paper.

fmana commented 4 years ago

I'm interested in trying it too. Any news on when wav2vec 2.0 will be available for download? Thanks, Franco.

fsanal commented 4 years ago

Yes, I was hoping for an update as well. Thanks for the amazing work!

ThorJonsson commented 4 years ago

Can't wait to try this out!

zyzisyz commented 4 years ago

Is there any news on when wav2vec 2.0 will be available?

shiyuzh2007 commented 4 years ago

I would like to use wav2vec 2.0 as soon as possible!

alexeib commented 4 years ago

We are looking to release the code + models around the end of July (though no promises).

mtanana commented 4 years ago

Thanks for all of your hard work on this. Very exciting news!

fmana commented 4 years ago

Thanks a lot. It should be great.

kairosdojo commented 4 years ago

I'm looking forward to this! Thank you.

wangyinggo commented 4 years ago

Awesome work! Can't wait to try this out!

kukas commented 4 years ago

I am also looking forward to trying this out! Can't wait to see whether this approach is also useful for tasks in music information retrieval.

alexeib commented 4 years ago

This is out now (commit 621e834103b13318cb48d41fc713b580f0da6b24): https://github.com/pytorch/fairseq/tree/master/examples/wav2vec

mtanana commented 4 years ago

Nice work and thank you!

kukas commented 4 years ago

Thank you! I am very excited for this!

youssefavx commented 4 years ago

Does anyone have a code example that takes an audio sample and outputs text? The example provided here for using pretrained models gives you back a vector: https://github.com/pytorch/fairseq/blob/master/examples/wav2vec/README.md

alexeib commented 4 years ago

> Does anyone have a code example that takes an audio sample and outputs text? The example provided here for using pretrained models gives you back a vector: https://github.com/pytorch/fairseq/blob/master/examples/wav2vec/README.md

If you download one of the fine-tuned models and follow the example under "evaluating a CTC model", you should get text as output.
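
Roughly, the pipeline looks like this. This is only a minimal sketch, not the exact script we ship: all paths are placeholders, it assumes 16 kHz mono input, the letter dictionary (dict.ltr.txt) that comes with the fine-tuned models, and that the CTC blank is index 0.

import itertools

import soundfile as sf
import torch
from fairseq import checkpoint_utils

# Placeholder paths. arg_overrides points "data" at a directory containing
# dict.ltr.txt, which the task needs to build its target dictionary.
models, cfg, task = checkpoint_utils.load_model_ensemble_and_task(
    ["/path/to/finetuned_model.pt"],
    arg_overrides={"data": "/path/to/dir/with/dict.ltr.txt"},
)
model = models[0].eval()

# Input must be 16 kHz mono, matching the training data.
wav, sr = sf.read("/path/to/audio.wav")
assert sr == 16000
source = torch.from_numpy(wav).float().unsqueeze(0)

with torch.no_grad():
    logits = model(source=source, padding_mask=None)["encoder_out"]  # (T, B, V)

# Greedy CTC decode: argmax per frame, collapse repeats, drop the blank
# (assumed index 0), then map letter indices back to characters.
pred = logits.argmax(dim=-1).squeeze(1).tolist()
tokens = [t for t, _ in itertools.groupby(pred) if t != 0]
text = task.target_dictionary.string(tokens)
print(text.replace(" ", "").replace("|", " ").strip())  # '|' marks word boundaries

A beam-search decoder with a language model (as in infer.py) will give noticeably better transcripts than this greedy decode.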

youssefavx commented 4 years ago

@alexeib Awesome! Thanks! Is there a Python script version of this?

subset=dev_other
python examples/speech_recognition/infer.py /checkpoint/abaevski/data/speech/libri/10h/wav2vec/raw --task audio_pretraining \
--nbest 1 --path /path/to/model --gen-subset $subset --results-path /path/to/save/results/for/sclite --w2l-decoder kenlm \
--lm-model /path/to/kenlm.bin --lm-weight 2 --word-score -1 --sil-weight 0 --criterion ctc --labels ltr --max-tokens 4000000 \
--post-process letter
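
For now, the closest I've got is just wrapping that command from Python. A rough sketch; every path below is a placeholder for wherever your manifest, model, and KenLM files actually live:

import subprocess

subset = "dev_other"
cmd = [
    "python", "examples/speech_recognition/infer.py",
    "/path/to/manifest/dir",        # directory with the {subset}.tsv/.ltr files
    "--task", "audio_pretraining",
    "--nbest", "1",
    "--path", "/path/to/model.pt",
    "--gen-subset", subset,
    "--results-path", "/path/to/results",
    "--w2l-decoder", "kenlm",
    "--lm-model", "/path/to/kenlm.bin",
    "--lm-weight", "2",
    "--word-score", "-1",
    "--sil-weight", "0",
    "--criterion", "ctc",
    "--labels", "ltr",
    "--max-tokens", "4000000",
    "--post-process", "letter",
]
subprocess.run(cmd, check=True)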

kairosdojo commented 4 years ago

Thank you very much!

aayagar001 commented 4 years ago

Hi @youssefavx, were you able to get text output? If yes, could you please share a sample command for it? Thanks.

shamoons commented 4 years ago

Is there a paper for wav2vec 2.0?

dweekly commented 4 years ago

https://arxiv.org/abs/2006.11477

ShigekiKarita commented 4 years ago

@alexeib Awesome! Could you share your training splits (e.g., 10min, 1h, 10h), or how to extract them?

alexeib commented 4 years ago

You can get the 10h/1h/10m splits from the libri-light dataset: https://github.com/facebookresearch/libri-light

For 10m, I just used the first split instead of averaging over the 6 that they have.
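
In case it helps, here is a minimal sketch of turning one of those folds into a fairseq manifest. The directory layout below is my reading of the libri-light release (the 1h set is split into six 10-minute folds); wav2vec_manifest.py in examples/wav2vec does the same thing for a flat directory.

import os
import soundfile as sf

# Fold 0 of the 1h set, i.e. roughly the 10-minute split.
root = "librispeech_finetuning/1h/0"

with open("train.tsv", "w") as out:
    print(root, file=out)  # fairseq manifests start with the root directory
    for dirpath, _, files in os.walk(root):
        for name in sorted(files):
            if not name.endswith(".flac"):
                continue
            path = os.path.join(dirpath, name)
            # Each subsequent line is: relative path <tab> number of samples.
            print(f"{os.path.relpath(path, root)}\t{sf.info(path).frames}", file=out)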

ShigekiKarita commented 4 years ago

OK. Thanks for your reply!

youssefavx commented 4 years ago

@aayagar001 Not yet, unfortunately; I haven't had the time to try this out more carefully. I'll try again and will post code and simple steps for those who, like me, are mostly lost about what to do next (it seems like quite a learning curve for someone who just knows Python and has dabbled in NLP, but I could be wrong). I'll see if I can make a Colab with one gigantic super-function that you feed your audio into and get text back.

It's not quite clear to me where a custom audio file should go as input, but I'll try to figure that out.

youssefavx commented 4 years ago

@aayagar001 It seems people are already trying to figure this out here: https://github.com/pytorch/fairseq/issues/2651

Good to know I'm not the only one who's a bit challenged by the complexity of setting this up. They've already set up a Docker image here: https://github.com/loretoparisi/wave2vec-recognize-docker

If you don't know what a Docker image is, I don't really either, but roughly it's like a virtual environment: instead of installing packages directly on your computer (with pip install), you install them into an isolated container so they don't mess with your main installation.

The steps for running the Docker image are outlined at the very bottom of that page: first install Docker (search for "download Docker"), then run the commands listed there.

But after running the three commands, I'm running into this issue: https://github.com/loretoparisi/wave2vec-recognize-docker/issues/2

It might be a mistake on my end (probably), or something else. A Colab or Kaggle notebook would be much preferable, to avoid all the unexpected errors you get across different OSes and environments, but this is already a great step forward.

Will keep you guys updated.