I'll update this when we release the code + model, hopefully in the next few weeks.
Exciting news! Thanks for the incredible work on this.
Amazing work. :) Thanks a lot.
How many hours has the model (that's going to be released) been trained on?
Is it the same as in the paper (57000 hours)?
Thanks!
The libri-light setup is 53k hours after segmenting and filtering out silences. We will release all of the models in the paper.
I'm interested in trying it too. Any news on when wav2vec 2.0 will be available for download? Thanks, Franco
Yes, I was hoping for an update as well. Thanks for the amazing work!
Can't wait to try this out!
Is there any news on when wav2vec 2.0 will be available?
I would like to use wav2vec 2.0 as soon as possible!
We are looking to release code + models around the end of July (though no promises).
Thanks for all of your hard work on this. Very exciting news
Thanks a lot. It should be great.
I'm looking forward to this! Thank you.
Awesome work! Can't wait to try this out!
I am also looking forward to trying this out! Can't wait to see whether this approach is also useful for tasks in music information retrieval.
This is out now: 621e834103b13318cb48d41fc713b580f0da6b24 https://github.com/pytorch/fairseq/tree/master/examples/wav2vec
Nice work and thank you!
Thank you! I am very excited for this!
Does anyone have a code example where you give it an audio sample and it outputs text? The example provided here for using pretrained models gives you back a vector: https://github.com/pytorch/fairseq/blob/master/examples/wav2vec/README.md
If you download one of the finetuned models and follow the example under "evaluating a CTC model", you should get text as output.
@alexeib Awesome! Thanks! Is there a Python script version of this?
```
subset=dev_other
python examples/speech_recognition/infer.py /checkpoint/abaevski/data/speech/libri/10h/wav2vec/raw --task audio_pretraining \
  --nbest 1 --path /path/to/model --gen-subset $subset --results-path /path/to/save/results/for/sclite --w2l-decoder kenlm \
  --lm-model /path/to/kenlm.bin --lm-weight 2 --word-score -1 --sil-weight 0 --criterion ctc --labels ltr --max-tokens 4000000 \
  --post-process letter
```
Thank you very much!
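For anyone who wants a script rather than the CLI, here is a rough, untested Python sketch of what that command does at its core, minus the KenLM beam search: load a fine-tuned (CTC) checkpoint, run the audio through it, and greedy-decode the letter logits. The paths are placeholders, and the blank index and "|" word-boundary handling follow fairseq's CTC defaults at the time of this thread, so verify against your fairseq version.

```python
import soundfile as sf
import torch
from fairseq import checkpoint_utils

# Placeholder paths - substitute your own files.
CKPT = "/path/to/finetuned_model.pt"  # must be a fine-tuned (CTC) checkpoint, not pretrained-only
WAV = "/path/to/audio.wav"            # 16 kHz mono audio

# If the checkpoint's stored data path doesn't exist on your machine, you may
# need arg_overrides={"data": "/dir/containing/dict.ltr.txt"} so the task can
# load the letter dictionary.
models, _, task = checkpoint_utils.load_model_ensemble_and_task([CKPT])
model = models[0].eval()

wav, sr = sf.read(WAV)
assert sr == 16000, "wav2vec 2.0 models expect 16 kHz input"
source = torch.from_numpy(wav).float().unsqueeze(0)  # (1, num_samples)
# Note: the large models were trained on layer-normalized audio; check whether
# your checkpoint's task config has normalize=True.

with torch.no_grad():
    # Wav2VecCtc returns a dict; "encoder_out" holds (frames, batch, vocab) letter logits.
    logits = model(source=source, padding_mask=None)["encoder_out"]

# Greedy CTC decode: collapse repeated frames, then drop blanks (index 0 by default).
pred = logits.argmax(dim=-1).squeeze(1)
toks = torch.unique_consecutive(pred)
toks = toks[toks != 0]
# The letter dictionary uses "|" as the word separator.
text = task.target_dictionary.string(toks).replace(" ", "").replace("|", " ").strip()
print(text)
```

For the language-model decoding in the command above (--w2l-decoder kenlm), infer.py is still the practical route, since it relies on the wav2letter decoder bindings.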
Hi @youssefavx, were you able to get text output? If yes, can you please share a sample command for it? Thanks
Is there a paper for wav2vec 2.0?
https://arxiv.org/abs/2006.11477
@alexeib Awesome! Could you share your training splits (e.g., 10min, 1h, 10h), or how to extract them?
You can get the 10h/1h/10m splits from the libri-light dataset: https://github.com/facebookresearch/libri-light
For 10m, I just used the first split instead of averaging over the six that they have.
OK. Thanks for your reply!
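To turn those splits into something fairseq can consume, you need tsv manifests: the first line is the root directory, and each subsequent line is a path relative to the root plus the number of samples, tab-separated (the same format examples/wav2vec/wav2vec_manifest.py produces). Below is a rough, untested sketch; the librispeech_finetuning directory layout (1h/0 through 1h/5, plus 9h) is from the libri-light release, so verify it against your download, and the output file names are my own.

```python
import os
import soundfile as sf

# Placeholder path - point this at your libri-light download.
ROOT = "/path/to/librispeech_finetuning"

# 10m = the first of the six 1h folds (as in the comment above);
# 1h = all six folds; 10h = the 1h data plus the 9h directory.
SPLITS = {
    "10m": ["1h/0"],
    "1h": [f"1h/{i}" for i in range(6)],
    "10h": [f"1h/{i}" for i in range(6)] + ["9h"],
}

def write_manifest(name, subdirs, out_dir="manifests"):
    os.makedirs(out_dir, exist_ok=True)
    with open(os.path.join(out_dir, f"{name}.tsv"), "w") as f:
        print(ROOT, file=f)  # first line of a fairseq audio manifest is the root dir
        for sub in subdirs:
            for dirpath, _, files in os.walk(os.path.join(ROOT, sub)):
                for fname in sorted(files):
                    if fname.endswith(".flac"):
                        path = os.path.join(dirpath, fname)
                        # each entry: relative path <tab> number of samples
                        print(f"{os.path.relpath(path, ROOT)}\t{sf.info(path).frames}", file=f)

for name, subdirs in SPLITS.items():
    write_manifest(name, subdirs)
```

For fine-tuning you also need the matching .ltr/.wrd label files; examples/wav2vec/libri_labels.py in the repo generates those from a manifest plus the LibriSpeech-style transcripts.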
@aayagar001 Unfortunately I haven't been able to yet, as I haven't had time to try this out more carefully, but I'll try again and will update with code and simple steps for those who, like me, are mostly lost as to next steps (it seems like quite a learning curve for someone who just knows Python and has dabbled with NLP, but I could be wrong). I'll see if I can make a Colab with one giant function you just feed your audio into and get text back.
It's not quite clear to me where a custom audio file should be placed as input, but I'll try to figure that out.
@aayagar001 It seems people are already trying to figure this out here: https://github.com/pytorch/fairseq/issues/2651
Good to know I'm not the only one who's a bit challenged by the complexity of setting this up. They've already set up a Docker image here: https://github.com/loretoparisi/wave2vec-recognize-docker
If you don't know what Docker is, I don't fully understand it either. I think it's something like a virtual environment: instead of installing packages on your computer (with pip install), you install them in an isolated container so that they don't mess with your main installation.
The steps for running the Docker image are outlined at the very bottom of that page: first install Docker (search for "download Docker"), then run the listed commands.
But as you can see, after running the three commands I'm running into this issue: https://github.com/loretoparisi/wave2vec-recognize-docker/issues/2
This might be a mistake on my end (probably) or something else. It would be much preferable to have a Colab or Kaggle notebook, so we don't have to deal with all the unexpected errors that come with different OSes/environments, but this is already a great step forward.
Will keep you guys updated.
I just read the recent paper on wav2vec 2.0; it looks very interesting. Thanks for this contribution. Could someone please let me know when the code will be released?