osddeitf opened this issue 3 years ago
Hi there, I've hit the same issue as mentioned above. I followed the guide from #2922, but the validation WER always stayed above 90. Have you made any progress?
No progress either, and my work no longer needs this model. Sorry.
I did it using Hugging Face; feel free to try it yourself: https://github.com/huggingface/transformers/pull/10581 https://github.com/elgeish/transformers/blob/e72e6e5a3fe2547432005d2ffe3208f8d84cbe02/examples/research_projects/wav2vec2/run_asr.py
This issue has been automatically marked as stale. If this issue is still affecting you, please leave any comment (for example, "bump"), and we'll keep it open. We are sorry that we haven't been able to prioritize it yet. If you have any new additional information, please include it with your comment!
❓ Questions and Help
I'd like some help preparing a dataset for fine-tuning the Wav2Vec2 model.
Long story short
I wanted to build a small personal tool for recognizing English phonemes, and after many hours of searching I found this repo and the Wav2Vec2 model, which reports strong empirical results on phoneme recognition with the TIMIT dataset.
I fine-tuned it on the TIMIT dataset, but when I run inference with my fine-tuned model on a single .wav file from the dataset, what I get is this:
or, in some rare cases, this:
or even nothing at all (an empty string).
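For what it's worth, an empty transcript is exactly what greedy CTC decoding produces when the model predicts the blank token for nearly every frame. Here is a minimal sketch of that decoding step (my own toy implementation, not fairseq's; the blank index and symbol table are made up for illustration):

```python
# Hypothetical sketch: greedy CTC decoding, the step that turns frame-wise
# predictions into a transcript. If a fine-tuned model emits the blank token
# (index 0 here) for almost every frame, the collapsed output is an empty
# string -- which matches the symptom described above.

BLANK = 0  # assumed blank index for this example

def ctc_greedy_decode(frame_ids, id_to_symbol, blank=BLANK):
    """Collapse repeated frame predictions and drop blanks."""
    out = []
    prev = None
    for i in frame_ids:
        if i != prev and i != blank:
            out.append(id_to_symbol[i])
        prev = i
    return " ".join(out)

symbols = {0: "<blank>", 1: "HH", 2: "AH", 3: "L"}
print(ctc_greedy_decode([1, 1, 0, 2, 2, 3], symbols))  # "HH AH L"
print(ctc_greedy_decode([0, 0, 0, 0], symbols))        # "" (all blanks)
```

So a persistently empty output usually points at a training problem (labels or dictionary), not at the decoding code itself.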
I've tried some other audio files from different sources as well, single-channel and with a 16 kHz sample rate as suggested. I expected at least a long sequence of phonemes; accuracy aside, this was just an experiment before digging any deeper. But as shown above, the output is simply WRONG, not merely INACCURATE.
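Since Wav2Vec2 checkpoints expect single-channel 16 kHz audio, a quick format check can rule out one silent failure mode. A stdlib-only sketch (helper name is my own):

```python
# Sanity-check that a .wav file is mono 16 kHz, the format wav2vec 2.0
# models expect; a sample-rate or channel mismatch can produce garbage
# output without any error message.
import wave

def check_wav(path, expect_rate=16000, expect_channels=1):
    """Return True if the file matches the expected rate and channel count."""
    with wave.open(path, "rb") as w:
        return (w.getframerate() == expect_rate
                and w.getnchannels() == expect_channels)
```

Running this over every input before inference is cheap insurance against resampling mistakes.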
What I've tried
I tried to fine-tune Wav2Vec2 following the README at https://github.com/pytorch/fairseq/tree/master/examples/wav2vec. After about 100 epochs, I tried to evaluate it.
Then I found the well-known issue #2651 and tried to use the recognize.py code provided there with my new model. But it wasn't as easy as I thought; my model somehow wasn't compatible. I had fine-tuned libri960_big.pt with the config .examples/wav2vec/config/finetuning/base_1h.yaml (changing only the checkpoint-saving period, though). After several more hours, I finally managed to run my model by modifying recognize.py like this:
The reason is that the newer training method (using Hydra) doesn't save anything in `args`, so I needed to reconstruct the configuration from the checkpoint's new `cfg` property. I even tested on the good old base model and printed the values to verify. Digging further and printing the model parameters at runtime, I can be sure I'm not missing anything. So I could only suspect some incorrect preparation of the fine-tuning dataset, and I tried a couple of different setups:
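The `args`-vs-`cfg` workaround described above can be sketched as a small shim over the loaded checkpoint dict. This is illustrative logic under my own naming, not fairseq's actual API:

```python
# Minimal sketch of the compatibility shim: Hydra-era fairseq checkpoints
# store their configuration under the "cfg" key, while pre-Hydra
# checkpoints used "args". The helper name is hypothetical.

def get_checkpoint_config(state):
    """Return the config of a loaded checkpoint dict, whichever key it uses."""
    if state.get("cfg") is not None:
        return state["cfg"]      # newer Hydra-based checkpoints
    if state.get("args") is not None:
        return state["args"]     # older pre-Hydra checkpoints
    raise KeyError("checkpoint has neither 'cfg' nor 'args'")
```

With something like this in place, the same inference script can accept checkpoints produced by either training path.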
attempt_1.ltr
attempt_1.wrd
attempt_1.dict.ltr.txt
attempt_2.ltr
attempt_2.wrd
dict.ltr.txt
...
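For context, the .wrd/.ltr pairs listed above are usually generated from plain transcripts the way fairseq's libri_labels.py does it: the .wrd line is the uppercase transcript, and the .ltr line spells it letter by letter with `|` marking word boundaries. A sketch of that convention (my own helper name):

```python
# Hedged sketch of producing one .wrd/.ltr label pair for fairseq's
# wav2vec 2.0 fine-tuning, mirroring the libri_labels.py convention:
# letters separated by spaces, "|" for word boundaries, trailing "|".

def make_label_lines(transcript):
    wrd = transcript.strip().upper()
    ltr = " ".join(wrd.replace(" ", "|")) + " |"
    return wrd, ltr

wrd, ltr = make_label_lines("hello world")
print(wrd)  # HELLO WORLD
print(ltr)  # H E L L O | W O R L D |
```

If the .ltr lines don't follow exactly this shape, CTC training can still run but learns against mismatched targets, which produces the kind of nonsense output described above.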
I combined every approach I could think of, like adding `|` as the first line of dict.ltr.txt, or converting the words to phonemes, but unfortunately I just couldn't get it to work. I dug into various issues on this repo (like #2922), but that only left me more confused (even though I managed to run the scripts), as described above. After roughly a hundred attempts, costing me hundreds of EC2 hours on AWS, I couldn't get any better result.
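On the dict.ltr.txt question: each line of a fairseq dictionary file is `SYMBOL COUNT`, and every symbol that appears in the .ltr labels (including `|`) must be present. One way to derive it directly from the labels, sketched with a helper of my own (not a fairseq script):

```python
# Sketch: build dict.ltr.txt entries from .ltr label lines. Each output
# line is "SYMBOL COUNT", sorted by frequency as fairseq dictionaries are.
# A mismatch between this dictionary and the labels is a common cause of
# wrong or empty transcripts.
from collections import Counter

def build_ltr_dict(ltr_lines):
    counts = Counter()
    for line in ltr_lines:
        counts.update(line.split())
    return [f"{sym} {n}" for sym, n in counts.most_common()]

for entry in build_ltr_dict(["H I | H O |"]):
    print(entry)
```

Deriving the dictionary from the same files used for training guarantees the label set and the dictionary agree.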
I would REALLY appreciate any help, and if you'd like, I'll happily buy you a few cups of coffee.
What's your environment?