I tried the pretrained model; you can find it here: https://github.com/pytorch/fairseq/blob/master/examples/wav2vec/README.md. Did you try concatenating the contextual vector with the spectrogram and feeding it to the ASR network? Or did you try replacing the spectrogram with the context vector? Thank you
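For what it's worth, the README linked above includes a snippet for extracting both representations; roughly (the checkpoint path is a placeholder, and the input is assumed to be 16 kHz mono):

```python
import torch
from fairseq.models.wav2vec import Wav2VecModel

# Load the pretrained checkpoint (path is a placeholder).
cp = torch.load('/path/to/wav2vec_large.pt')
model = Wav2VecModel.build_model(cp['args'], task=None)
model.load_state_dict(cp['model'])
model.eval()

# Dummy 16 kHz waveform batch; replace with real audio samples.
wav_input_16khz = torch.randn(1, 10000)
z = model.feature_extractor(wav_input_16khz)  # local latent features
c = model.feature_aggregator(z)               # context vectors, shape (1, 512, n_frames)
```

Either `z` or `c` could then be concatenated with, or substituted for, the spectrogram features downstream.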
I also used the pre-trained model and it works fine. I plugged the h5 features into wav2letter in place of the MFCCs. I am having problems training my own model, though: training works fine with Librispeech data, so I suppose there is a problem in my own data.
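In case it helps anyone reproduce this setup, here is a hypothetical sketch of dumping the context vectors to HDF5 for an external acoustic model; the file name and dataset name are illustrative, not a wav2letter convention:

```python
import h5py
import numpy as np

# `c` stands in for the aggregated context vectors from the extraction
# snippet above, shape (1, 512, n_frames); random data is used here so
# the example runs standalone.
c = np.random.randn(1, 512, 398).astype('float32')

with h5py.File('utt1.h5', 'w') as f:
    # Stored frames-major here, i.e. (n_frames, 512).
    f.create_dataset('features', data=c[0].T)
```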
@alpoktem I don't think it's a problem in your data. Have you tried other acoustic models such as DeepSpeech, Kaldi, or ESPnet? I think the problem is the large number of features coming out of the wav2vec model ([batch_size, audio_frames, 512]) being fed into the acoustic model. Please correct me if I'm wrong. Thanks
@alpoktem Hi! Can you provide the loss curve of your training?
@alpoktem I hit the same error. In my case, the AssertionError was caused by a very short utterance: in wav2vec.py, the `steps` variable in `steps = min(steps, tsz - self.offset)` becomes negative, so `end` ends up being 0. After I filtered out the short utterances, the AssertionError disappeared. I think it may be a bug in the code.
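If anyone wants to filter their data the same way, here is a minimal sketch assuming the standard fairseq TSV manifest (first line is the root directory, then one `<relative_path>\t<num_samples>` entry per line); the threshold below is a guess and should be tuned to exceed the model's cropping window:

```python
# Hypothetical minimum utterance length in samples; pick a value large
# enough that `steps = min(steps, tsz - self.offset)` in wav2vec.py
# never goes negative for your configuration.
MIN_SAMPLES = 2000

with open('train.tsv') as fin, open('train_filtered.tsv', 'w') as fout:
    fout.write(fin.readline())  # keep the root-directory line unchanged
    for line in fin:
        path, num_samples = line.rstrip('\n').split('\t')
        if int(num_samples) >= MIN_SAMPLES:
            fout.write(line)
```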
🐛 Bug
I started training following the wav2vec training guidelines. 84% of the way through the first epoch, it quits with an AssertionError.
To Reproduce
Error output
Environment
pip install --editable .