buriburisuri opened this issue 7 years ago
I think data augmentation is important, e.g. increasing the speech rate, adding background noise, altering the pitch, and so on.
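For concreteness, here is a minimal sketch of those three augmentations using librosa and soundfile (neither is part of this repo; the paths and parameter values below are purely illustrative):

```python
# Minimal augmentation sketch; assumes librosa and soundfile are installed.
# Paths, noise_level, n_steps, and rate are illustrative values, not from the repo.
import numpy as np
import librosa
import soundfile as sf

def augment(path, out_prefix, noise_level=0.005, n_steps=2):
    y, sr = librosa.load(path, sr=None)  # keep the file's native sample rate

    # 1) Additive background noise: mix in white noise scaled by noise_level.
    y_noise = y + noise_level * np.random.randn(len(y)).astype(y.dtype)
    sf.write(out_prefix + "_noise.wav", y_noise, sr)

    # 2) Pitch shift by n_steps semitones without changing duration.
    y_pitch = librosa.effects.pitch_shift(y, sr=sr, n_steps=n_steps)
    sf.write(out_prefix + "_pitch.wav", y_pitch, sr)

    # 3) Time stretch (speed change) without altering pitch; rate > 1 is faster.
    y_fast = librosa.effects.time_stretch(y, rate=1.1)
    sf.write(out_prefix + "_fast.wav", y_fast, sr)
```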
Or maybe the training data could be expanded for robust online usage. When I test the model on LibriSpeech, the results are not very good. I was wondering if it is possible to add the "LibriSpeech train-other-500" set as training data; perhaps performance would improve.
I agree that data augmentation is the more important of the two.
@a00achild1 could you share your configs and results on the LibriSpeech dataset? I am going to run some tests on it and check whether the model works better as the data size grows.
I would love to see a version of this repo that lets me discover and generate the characteristics of speech or music samples, similar to what the original WaveNet supports. The original WaveNet's ability to learn from raw waveforms is really cool, but I find the MFCC approach adopted in this repo much more practical.
I also agree that data augmentation is important, but for testing, a Docker image would be a great addition.
Hello,
I tested the software; it trained over a weekend and runs.
Of all the data augmentation techniques for speech waveforms, which one do you think would be best to experiment with first?
I understand from http://speak.clsp.jhu.edu/uploads/publications/papers/1050_pdf.pdf that speed perturbation might be worth trying; what do you think?
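If it helps, here is a sketch of the paper's 3-way speed perturbation (factors 0.9/1.0/1.1, with 1.0 being the untouched original) by shelling out to sox; it assumes sox is on your PATH, and the output naming is made up:

```python
# Speed perturbation sketch following the recipe in the paper above.
# Assumes the sox binary is installed; output file naming is illustrative.
import subprocess

def speed_perturb(path, out_prefix, factors=(0.9, 1.1)):
    for f in factors:
        out = "{}_sp{}.wav".format(out_prefix, f)
        # sox's "speed" effect resamples the audio, changing tempo and pitch
        # together, which is exactly what the speed-perturbation recipe uses.
        subprocess.check_call(["sox", path, out, "speed", str(f)])
```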
@migvel Thanks for the information.
I think DeepSpeech (https://arxiv.org/abs/1412.5567) will be a good starting point for the augmentation (see Section 4 of the paper). In addition, I plan to apply pitch and speed variation. I expect it'll be tough work. T.T
Thank you
I trained for 20 epochs. The suggested test of recognizing the training data worked great. Then I tried a wav file not in the training data, and the result was "een ererdi", which is not even close to what was said in the file. I would suggest splitting your data into training and test sets: update the code to train on the training set, then run the test on data the network was never trained on, to see its real effectiveness.
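Something like this hypothetical held-out split would do; "index.csv" and its layout are assumptions for illustration, not this repo's actual data index:

```python
# Hypothetical train/test split of a data index before training.
# "index.csv" and its one-utterance-per-line layout are assumptions.
import random

def split_index(lines, test_fraction=0.1, seed=42):
    lines = list(lines)
    random.Random(seed).shuffle(lines)  # deterministic shuffle for reproducibility
    n_test = int(len(lines) * test_fraction)
    return lines[n_test:], lines[:n_test]  # (train, test)

with open("index.csv") as f:
    train_lines, test_lines = split_index(f)
```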
@LCLL Sorry for the late reply. I used Sox to augment the LibriSpeech dataset and used another project to train on it, but I couldn't get any good result so far; the loss always diverges after a few iterations.
And I haven't tried the LibriSpeech dataset on this project yet.
@a00achild1 I've just downloaded the LibriSpeech dataset. I'll try VCTK + Libri + augmentation, for better generalization.
@buriburisuri Great! I'm going to try combining Libri, too.
Hello everyone. I am currently doing my GSoC project on speech recognition, based on Deep Speech. I wanted to ask about data augmentation: which approach did you finally adopt? Was it the one given in the paper itself, or something else? Thank you
@buriburisuri A language model would definitely be a good feature: besides possibly increasing inference accuracy, it could probably output correctly punctuated and cased text. :)
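As a sketch of what LM integration could look like, here is a toy n-best rescoring of the DeepSpeech-style decoding objective Q(c) = log P(c|x) + α·log P_lm(c) + β·word_count(c); `lm_logprob` is a stand-in for a real language model (e.g. KenLM), not something this repo provides:

```python
# Toy n-best rescoring with the DeepSpeech-style objective.
# lm_logprob is a hypothetical callable wrapping a real LM (e.g. KenLM).
def rescore(nbest, lm_logprob, alpha=0.8, beta=0.5):
    # nbest: list of (transcript, acoustic_logprob) pairs from the decoder
    def score(item):
        text, acoustic = item
        return acoustic + alpha * lm_logprob(text) + beta * len(text.split())
    return max(nbest, key=score)[0]  # best transcript after LM rescoring
```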
Does this implementation use the "fast wavenet" idea you mention? Basically, they cache previous values so they don't have to be computed again. https://github.com/tomlepaine/fast-wavenet
If not, it would probably be a great feature...
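For reference, the caching trick boils down to each dilated layer keeping a queue of its past activations, so generating one new sample costs O(layers) instead of recomputing the full receptive field. A toy numpy illustration of that idea (not tied to this repo's TensorFlow graph; the weights and shapes are made up):

```python
# Toy illustration of fast-wavenet-style caching for a causal dilated
# layer with kernel size 2. Weights and shapes are made-up placeholders.
from collections import deque
import numpy as np

class CachedDilatedLayer:
    def __init__(self, dilation, channels):
        # The queue holds exactly `dilation` past inputs, so popleft()
        # returns the input from `dilation` steps ago.
        self.queue = deque(np.zeros((dilation, channels)), maxlen=dilation)
        self.w_cur = np.random.randn(channels, channels) * 0.1
        self.w_past = np.random.randn(channels, channels) * 0.1

    def step(self, x_t):
        x_past = self.queue.popleft()  # cached value from `dilation` steps ago
        self.queue.append(x_t)         # cache current input for later reuse
        return np.tanh(x_t @ self.w_cur + x_past @ self.w_past)
```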
I am thinking of adding these features now:
1) Docker images
2) Data augmentation
3) Quantitative analysis
Please reply with the features you think are important!