domerin0 / rnn-speech

Character-level speech recognizer using CTC loss with deep RNNs in TensorFlow.
MIT License

Hyperparams examples #12

Closed lingz closed 7 years ago

lingz commented 8 years ago

It would be helpful if you provided baseline hyperparameters so we know what reasonable values to start training with.

domerin0 commented 8 years ago

That's a good suggestion. I'm still conducting the hyperparameter search for the dataset. I've had to put off working on this project in favour of work projects, but given the recent interest in it I will find more time to put into it.

I will be able to train some models over the next few days, with the intention of finding an estimate of which parameters converge.

lingz commented 8 years ago

Just out of curiosity, with the models you have so far, what are your CTC loss and accuracy looking like (just a rough estimate)?

domerin0 commented 8 years ago

I haven't spent much time on training it yet. I had one model where CTC loss went down to about 400, and I haven't built in a mechanism to track accuracy yet. I also expect to get significantly better CTC loss results over the next few days. I'm not sure accuracy is the right metric to use here; I want to do more research into that. I think loss is sufficient given the number of classification labels.
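For reference, a minimal sketch (assuming the TF 1.x-era API) of how the raw CTC loss and a label error rate could be tracked side by side; the variable names, shapes, and class count below are illustrative, not the repo's actual code:

```python
import tensorflow as tf

# Illustrative placeholders only, not the repo's actual variables.
num_classes = 29  # e.g. 26 letters + space + apostrophe + the CTC blank
logits = tf.placeholder(tf.float32, [None, None, num_classes])  # [max_time, batch, classes]
labels = tf.sparse_placeholder(tf.int32)                        # target character indices
seq_len = tf.placeholder(tf.int32, [None])                      # frames per utterance

# The CTC loss, i.e. the number being reported in this thread.
ctc_loss = tf.reduce_mean(tf.nn.ctc_loss(labels, logits, seq_len))

# Label error rate (normalized edit distance after a greedy decode),
# one possible "accuracy" proxy to track alongside the raw loss.
decoded, _ = tf.nn.ctc_greedy_decoder(logits, seq_len)
ler = tf.reduce_mean(tf.edit_distance(tf.cast(decoded[0], tf.int32), labels))
```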

lingz commented 8 years ago

Hmm, I managed to get CTC loss under 200, but at that point it was still giving essentially useless labels (almost all blanks).

domerin0 commented 8 years ago

OK, keep in mind that for the acoustic model there will probably be a lot of blanks, and the output might look like gibberish, but the language model should clean that up. Of course, I'm not saying that is the case here. A loss of 200 might not be low enough; I will hopefully have more information by Monday, after letting some models train over the next few days.
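As a toy illustration of why the frame-level output of the acoustic model is mostly blanks even when the decoded text is fine: CTC emits a label (or blank) per frame, and decoding merges repeats and drops blanks. The symbols and blank marker below are made up:

```python
BLANK = "_"  # stand-in for the CTC blank symbol

def ctc_collapse(frame_labels):
    """Merge consecutive repeated frame labels, then drop CTC blanks."""
    out = []
    prev = None
    for ch in frame_labels:
        if ch != prev:
            out.append(ch)
        prev = ch
    return "".join(c for c in out if c != BLANK)

# 12 frames, only 4 of them non-blank, still decode to a word:
print(ctc_collapse(list("___hh__ii___")))  # -> "hi"
```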

lingz commented 8 years ago

Okay, let me know if you see any good results with just the acoustic model.

domerin0 commented 8 years ago

Hey, just leaving an update here. I've been letting models train overnight on a Titan X. It's slow going. I will probably find a decent model over the next week or two, the limiting factors being my time and the computational cost. I will provide another update when I can.

AMairesse commented 7 years ago

Hi Ling, there are some pre-trained acoustic models available now, and I've added comments in the config.ini file giving a recommended value for each hyperparameter. Does this answer your report? May I close it? Thanks.
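For anyone landing here later, a purely hypothetical sketch of reading such hyperparameter values out of a config.ini with Python's configparser; the section and key names below are made up for illustration and may not match the repo's actual file:

```python
import configparser

# Hypothetical example: section/key names and fallback values are
# illustrative only, not the repo's actual recommendations.
config = configparser.ConfigParser()
config.read("config.ini")

hidden_size = config.getint("acoustic_network_params", "hidden_size", fallback=256)
num_layers = config.getint("acoustic_network_params", "num_layers", fallback=2)
learning_rate = config.getfloat("training", "learning_rate", fallback=3e-4)
batch_size = config.getint("training", "batch_size", fallback=32)
```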

lingz commented 7 years ago

Yup that's great thanks.
