domerin0 / rnn-speech

Character level speech recognizer using ctc loss with deep rnns in TensorFlow.
MIT License

data feeding is too slow #27

Closed qqueing closed 7 years ago

qqueing commented 7 years ago

I think there is no other change except TF 1.0, but the recent version of this project is much slower than the old version on TF 0.12~13. Is it only my PC? In my test scenario, 3 layers * 1024..

AMairesse commented 7 years ago

Hi, it seems the end of your message was lost, do you have some numbers for a test scenario? Also: which branch are you testing, master or dev?

The dev branch is still a work in progress, I'm not sure it's in good shape for a test run.

In the master branch the last push was quite a big one. In particular there is now a way to do mini-batches. That's useful when your graphics card cannot take a large batch: a small batch gives unstable learning, while a large batch is more stable. The mini-batch mechanism allows processing multiple batches before applying the gradients. Finally, this push also included a big change in the default config values, now using a much larger network by default. If you kept the default values then it is indeed quite a bit slower than before, but you should also expect a much better result! :+1:
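For reference, gradient accumulation in TF 1.x is usually built along these lines (a minimal sketch of the general technique, not the repo's actual code; `loss` and the mini-batch count are placeholders):

```python
import tensorflow as tf

# Sketch of the mini-batch idea: accumulate gradients over several small batches,
# then apply them once, which behaves like training on one larger batch.
optimizer = tf.train.AdamOptimizer(1e-4)
grads_and_vars = optimizer.compute_gradients(loss)  # `loss` is assumed to come from the model

# One non-trainable accumulator per trainable variable
accumulators = [tf.Variable(tf.zeros_like(v.initialized_value()), trainable=False)
                for _, v in grads_and_vars]
zero_ops = [acc.assign(tf.zeros_like(acc)) for acc in accumulators]
accumulate_ops = [acc.assign_add(g) for acc, (g, _) in zip(accumulators, grads_and_vars)]

mini_batches = 4  # placeholder: number of mini-batches per gradient update
apply_op = optimizer.apply_gradients(
    [(acc / mini_batches, v) for acc, (_, v) in zip(accumulators, grads_and_vars)])
```

One training step would then run `zero_ops`, run `accumulate_ops` once per mini-batch, and finally run `apply_op`.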

The data feeding does not seem to be slow. I added debug messages to check it in the last master version and it was working quite well I think. Anyway, in the dev branch I have changed it a lot to follow the TensorFlow performance guidelines. In the next version the queue will be directly linked to the model, so there will be no need to transmit the data using the input dict. It should be faster, but to be honest I didn't really see a change in my tests.
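For illustration, a queue wired directly into the model in TF 1.x looks roughly like this (a sketch with made-up shapes, not the exact pipeline from the dev branch):

```python
import tensorflow as tf

# CPU threads enqueue (features, labels) pairs; the model dequeues padded batches
# directly, so no feed_dict is needed for the training op.
features_ph = tf.placeholder(tf.float32, [None, 120])  # [time, feature_dim], 120 is a placeholder
labels_ph = tf.placeholder(tf.int32, [None])

queue = tf.PaddingFIFOQueue(capacity=64,
                            dtypes=[tf.float32, tf.int32],
                            shapes=[[None, 120], [None]])
enqueue_op = queue.enqueue([features_ph, labels_ph])

# The graph reads its inputs straight from the queue, padded to the longest item
batch_features, batch_labels = queue.dequeue_many(8)
```

Background threads keep running `enqueue_op` with preprocessed examples while the training loop only runs the train op.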

AMairesse commented 7 years ago

Closing now that inputs are done using queues. Feel free to re-open if you have any performance advice.

jesuistay commented 7 years ago

I think one reason it might feel slow is that the corpora aren't preprocessed; instead the trainer transforms the FLAC files one by one during training. If I am not mistaken, this means the step is repeated for every file in each epoch? Looking around at other models here on GitHub, preprocessing the data ahead of time seems to be a common way to speed up training.
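Something along these lines would cache the features once up front (purely illustrative; `soundfile` and `python_speech_features` are assumptions, not necessarily what the trainer uses):

```python
import os
import numpy as np
import soundfile as sf                      # assumption: used to decode FLAC
from python_speech_features import mfcc     # assumption: MFCC features

def preprocess_corpus(flac_paths, out_dir):
    """Decode each FLAC file once and cache its features as a .npy file."""
    for flac_path in flac_paths:
        signal, rate = sf.read(flac_path)
        features = mfcc(signal, samplerate=rate)
        out_path = os.path.join(out_dir, os.path.basename(flac_path) + ".npy")
        np.save(out_path, features)
```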

AMairesse commented 7 years ago

That's right, but the input is threaded so the preprocessing of the files is done asynchronously. You only wait for the first batch before training starts; on the next batches the CPU has time to process the files while the GPU is doing the training. Of course, if you have a slow CPU or no GPU it will be very, very slow...

Since my answer in August I've switched to datasets instead of queues. It's a little less efficient: about 10% longer per step with datasets than with queues. That's because the dataset currently works only in host memory, so there is latency when moving the data from host memory to GPU memory.
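As a rough illustration of dataset-based feeding (assumptions: the tf.data API from TF 1.4+, and a Python-side `load_audio` function that returns `(features, labels)` for one file; shapes are placeholders):

```python
import tensorflow as tf

filenames = ["..."]  # list of audio file paths

def _parse(filename):
    # tf.py_func runs the Python-side audio preprocessing on CPU threads
    features, labels = tf.py_func(load_audio, [filename], [tf.float32, tf.int32])
    features.set_shape([None, 120])  # [time, feature_dim], 120 is a placeholder
    labels.set_shape([None])
    return features, labels

dataset = (tf.data.Dataset.from_tensor_slices(filenames)
           .map(_parse, num_parallel_calls=4)
           .padded_batch(8, padded_shapes=([None, 120], [None]))
           .prefetch(2))  # keep a couple of batches ready while the GPU trains

batch_features, batch_labels = dataset.make_one_shot_iterator().get_next()
```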

But there are some benefits of course: