domerin0 / rnn-speech

Character level speech recognizer using ctc loss with deep rnns in TensorFlow.
MIT License
77 stars 31 forks

Added one file processing from command line #16

Closed AMairesse closed 8 years ago

AMairesse commented 8 years ago

Hi, this is the last part and then we will be up to date. I've exported checkpoint_dir and training_dataset_dir to the config file and made a new launcher to process a single file from the command line (TODO: there is quite a bit of code duplicated from train.py which I should fix, and the documentation should also explain how to use it). There is also added information in the config file about max_input_seq_length and max_target_seq_length. And finally I removed some unused code.

Not related to this pull request, but I think your initial idea of saving the vector file for each wav file was good. On my computer it takes 8 to 10 seconds waiting for the process to feed input data at each step! I will probably have a look into this.
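A minimal sketch of what per-wav caching could look like, assuming one NumPy feature array per file (the `compute_fn` argument is a hypothetical stand-in for the project's actual feature extraction):

```python
import os
import numpy as np

def load_or_compute_features(wav_path, compute_fn):
    """Load cached features for a wav file, computing and saving them on first use."""
    cache_path = os.path.splitext(wav_path)[0] + ".npy"
    if os.path.exists(cache_path):
        return np.load(cache_path)
    features = compute_fn(wav_path)   # [time, num_features] array
    np.save(cache_path, features)
    return features
```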

domerin0 commented 8 years ago

Yes, that's a good point, I also noticed that. I found that when I saved the vector files it quickly used all my memory. A quick calculation:

(3263 * 123 + 517) * 4 bytes ≈ 1.66 MB per file; 1.66 MB * 9000 data points ≈ 15 GB

I realize 15 GB isn't so bad considering parts of it could be loaded from disk into memory as needed, but I'd like to reduce the memory footprint. Maybe switching to 16-bit precision and reducing the sequence length would work (I think the 3263 can be reduced a lot without losing many data points). I can run some experiments to optimise this a bit.
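As a rough sanity check on those numbers, a small sketch (the 517 constant and the frame/feature/file counts come from the calculation above; the function name is just for illustration):

```python
# Rough footprint of caching all feature vectors, varying precision and max frame count.
def dataset_footprint_gb(max_frames, num_features, num_files, bytes_per_value):
    per_file = (max_frames * num_features + 517) * bytes_per_value
    return per_file * num_files / 1e9

print(dataset_footprint_gb(3263, 123, 9000, 4))  # float32, full length: ~14.5 GB (the ~15 GB above)
print(dataset_footprint_gb(3263, 123, 9000, 2))  # float16: roughly half, ~7.2 GB
print(dataset_footprint_gb(800, 123, 9000, 2))   # float16 with an 800-frame cap: ~1.8 GB
```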

AMairesse commented 8 years ago

I am currently running with 800 for the sequence length and it works great. That's a 20-second max wav file; almost all files in the dataset seem to be shorter than that.

domerin0 commented 8 years ago

Just noticed that when I fetched your PR. Way ahead of me.

AMairesse commented 8 years ago

I don't have enough memory for 3263 :-)

AMairesse commented 8 years ago

About saving the vector files, I think it would be worthwhile at least for the test set, because we run it often.

domerin0 commented 8 years ago

Yes, that's a good point. Could certainly save a lot of time. Also, you're right, 800 seems to work well, it's much faster.

AMairesse commented 8 years ago

I've just seen my mistake: 800 is not for 20 seconds, it's for 8 seconds. I took 0.025 s, but that's the window length; the window step is 0.01 s. I tried a local update to reject the files which are too long, and with 800 I get a lot of them. I'm sure cutting them is a bad idea, so I will have to find a way to go over 800 on my computer... or find another dataset.
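A quick sketch of the same arithmetic plus the rejection check, assuming the standard-library `wave` module can read the files (`max_input_seq_length` and the 0.01 s step come from the discussion above):

```python
import wave

# Frames are spaced by the window *step* (0.01 s), not the window length (0.025 s),
# so the frame budget translates to: duration = frames * step.
def max_duration_seconds(max_input_seq_length, window_step=0.01):
    return max_input_seq_length * window_step

def is_too_long(wav_path, max_input_seq_length, window_step=0.01):
    """Return True if the wav file is longer than the configured frame budget allows."""
    w = wave.open(wav_path, "rb")
    try:
        duration = w.getnframes() / float(w.getframerate())
    finally:
        w.close()
    return duration > max_duration_seconds(max_input_seq_length, window_step)

print(max_duration_seconds(800))   # 8.0 seconds
print(max_duration_seconds(2000))  # 20.0 seconds -> frames needed to cover a 20 s clip
```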