Closed jfainberg closed 7 years ago
Is it possible for you to run the scripts with Python 3.4 or higher? I never actually tested them on Python 2.7, and this could be a reason. In fact, they shouldn't have run successfully up to this point.
Thanks for suggesting the steps_per_epoch change. I need to update the code.
Thanks very much for releasing this code!
You're welcome!
Thanks!
I tried using Python 3.4 now. On a smaller dataset it works fine, but once I try on my full set it crashes at the same point. I'm wondering whether it has to do with the few remaining samples? I.e. my trGen.numFeats = 24166681.0, batch size = 256.0. Then with steps_per_epoch=numFeats // batchSize that is 94401.0. So the remaining feats are only 25. Could the splitDataCounter have been set one too high? I'll do some more experiments... (my maxSplitDataSize = 100)
Can you try taking the ceiling instead of rounding down, as follows:
steps_per_epoch=-(-trGen.numFeats//learning['batchSize'])
I haven't gotten a chance to test my code on Keras 2.0 yet, but fit_generator expects trGen to fetch the residual few samples as a minibatch. And trGen does that. So we have one extra step per epoch, unless numFeats is a multiple of batchSize.
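To make the arithmetic concrete, here is a minimal sketch of the ceiling division suggested above, using the numbers quoted in this thread. The helper name steps_needed is made up for illustration and is not part of Keras-Kaldi.

```python
def steps_needed(num_feats, batch_size):
    """Minibatches needed to cover num_feats samples, counting a final partial batch.

    Hypothetical helper for illustration; not part of Keras-Kaldi.
    """
    # The thread reports these values as floats (24166681.0, 256.0),
    # so convert to int before doing integer arithmetic.
    num_feats, batch_size = int(num_feats), int(batch_size)
    # -(-a // b) is ceiling division: floor division on the negated value
    # rounds towards minus infinity, which is "up" once negated back.
    return -(-num_feats // batch_size)


num_feats, batch_size = 24166681, 256
full_batches, leftover = divmod(num_feats, batch_size)
print(full_batches, leftover)                 # 94401 full batches, 25 samples left over
print(steps_needed(num_feats, batch_size))    # 94402 steps, including the partial batch
```

When the sample count is an exact multiple of the batch size, the two results agree and no extra step is added.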
Thanks for the suggestion; sadly it didn't work. I've now downgraded to Keras 1.2.2, using Keras-Kaldi untouched, but still I'm getting the same error. But interestingly, now Keras also complained before the crash:
UserWarning: Epoch comprised more than 'samples_per_epoch' samples, which might affect learning results. Set 'samples_per_epoch' correctly to avoid this warning.
I'm at a loss as to how that could have occurred. I'm realigning everything and running it again.
Sorry, this turned out to be a non-issue. I recreated the alignments and now everything has been running fine over several epochs. It must have been a mismatch between alignments and the data. It might be worth trying to catch this kind of error.
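One possible way to catch this kind of mismatch early is sketched below. It assumes that per-utterance frame counts for the features and label counts for the alignments are already available as plain dicts keyed by utterance ID; how those are read from the Kaldi files is not shown, and the function name find_mismatches is hypothetical rather than part of Keras-Kaldi.

```python
def find_mismatches(feat_lengths, label_lengths):
    """Return {utt_id: (n_feat_frames, n_labels)} for utterances that disagree.

    Hypothetical check, not part of Keras-Kaldi. An utterance missing from
    one side is reported with None in that position.
    """
    problems = {}
    for utt in sorted(set(feat_lengths) | set(label_lengths)):
        n_feats = feat_lengths.get(utt)
        n_labels = label_lengths.get(utt)
        if n_feats != n_labels:
            problems[utt] = (n_feats, n_labels)
    return problems


# Made-up utterance IDs and lengths, for illustration only.
feats = {"utt1": 300, "utt2": 412, "utt3": 128}
labels = {"utt1": 300, "utt2": 410}

bad = find_mismatches(feats, labels)
if bad:
    print("Feature/alignment mismatch:", bad)
```

Running a check like this before training would fail fast with the offending utterance IDs, instead of crashing at the end of the first epoch.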
Thanks very much for your help and time!
Good to know that. You're welcome.
At the end of the first epoch it crashed with the following traceback:
Looks like featList or labelList were empty? Not sure why this would happen. Any thoughts?
If relevant, I did have to change the call to fit_generator in train.py from
samples_per_epoch=trGen.numFeats
to
steps_per_epoch=trGen.numFeats//learning['batchSize']
to make it compatible with Keras 2.0.
Thanks very much for releasing this code! :-)
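For anyone needing to support both Keras versions, the change described above could be wrapped in a small helper along these lines. This is only a sketch: epoch_size_kwargs is a made-up name, and it assumes the version string starts with the major version number. For Keras 2 it counts a final partial batch as a step instead of using plain floor division.

```python
def epoch_size_kwargs(keras_version, num_feats, batch_size):
    """Pick the epoch-size keyword for fit_generator by Keras major version.

    Hypothetical helper, not part of Keras-Kaldi. Keras 1.x counts samples
    (samples_per_epoch); Keras 2.x counts minibatches (steps_per_epoch).
    """
    major = int(keras_version.split(".")[0])
    num_feats, batch_size = int(num_feats), int(batch_size)
    if major >= 2:
        # Ceiling division, so a trailing partial batch counts as a step.
        return {"steps_per_epoch": -(-num_feats // batch_size)}
    return {"samples_per_epoch": num_feats}


print(epoch_size_kwargs("1.2.2", 24166681, 256))
print(epoch_size_kwargs("2.0.0", 24166681, 256))
```

The returned dict could then be unpacked into the existing call in train.py, for example as **epoch_size_kwargs(keras.__version__, trGen.numFeats, learning['batchSize']).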