dspavankumar / keras-kaldi

Keras Interface for Kaldi ASR
GNU General Public License v3.0

Data generator error when finishing epoch #6

Closed jfainberg closed 7 years ago

jfainberg commented 7 years ago

At the end of the first epoch it crashed with the following traceback:

```
File "../python2.7/threading.py", line 801, in __bootstrap_inner
  self.run()
File ".../threading.py", line 754, in run
  self.__target(*self.__args, **self.__kwargs)
File "../engine/training.py", line 606, in data_generator_task
  generator_output = next(self._generator)
File "...dataGenerator.py", line 153, in next
  x, y = self.getNextSplitData()
File "..dataGenerator.py", line 135, in getNextSplitData
  return (numpy.vstack(featList), numpy.vstack(labelList))
File "...numpy/core/shape_base.py", line 234, in vstack
  return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
ValueError: need at least one array to concatenate
```

Looks like featList or labelList were empty? Not sure why this would happen. Any thoughts?
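For what it's worth, the exception in the traceback is exactly what NumPy raises when `vstack` receives an empty sequence, so an empty `featList`/`labelList` would fully explain the crash. A minimal repro:

```python
import numpy

# numpy.vstack on an empty list raises the same error seen in the traceback,
# which is what happens if featList/labelList end up empty at epoch end.
try:
    numpy.vstack([])
except ValueError as e:
    print(e)  # ValueError: need at least one array to concatenate
```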

If relevant, I did have to change the call to `fit_generator` in `train.py` from `samples_per_epoch=trGen.numFeats` to `steps_per_epoch=trGen.numFeats//learning['batchSize']` to make it compatible with Keras 2.0.
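A sketch of the conversion (not the repo's code; `numFeats` and `batchSize` mirror names from `train.py`): Keras 1.x counted *samples* per epoch, Keras 2.x counts *batches*, so the sample count has to be divided by the batch size.

```python
# Keras 2.x steps_per_epoch is a batch count, not a sample count.
# Floor division drops the residual partial batch.
def steps_per_epoch(num_feats, batch_size):
    return num_feats // batch_size

print(steps_per_epoch(24166681, 256))  # 94401, with the figures from this thread
```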

Thanks very much for releasing this code! :-)

dspavankumar commented 7 years ago

Is it possible for you to run the scripts with Python 3.4 or higher? I never actually tested them on Python 2.7, and that could be the reason. In fact, they shouldn't have run successfully up to this point on 2.7.

Thanks for suggesting the `steps_per_epoch` change. I need to update the code.

> Thanks very much for releasing this code!

You're welcome!

jfainberg commented 7 years ago

Thanks!

I tried using Python 3.4 now. On a smaller dataset it works fine, but once I try it on my full set it crashes at the same point. I'm wondering whether it has to do with the few remaining samples? I.e., my trGen.numFeats = 24166681 and batch size = 256. Then with steps_per_epoch=numFeats // batchSize that is 94401, so only 25 feats remain. Could splitDataCounter have been set one too high? I'll do some more experiments... (my maxSplitDataSize = 100)

dspavankumar commented 7 years ago

Can you try ceiling it instead of flooring it, as follows: `steps_per_epoch=-(-trGen.numFeats//learning['batchSize'])`

I haven't gotten a chance to test my code on Keras 2.0 yet, but `fit_generator` expects trGen to fetch the residual few samples as a final, smaller minibatch, and trGen does that. So we need one extra step per epoch, unless numFeats is a multiple of batchSize.
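The `-(-a // b)` idiom suggested above is integer ceiling division: negating, floor-dividing, then negating again rounds toward positive infinity, so the leftover samples get their own step.

```python
# Ceiling division without floats: -(-a // b).
def ceil_div(a, b):
    return -(-a // b)

print(ceil_div(24166681, 256))  # 94402: one extra step for the 25 leftover feats
print(ceil_div(256, 256))       # 1: no extra step when evenly divisible
```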

jfainberg commented 7 years ago

Thanks for the suggestion; sadly it didn't work. I've now downgraded to Keras 1.2.2, using Keras-Kaldi untouched, but still I'm getting the same error. But interestingly, now Keras also complained before the crash: UserWarning: Epoch comprised more than 'samples_per_epoch' samples, which might affect learning results. Set 'samples_per_epoch' correctly to avoid this warning.

I'm at a loss as to how that could have occurred. I'm realigning everything and running it again.

jfainberg commented 7 years ago

Sorry, this turned out to be a non-issue. I recreated the alignments and everything has now been running fine over several epochs. It must have been a mismatch between the alignments and the data. It might be worth trying to catch this kind of error.
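A hedged sketch of the suggested sanity check (not from keras-kaldi; `check_alignment_coverage`, `feat_utts`, and `ali_utts` are hypothetical names): before training, verify that every feature utterance has a matching alignment, so a mismatch surfaces as a clear error rather than an empty feature list at epoch end.

```python
# feat_utts and ali_utts are assumed to be dicts keyed by Kaldi utterance IDs.
def check_alignment_coverage(feat_utts, ali_utts):
    missing = sorted(set(feat_utts) - set(ali_utts))
    if missing:
        raise ValueError("utterances without alignments: %s" % missing[:5])
    return True

# A full match passes; a missing alignment raises immediately.
check_alignment_coverage({"utt1": 0, "utt2": 0}, {"utt1": 0, "utt2": 0})
```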

Thanks very much for your help and time!

dspavankumar commented 7 years ago

Good to know that. You're welcome.