[Chapter 6.3] Basic machine-learning approach freezing at end of first epoch

fchollet / deep-learning-with-python-notebooks

Jupyter notebooks for the code samples of the book "Deep Learning with Python"

MIT License

18.17k stars 8.53k forks

[Chapter 6.3] Basic machine-learning approach freezing at end of first epoch #128

Open N1ck95 opened 4 years ago

N1ck95 commented 4 years ago

I've checked out the code you propose. However, I can't figure out why training freezes at the end of the first epoch while the notebook keeps running indefinitely.

Epoch 1/20 499/500 [============================>.] - ETA: 0s - loss: 53.5993

It has been stuck at this point for almost 15 minutes. I've tried running the code several times with both TensorFlow 1 and 2, but nothing changes.

RoseString commented 4 years ago

I ran into the same issue when using two GPUs. I'm not sure why, but after switching to a single GPU, training moved forward.
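For anyone who wants to try the single-GPU workaround, one common way to do it is to hide the other devices from the process. This is only a sketch: it assumes an NVIDIA/CUDA setup and that the GPU you want has index 0, neither of which is stated in this thread.

```python
import os

# Assumption: the GPU to keep visible has index 0 on this machine.
# CUDA_VISIBLE_DEVICES is read when the CUDA runtime initializes, so set it
# before TensorFlow/Keras is imported (i.e. in the first notebook cell).
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

visible = os.environ["CUDA_VISIBLE_DEVICES"].split(",")
print("GPUs visible to this process:", visible)
```

If the notebook kernel has already imported TensorFlow, restart the kernel first so the setting takes effect.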

ghost commented 4 years ago

I had the same problem, and I assume it was due to the version of the book I have. Where val_steps is defined, the book has:

val_steps = (300000 - 200001 - lookback)

which should be:

val_steps = (300000 - 200001 - lookback) // batch_size

The same change applies to test_steps.

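Without " // batch_size", val_steps is roughly batch_size times larger, so the validation pass at the end of each epoch takes a very long time. That is why training appears to freeze at the last step of the first epoch.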
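To see how large the difference is, here is a quick calculation. It is only a sketch: the values lookback = 1440 and batch_size = 128 are assumed to match the notebook's settings, and steps_per_pass is a hypothetical helper name, not something from the book.

```python
def steps_per_pass(max_index, min_index, lookback, batch_size):
    """Number of generator batches needed to cover the index range once."""
    return (max_index - min_index - lookback) // batch_size


# Assumed notebook settings (check them against your own copy).
lookback = 1440
batch_size = 128

# Definition without the division: one "step" per sample instead of per batch.
val_steps_wrong = 300000 - 200001 - lookback

# Corrected definition: one step per batch.
val_steps_fixed = steps_per_pass(300000, 200001, lookback, batch_size)

print("val_steps without // batch_size:", val_steps_wrong)
print("val_steps with // batch_size:   ", val_steps_fixed)
```

With these assumed values, the uncorrected definition asks Keras to draw 98559 validation batches per epoch instead of 769, so the progress bar sits at 499/500 while validation runs.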