fizyr / keras-retinanet

Keras implementation of RetinaNet object detection.
Apache License 2.0

WARNING:tensorflow:Your input ran out of data; interrupting training. #1449

Closed: norbertsk9 closed this issue 3 years ago

norbertsk9 commented 4 years ago

Hello, during training on Google Colab with train.py, the following error occurred:

WARNING:tensorflow:Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least `steps_per_epoch * epochs` batches (in this case, 5000 batches). You may need to use the repeat() function when building your dataset.

Thanks for your help.
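For anyone hitting this with a tf.data input pipeline rather than a keras-retinanet generator, here is a minimal sketch of the fix the warning itself suggests; the tensors, sizes, and the commented-out fit call are illustrative, not from this issue:

```python
import tensorflow as tf

# Hypothetical in-memory data; stands in for whatever feeds the model.
images = tf.random.uniform((100, 32, 32, 3))
labels = tf.random.uniform((100,), maxval=10, dtype=tf.int32)

dataset = tf.data.Dataset.from_tensor_slices((images, labels))
# repeat() makes the dataset loop forever, so it can never "run out of data";
# steps_per_epoch then defines where each epoch ends.
dataset = dataset.shuffle(100).batch(2).repeat()

# model.fit(dataset, steps_per_epoch=100 // 2, epochs=50)  # model assumed
```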

micocw commented 4 years ago

Hi, I have the same error. Some time ago I used the script from here: https://github.com/curiousily/Deep-Learning-For-Hackers/blob/master/15.object-detection.ipynb

Now when I run it, the same error as yours appears. I can see that most of the packages have been upgraded. I am trying to downgrade some components, but so far no success.

BS-98 commented 4 years ago

I have the same problem.

medic873 commented 4 years ago

I am having the same problem. Has anyone figured out what might be causing this?

BS-98 commented 4 years ago

No. Problem still exists. I am using tensorflow==2.3.0 and keras==2.4.3.

medic873 commented 4 years ago

I am using Keras 2.4.3 and TensorFlow 2.3.0 as well.

But I have noticed that I have a Paperspace GPU server that runs this code just fine. It is running TensorFlow 1.14.0 and Keras 2.3.1.

I tried downgrading my TensorFlow to 1.14.0 and Keras to 2.3.1, but then I get a different set of errors. I will post what they are here in a few minutes, once I recreate them.

medic873 commented 4 years ago

So I just did the following:

pip uninstall keras-resnet
pip uninstall keras-retinanet
pip uninstall Keras-Preprocessing
pip uninstall Keras-Applications
pip uninstall tensorflow
pip uninstall tensorflow-gpu

Then:

pip install tensorflow==1.14.0
pip install tensorflow-gpu==1.14.0

Then I reran this code:

pip install numpy --user
pip install . --user
python setup.py build_ext --inplace

And reran my model. I got an error saying keras-retinanet requires at least TensorFlow 2.2, which shocks me since I have it running on a Paperspace GPU server with TensorFlow 1.14.0.

But anyway, I then did pip uninstall tensorflow and pip uninstall tensorflow-gpu, followed by pip install tensorflow==2.2 and pip install tensorflow-gpu==2.2.

I then tried to run the model again and got this new error: "UnboundLocalError: local variable 'retval_' referenced before assignment".

After that, I uninstalled tensorflow and tensorflow-gpu again, installed tensorflow 2.3.0 again, and am still getting the error:

"WARNING:tensorflow:Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least steps_per_epoch * epochs batches (in this case, 5000 batches). You may need to use the repeat() function when building your dataset."

So I am kind of at a loss. Just not sure what to try next :|

norbertsk9 commented 4 years ago

I also tried downgrading tensorflow and keras, but it didn't have any effect.

medic873 commented 4 years ago

So I just tried creating a new conda environment and using the pip list from my Paperspace GPU server.

Here is the pip list from that server (Python version 3.7.5):


absl-py 0.7.1
apturl 0.5.2
asn1crypto 0.24.0
astor 0.8.0
attrs 19.1.0
Automat 0.6.0
backcall 0.1.0
bleach 3.1.0
blinker 1.4
Brlapi 0.6.6
certifi 2018.1.18
chardet 3.0.4
click 6.7
cloud-init 19.1
colorama 0.3.7
command-not-found 0.3
configobj 5.0.6
constantly 15.1.0
cryptography 2.1.4
cupshelpers 1.0
cycler 0.10.0
Cython 0.29.21
decorator 4.4.0
defer 1.0.6
defusedxml 0.6.0
distro-info 0.18ubuntu0.18.04.1
entrypoints 0.3
gast 0.2.2
google-pasta 0.1.7
grpcio 1.22.0
h5py 2.9.0
html5lib 0.999999999
httplib2 0.9.2
hyperlink 17.3.1
idna 2.6
incremental 16.10.1
ipykernel 5.1.1
ipython 7.6.1
ipython-genutils 0.2.0
ipywidgets 7.5.0
jedi 0.14.1
Jinja2 2.10.1
joblib 0.13.2
jsonpatch 1.16
jsonpointer 1.10
jsonschema 3.0.1
jupyter 1.0.0
jupyter-client 5.3.1
jupyter-console 6.0.0
jupyter-core 4.5.0
Keras 2.3.1
Keras-Applications 1.0.8
Keras-Preprocessing 1.1.0
keras-resnet 0.1.0
keras-retinanet 0.5.1
keyring 10.6.0
keyrings.alt 3.0
kiwisolver 1.1.0
language-selector 0.1
launchpadlib 1.10.6
lazr.restfulclient 0.13.5
lazr.uri 1.0.3
linecache2 1.0.0
louis 3.5.0
lxml 4.5.2
macaroonbakery 1.1.3
Mako 1.0.7
Markdown 3.1.1
MarkupSafe 1.1.1
matplotlib 3.1.1
mistune 0.8.4
nbconvert 5.5.0
nbformat 4.4.0
netifaces 0.10.4
notebook 6.0.0
numpy 1.16.4
oauth 1.0.1
oauthlib 2.0.6
olefile 0.45.1
opencv-python 4.1.0.25
PAM 0.4.2
pandas 0.25.0
pandocfilters 1.4.2
parso 0.5.1
pbr 3.1.1
pexpect 4.7.0
pickleshare 0.7.5
Pillow 6.1.0
pip 20.2
progressbar 2.5
progressbar2 3.51.4
prometheus-client 0.7.1
prompt-toolkit 2.0.9
protobuf 3.9.0
ptyprocess 0.6.0
pyasn1 0.4.2
pyasn1-modules 0.2.1
pycairo 1.16.2
pycrypto 2.6.1
pycups 1.9.73
Pygments 2.4.2
pygobject 3.26.1
PyJWT 1.5.3
pymacaroons 0.13.0
PyNaCl 1.1.2
pyOpenSSL 17.5.0
pyparsing 2.4.1.1
PyQt5 5.10.1
pyRFC3339 1.0
pyrsistent 0.15.3
pyserial 3.4
python-apt 1.6.5+ubuntu0.2
python-dateutil 2.8.0
python-debian 0.1.32
python-utils 2.4.0
pytz 2019.1
pyxdg 0.25
PyYAML 5.1.1
pyzmq 18.0.2
qtconsole 4.5.2
reportlab 3.4.0
requests 2.18.4
requests-unixsocket 0.1.5
scikit-learn 0.21.2
scipy 1.3.0
screen-resolution-extra 0.0.0
SecretStorage 2.3.1
Send2Trash 1.5.0
service-identity 16.0.0
setuptools 41.0.1
simplegeneric 0.8.1
simplejson 3.13.2
sip 4.19.8
six 1.12.0
ssh-import-id 5.7
system-service 0.3
systemd-python 234
tensorboard 1.14.0
tensorflow 1.14.0
tensorflow-estimator 1.14.0
tensorflow-gpu 1.14.0
termcolor 1.1.0
terminado 0.8.2
testpath 0.4.2
testresources 2.0.0
Theano 1.0.4
torch 1.1.0
torchvision 0.3.0
tornado 6.0.3
traceback2 1.4.0
traitlets 4.3.2
Twisted 17.9.0
ubuntu-drivers-common 0.0.0
ufw 0.36
unattended-upgrades 0.1
unittest2 1.1.0
urllib3 1.22
usb-creator 0.3.3
virtualenv 15.1.0
wadllib 1.3.2
wcwidth 0.1.7
webencodings 0.5.1
Werkzeug 0.15.5
wheel 0.33.4
widgetsnbextension 3.5.0
wrapt 1.11.2
xkit 0.0.0
zope.interface 4.3.2

I created a new conda environment, installed numpy 1.16.4, then installed tensorflow 1.14.0 and tensorflow-gpu 1.14.0, and I think I also had to do a pip install keras-retinanet.

But after that, I reran my code and it did not give me the error saying that I need to figure out how to make the data repeat.

But now TensorFlow is not utilizing my GPUs :( so it kind of defeats the purpose lol

norbertsk9 commented 4 years ago

Has anyone solved the issue?

ha-nso-li commented 4 years ago

I found two workarounds.

  1. Use the --steps argument when training. steps should be smaller than or equal to the length of your dataset divided by the batch size (see the sketch after this list). For example: if your dataset has 1000 images and the batch size is 1, use --steps 1000; if your dataset has 1000 images and the batch size is 2, use --steps 500.

  2. Change the default value of steps to None and do not use the --steps argument when training. https://github.com/fizyr/keras-retinanet/blob/8536cab6baafa8ae3beaa4f62e01cbad872e9884/keras_retinanet/bin/train.py#L436 TensorFlow (Keras) will then calculate the proper number of steps automatically.
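A minimal sketch of the arithmetic behind workaround 1, and of why workaround 2 works; the numbers are the ones from the example above:

```python
# Workaround 1: choose --steps so the generator is never exhausted.
num_images = 1000                 # unique images in the training set
batch_size = 2
steps = num_images // batch_size  # pass this on the command line: --steps 500

# Workaround 2: with steps=None, steps_per_epoch is left unset and Keras
# falls back to len(generator) for Sequence-based generators, which is
# exactly the number of batches one pass over the data can produce.
```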

mooratov commented 4 years ago

The solution proposed by @hansoli68 works, but note that for the first version, --steps must be equal to the total number of unique images in your training set (divided by the batch size). I was making the mistake of using the total number of training labels, but some images have more than one training label, and my run failed until I determined the number of unique images.

If the second approach of setting steps=None works, that seems more foolproof. In fact, as currently constructed, it arguably doesn't even make sense for steps to be a mutable parameter.
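Since keras-retinanet's CSV annotation format has one row per bounding box (img_path,x1,y1,x2,y2,class_name), here is a hedged sketch for getting the unique-image count @mooratov mentions (the file name is illustrative):

```python
import csv

# One row per box, so the same image path can appear many times;
# a set collapses them down to unique images.
with open("annotations.csv") as f:
    unique_images = {row[0] for row in csv.reader(f) if row}

batch_size = 1
print(len(unique_images) // batch_size)  # candidate value for --steps
```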

Andre-Vitorino commented 4 years ago

If you are using the ImageDataGenerator class, try changing the batch_size argument inside the flow_from_directory call, like this:

Instantiating and setting up the ImageDataGenerator:

training_generator = ImageDataGenerator(rescale=1./255, rotation_range=7, horizontal_flip=True, shear_range=0.2, height_shift_range=0.07, zoom_range=0.2)

test_generator = ImageDataGenerator(rescale=1./255)

Setting up the training and test datasets: inside flow_from_directory, set batch_size to 1 if you want to use all the files in your training and test datasets:

training_base = training_generator.flow_from_directory('path_to_directory', target_size=(100, 100), batch_size=1, class_mode='binary')

test_base = test_generator.flow_from_directory('path_to_directory', target_size=(100, 100), batch_size=1, class_mode='binary')

After that, set steps_per_epoch to the total number of files in your training dataset divided by the batch_size you set, in this case 1. You need to do the same thing for validation_steps, but using the total number of files in your test dataset instead:

classifier.fit_generator(training_base, steps_per_epoch=5216/1, epochs=5, validation_data=test_base, validation_steps=624/1)
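A variant of that call that derives both step counts from the generators themselves, so they stay integers and can never overrun the data (a sketch reusing the names defined above):

```python
# len() of a flow_from_directory iterator is the number of batches per pass,
# i.e. ceil(num_files / batch_size), so these values match the data exactly.
classifier.fit_generator(training_base,
                         steps_per_epoch=len(training_base),
                         epochs=5,
                         validation_data=test_base,
                         validation_steps=len(test_base))
```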

Hope it helps you. Sorry for my English.

emoen commented 4 years ago

I have done as @Andre-Vitorino suggests. Then I don't get that error, but it looks like training only appears to work: in each epoch the network sees the same original images, because the validation accuracy doesn't change regardless of what learning rate is set. So it makes training run, but it doesn't solve the underlying problem, which is that the images are not being augmented.

hashimi1998 commented 4 years ago

(Quoting @Andre-Vitorino's ImageDataGenerator suggestion above.)

I also used your approach and it solved the problem; however, the accuracy shows the same value.

hrithiksagar commented 4 years ago

(Quoting the original report above.)

It worked for me when I changed the steps_per_epoch value. The error was showing up after step 2429; since my dataset has 2430 images overall, I changed steps_per_epoch = 2429 and it started running without any error.

hashimi1998 commented 4 years ago

Thank you


stale[bot] commented 3 years ago

This issue has been automatically marked as stale due to the lack of recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

JongJianYi commented 3 years ago

The code I run: new_model.fit_generator(train_generator, validation_data=(x_valid,y_valid), steps_per_epoch=len(x_train), epochs=2)

The error I got: WARNING:tensorflow:Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least `steps_per_epoch * epochs` batches (in this case, 16928 batches). You may need to use the repeat() function when building your dataset.

Anyone know how to solve it?

Mohit-robo commented 3 years ago

(Quoting @JongJianYi's question above.)

Try reducing the steps_per_epoch value below the value you currently have set. This helped me solve the problem.

dennymarcels commented 2 years ago

I am having the same issue, but curiously, it happens only with my validation generator. It does produce 40 batches in this case, as I could verify by producing data in a for loop, but when it is used by the fit method, the method says the generator should be able to produce 40 batches but doesn't, and validation is skipped. Stranger still, setting validation_steps to 39 does not help, but 38 does. I use the very same generator with my training data and it works fine.
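One hedged way to check what a finite, plain-Python validation generator can actually deliver before handing it to fit, assuming it terminates after one pass as described above (val_generator is an illustrative name):

```python
# Count the batches one full pass actually yields, then use that count
# (or something slightly below it) as validation_steps.
n_batches = sum(1 for _ in val_generator)
print(n_batches)
```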

MohamedEZ-zaalyouy commented 2 years ago

(Quoting @Andre-Vitorino's ImageDataGenerator suggestion above.)

This solution works for me. Thank you.

javedsidq commented 2 years ago

Hi, I was also facing the same issue, but I resolved it with only a small modification to the code. I passed the parameter steps_per_epoch as an integer instead of a floating-point value, and the issue was resolved.
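For context, in Python 3 the / operator always produces a float, so a value like 5216/1 evaluates to 5216.0. A minimal illustration of the fix:

```python
steps_per_epoch = 5216 / 1   # float (5216.0): can trip up some TF versions
steps_per_epoch = 5216 // 1  # int (5216): use floor division or int() instead
```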

robbani2210 commented 1 week ago

UserWarning: Your input ran out of data; interrupting training. https://github.com/okbabent/-fev22-ocular-disease/issues/1

Hi, can you help me solve this error? It always occurs on even-numbered epochs.