Closed: demolakstate closed this issue 3 years ago
How did you write the command?

See issue https://github.com/fizyr/keras-retinanet/issues/1449
The `--steps` argument must now be exactly equal to the total number of training images divided by the batch size (or leave it at the default of `None` and let TensorFlow figure it out automatically).
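As a quick sketch of the arithmetic described above (the dataset size and batch size here are placeholder values, not from this thread):

```python
# Compute the value keras-retinanet expects for --steps:
# total training images divided by the batch size, using
# integer division since steps must be a whole number.
total_training_images = 1000  # hypothetical dataset size
batch_size = 1                # hypothetical batch size

steps = total_training_images // batch_size
print(steps)  # with these placeholder numbers: 1000
```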
Bizarre "feature" imo
Did you solve the problem? I am facing the same problem.
Yes I did. Please check https://youtu.be/9e30kMt_6wU
Having the same issue here; training stops after the first epoch. How do I solve this?
You have to set the number of steps to be precisely equal to the number of images in your training set (divided by the batch size, if it is greater than one).
Mine stops at epoch 1. Increasing patience does not solve the problem. Any help please? Trace as follows:

```
Epoch 1/50
2020-09-20 16:43:41.588458: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2020-09-20 16:43:42.717110: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
97/10000 [..............................] - ETA: 1:08:14 - loss: 3.3150 - regression_loss: 2.4165 - classification_loss: 0.8985
WARNING:tensorflow:Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least `steps_per_epoch * epochs` batches (in this case, 500000 batches). You may need to use the repeat() function when building your dataset.
Running network: 100% (15 of 15) | Elapsed Time: 0:00:10 Time: 0:00:10
Parsing annotations: 100% (15 of 15) | Elapsed Time: 0:00:00 Time: 0:00:00
23 instances of class damaged with average precision: 0.1231
69 instances of class undamaged with average precision: 0.6403
mAP: 0.3817

Epoch 00001: saving model to ./snapshots/resnet50_pascal_01.h5
97/10000 [..............................] - 52s 532ms/step - loss: 3.3150 - regression_loss: 2.4165 - classification_loss: 0.8985
(retinaNet_2) demolakstate@demolakstate:/data/RetinaNet_2/keras-retinanet$
```
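For what it's worth, the 500000 figure in that warning follows directly from the settings visible in the trace: Keras demands `steps_per_epoch * epochs` batches from the generator, and training stopped at step 97, which suggests the generator could only yield about 97 batches before running dry. A minimal sketch of that arithmetic:

```python
# Reproduce the batch count TensorFlow's warning asked for,
# using the values shown in the trace above.
steps_per_epoch = 10000  # the --steps value used in the trace
epochs = 50

required_batches = steps_per_epoch * epochs
print(required_batches)  # 500000, matching the warning message

# The run halted at step 97/10000, consistent with the generator
# exhausting its data after ~97 batches; per the advice earlier in
# the thread, --steps should match what the generator can supply.
```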