Closed: demolakstate closed this issue 3 years ago
How did you write the command?

See issue https://github.com/fizyr/keras-retinanet/issues/1449
The `--steps` argument must now be exactly equal to the total number of training images divided by the batch size (or leave it at the default of `None` and let TensorFlow figure it out automatically).
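As a quick sketch of the arithmetic described above (the dataset size and batch size here are placeholder values, not from this thread):

```python
# Compute the value keras-retinanet expects for --steps:
# total training images divided by the batch size, using
# integer division since steps must be a whole number.
total_training_images = 1000  # hypothetical dataset size
batch_size = 1                # hypothetical batch size

steps = total_training_images // batch_size
print(steps)  # with these placeholder numbers: 1000
```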
Bizarre "feature" imo
Did you solve the problem? I am facing the same problem.
Yes I did. Please check https://youtu.be/9e30kMt_6wU
Having the same issue here; training stops after the first epoch. How do I solve this?
You have to set the number of steps to be precisely equal to the number of images in your training set (divided by the batch size, if it is greater than one).
Mine stops at epoch 1. Increasing patience does not solve the problem. Any help please? Trace as follows:

```
Epoch 1/50
2020-09-20 16:43:41.588458: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2020-09-20 16:43:42.717110: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
97/10000 [..............................] - ETA: 1:08:14 - loss: 3.3150 - regression_loss: 2.4165 - classification_loss: 0.8985
WARNING:tensorflow:Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least `steps_per_epoch * epochs` batches (in this case, 500000 batches). You may need to use the repeat() function when building your dataset.
Running network: 100% (15 of 15) | Elapsed Time: 0:00:10 Time: 0:00:10
Parsing annotations: 100% (15 of 15) | Elapsed Time: 0:00:00 Time: 0:00:00
23 instances of class damaged with average precision: 0.1231
69 instances of class undamaged with average precision: 0.6403
mAP: 0.3817

Epoch 00001: saving model to ./snapshots/resnet50_pascal_01.h5
97/10000 [..............................] - 52s 532ms/step - loss: 3.3150 - regression_loss: 2.4165 - classification_loss: 0.8985
(retinaNet_2) demolakstate@demolakstate:/data/RetinaNet_2/keras-retinanet$
```
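For what it's worth, the 500000 figure in that warning follows directly from the settings visible in the trace: Keras demands `steps_per_epoch * epochs` batches from the generator, and training stopped at step 97, which suggests the generator could only yield about 97 batches before running dry. A minimal sketch of that arithmetic:

```python
# Reproduce the batch count TensorFlow's warning asked for,
# using the values shown in the trace above.
steps_per_epoch = 10000  # the --steps value used in the trace
epochs = 50

required_batches = steps_per_epoch * epochs
print(required_batches)  # 500000, matching the warning message

# The run halted at step 97/10000, consistent with the generator
# exhausting its data after ~97 batches; per the advice earlier in
# the thread, --steps should match what the generator can supply.
```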