andrewboyes closed this issue 3 years ago
After a few epochs it sometimes produces this error:
Epoch 00006: ReduceLROnPlateau reducing learning rate to 1.000000013351432e-11.
200/200 [==============================] - 267s 1s/step - loss: 2.8008 - regression_loss: 1.9139 - classification_loss: 0.8868
2020-10-07 12:23:17.121434: W tensorflow/core/kernels/data/generator_dataset_op.cc:103] Error occurred when finalizing GeneratorDataset iterator: Failed precondition: Python interpreter state is not initialized. The process may be terminated.
[[{{node PyFunc}}]]
If you get that error, try resuming from the checkpoint saved at that epoch, not from the first weight file.
I have tried a previous version of fizyr/keras-retinanet with TensorFlow 1.14 and I have the same problem of low mAP (0.00), although I no longer get the failed precondition error. What could I try to increase the mAP? How many epochs would one expect before seeing an improvement in the mAP? I am looking for light objects in a dark trunk. My dataset is roughly 1500 images, with 300 instances of each class (3 object classes). Which weight file should one use to initialize the network? During training, my learning rate automatically drops below 1e-20 without any improvement in mAP.
This issue has been automatically marked as stale due to the lack of recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Has anyone solved it yet? I am also having this issue and none of the provided solutions work :(
I had the same problem, but it was solved after I set the batch-size option to the class count. (I had 3 classes, so I set batch-size to 3.)
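With the fizyr CLI that is just the `--batch-size` flag; a minimal sketch (file names are placeholders):

```
retinanet-train --batch-size 3 csv train.csv classes.csv
```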
I am currently using fizyr/keras-retinanet to train a model that detects 3 classes. When I train the model, I receive precisions of 0.0000 on all my classes. In some rounds of training, I received slightly higher precisions, e.g. 0.0007.
I have looked at these threads, but it doesn't seem like their solutions work: https://github.com/fizyr/keras-retinanet/issues/647 and https://github.com/fizyr/keras-retinanet/issues/1351
In particular, I added the --image-max-side argument to my training command and set it to 2560 pixels. The images I am working with are 1920x2560 pixels. My training set is 916 images and my validation set is 258 images.
The full command that I use to train the model is:
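The weight file, data paths, and step/epoch counts below are placeholders standing in for the exact values; the structure of the command is:

```
python train.py --weights resnet50_coco_best_v2.1.0.h5 \
    --batch-size 1 --steps 1000 --epochs 50 \
    --image-max-side 2560 \
    csv train.csv classes.csv --val-annotations val.csv
```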
I have also tried running the above command without initializing the weights from COCO; this produces the same result. I have copied the train.py file into my parent directory (and changed the imports to absolute paths).
I had to include an extra piece of code in train.py so that training was not stopped by the GPU running out of resources:
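A sketch of the kind of snippet involved, assuming the standard TensorFlow 2.x memory-growth workaround:

```python
import tensorflow as tf

# Allocate GPU memory on demand instead of reserving it all at startup,
# which helps avoid out-of-resources errors on smaller GPUs.
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)
```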
Here is a sample from my train.csv file:
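The keras-retinanet CSV generator expects one annotation per line in the form path,x1,y1,x2,y2,class_name with pixel coordinates; the paths, boxes, and class names below are made-up placeholders, not the actual data:

```
images/trunk_0001.jpg,120,340,410,620,flashlight
images/trunk_0001.jpg,900,150,1150,380,wallet
images/trunk_0002.jpg,450,600,780,900,keys
```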
Here is my classes.csv file:
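Again a sketch of the format only (class_name,id with ids starting at 0, names matching those in train.csv); the class names below are placeholders:

```
flashlight,0
wallet,1
keys,2
```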
My installation setup is: Windows 10, TensorFlow 2.3.1, CUDA Toolkit 11.0, cuDNN v7.6.3.
The precision does not change over multiple epochs. Here is a sample of the output:
If anyone has suggestions on how to increase my precision, or on how to troubleshoot why the model isn't finding any objects, please let me know.