AntonMu / TrainYourOwnYOLO

Train a state-of-the-art yolov3 object detector from scratch!

EC2 AWS: Keras: ValueError: Invalid backend. #69

Closed by silvestre139 4 years ago

silvestre139 commented 4 years ago

System information

python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"
v1.15.0-rc3-22-g590d6ee 1.15.0

Describe the problem

I first ran the pre-trained model and the training locally on Windows with the Linux subsystem, and both worked fine. Awesome job, thank you so much for sharing! The problem appeared when I tried to set up YOLO on AWS inside an EC2 instance. I followed the instructions step by step, but at the step where the pre-trained model is downloaded, Keras failed to load the backend.

Source code / logs

user:~/YOLOV3/TrainYourOwnYOLO/2_Training$ python Download_and_Convert_YOLO_weights.py

99% (2477235 of 2480070) |################################ | Elapsed Time: 0:00:30 ETA: 0:00:00
Traceback (most recent call last):
  File "convert.py", line 14, in <module>
    from keras import backend as K
  File "/home/ubuntu/YOLOV3/TrainYourOwnYOLO/env/lib/python3.6/site-packages/keras/__init__.py", line 3, in <module>
    from . import utils
  File "/home/ubuntu/YOLOV3/TrainYourOwnYOLO/env/lib/python3.6/site-packages/keras/utils/__init__.py", line 6, in <module>
    from . import conv_utils
  File "/home/ubuntu/YOLOV3/TrainYourOwnYOLO/env/lib/python3.6/site-packages/keras/utils/conv_utils.py", line 9, in <module>
    from .. import backend as K
  File "/home/ubuntu/YOLOV3/TrainYourOwnYOLO/env/lib/python3.6/site-packages/keras/backend/__init__.py", line 1, in <module>
    from .load_backend import epsilon
  File "/home/ubuntu/YOLOV3/TrainYourOwnYOLO/env/lib/python3.6/site-packages/keras/backend/load_backend.py", line 101, in <module>
    raise ValueError('Invalid backend. Missing required entry : ' + e)
ValueError: Invalid backend. Missing required entry : placeholder
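
In Keras 2.x this message is raised by the external-backend check in load_backend.py, which runs only when the configured backend is not one of the built-in ones. One possible cause, which is an assumption and is not confirmed anywhere in this thread, is a keras.json file or a KERAS_BACKEND variable on the instance that names a different backend. The sketch below reports which backend name Keras would pick up. The function name `configured_keras_backend` and the constant `BUILTIN_BACKENDS` are hypothetical helpers, not part of Keras or of this repository.

```python
import json
import os
from pathlib import Path

# Backends bundled with Keras 2.x; any other name is loaded as an external
# module and must expose entries such as `placeholder`.
BUILTIN_BACKENDS = {"tensorflow", "theano", "cntk"}


def configured_keras_backend(keras_home=None, environ=None):
    """Return the backend name Keras 2.x would try to load.

    Hypothetical helper. It follows the lookup order used by Keras 2.x:
    the KERAS_BACKEND environment variable overrides the "backend" key in
    keras.json, and "tensorflow" is the default when neither is set.
    """
    environ = os.environ if environ is None else environ
    if keras_home is None:
        # KERAS_HOME, when set, replaces the default ~/.keras directory.
        keras_home = environ.get("KERAS_HOME", str(Path.home() / ".keras"))

    backend = "tensorflow"
    config_path = Path(keras_home) / "keras.json"
    if config_path.is_file():
        try:
            with config_path.open() as fh:
                backend = json.load(fh).get("backend", backend)
        except ValueError:
            pass  # unreadable JSON: keep the default
    return environ.get("KERAS_BACKEND") or backend


if __name__ == "__main__":
    name = configured_keras_backend()
    print("Configured Keras backend:", name)
    if name not in BUILTIN_BACKENDS:
        print("Not a built-in backend; check keras.json and KERAS_BACKEND.")
```

Running this inside the same virtual environment on both the working machine and the EC2 instance shows whether the two are configured differently before any change is made.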

AntonMu commented 4 years ago

Are you using a virtual environment? Can you just use the same code you used when running it in the Linux subsystem?

silvestre139 commented 4 years ago

@AntonMu Thank you for replying so fast!

I am using a virtual environment, and yes, I used the same code I used in the Linux subsystem, but I still get this error with the Keras backend.

AntonMu commented 4 years ago

Hm, I just tried it here on Ubuntu 18.04 and it worked fine. Maybe start over and try again? Make sure you run all commands from within your virtual env and that you install the requirements.

silvestre139 commented 4 years ago

@AntonMu I will start from scratch again to see if I missed something on AWS, thank you so much! And again, thank you so much for sharing such a clean implementation of YOLO.

AntonMu commented 4 years ago

You are welcome. Also, if all fails, you can always do this step on your local machine and then copy the pre-trained weights (file yolo.h5) to AWS.
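
When copying yolo.h5 between machines, comparing checksums on both sides is a quick way to confirm the file arrived intact. This is a minimal sketch using only the standard library; the helper name `sha256_of_file` is hypothetical and the file path is whatever location the weights were saved to.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks.

    Hypothetical helper; reading in chunks avoids loading a large
    weights file into memory at once.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    import sys

    # Usage: python this_script.py path/to/yolo.h5
    print(sha256_of_file(sys.argv[1]))
```

Run it on the local machine and on the EC2 instance; the two printed digests should be identical.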

silvestre139 commented 4 years ago

@AntonMu Thank you for the advice.

I tried two different Ubuntu deep learning environments in EC2, the Conda one and the Base one. Both failed, with different errors.

Conda environment: Failed due to the Keras backend.
Base environment: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.

For the Base environment I am trying to figure out the compatibility issues with the CUDA drivers, but as a last resort I will train on my local machine in the meantime. Thank you so much!

AntonMu commented 4 years ago

Hi!

Try the Deep Learning AMI 24.1 as described here:

https://github.com/AntonMu/TrainYourOwnYOLO/blob/master/2_Training/AWS/README.MD

Another option is to go back to a previous commit that used tensorflow==1.14.

Hope it works!

eliwilner commented 4 years ago

> You are welcome. Also, if all fails, you can always do this step on your local machine and then copy the pre-trained weights (file yolo.h5) to AWS.

I had the same problem and tried what you said, @AntonMu. However, when I started training, I got the same error again. Very weird, since everything worked yesterday. Just to mention that I followed all the steps in the README.MD files.

I tried it on my own machine and everything seems to work.

silvestre139 commented 4 years ago

The solution for this issue can be found in https://github.com/AntonMu/TrainYourOwnYOLO/issues/71