allanzelener / YAD2K

YAD2K: Yet Another Darknet 2 Keras

Error in model.save('{}'.format(output_path)) #18

Open cWren0110 opened 7 years ago

cWren0110 commented 7 years ago

I am getting the following error when I use yad2k.py on yolov2:

Traceback (most recent call last):
  File "yad2k.py", line 272, in <module>
    _main(parser.parse_args())
  File "yad2k.py", line 256, in _main
    model.save('{}'.format(output_path))
  File "D:\Users...\Anaconda3\lib\site-packages\keras\engine\topology.py", line 2429, in save
    save_model(self, filepath, overwrite)
  File "D:\Users...\Anaconda3\lib\site-packages\keras\models.py", line 101, in save_model
    'config': model.get_config()
  File "D:\Users...\Anaconda3\lib\site-packages\keras\engine\topology.py", line 2246, in get_config
    layer_config = layer.get_config()
  File "D:\Users...\Anaconda3\lib\site-packages\keras\layers\core.py", line 668, in get_config
    function = func_dump(self.function)
  File "D:\Users...\Anaconda3\lib\site-packages\keras\utils\generic_utils.py", line 177, in func_dump
    code = marshal.dumps(func.__code__).decode('raw_unicode_escape')
UnicodeDecodeError: 'rawunicodeescape' codec can't decode bytes in position 195-196: truncated \uXXXX
Exception ignored in: <bound method BaseSession.__del__ of <tensorflow.python.client.session.Session object at 0x00000198C1BEACF8>>
Traceback (most recent call last):
  File "D:\Users...\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 582, in __del__
UnboundLocalError: local variable 'status' referenced before assignment

I do not get the same error when I run yad2k.py on yolo-tiny.

I changed the code to save only the weights via model.save_weights('{}'.format(output_path)) and did not receive any errors, but when I tried to convert the model to JSON via model.to_json() I received virtually the same error. After some digging, I found that people have suggested two possible causes:

  1. Invalid layer names
  2. Keras has issues saving models that contain Lambda layers on some systems

I am guessing you do not see this issue, since it is the example case in the README, but would you mind checking and, if you do not see it, posting the versions of the required packages that you are using?

allanzelener commented 7 years ago

I'm not sure but I think I've seen this error before and it was due to cached Python files and using different versions of Python. The issue is related to the fact that Lambda layers just save the wrapped function as it exists in memory when serialized. I would try deleting any *.pyc files in the project folder and see if that helps. I doubt it's invalid layer names since only defaults are used.
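
For anyone who wants to script the cleanup suggested above, here is a minimal sketch using only the standard library. The function name remove_compiled_python is made up for illustration and is not part of YAD2K; it assumes you pass it the project folder and that deleting every cached bytecode file under that folder is acceptable.

```python
from pathlib import Path


def remove_compiled_python(project_root):
    """Delete every *.pyc file under project_root and return the removed paths.

    Hypothetical helper for illustration only; it is not part of YAD2K.
    """
    root = Path(project_root)
    removed = []
    # Materialize the matches first so files are not deleted mid-traversal.
    for pyc in list(root.rglob("*.pyc")):
        pyc.unlink()
        removed.append(pyc)
    # Remove __pycache__ directories that are empty after the cleanup.
    for cache_dir in list(root.rglob("__pycache__")):
        if cache_dir.is_dir() and not any(cache_dir.iterdir()):
            cache_dir.rmdir()
    return removed
```

Calling remove_compiled_python(".") from the project folder and then rerunning yad2k.py forces Python to recompile the modules with the interpreter currently in use.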

The ideal solution is to never serialize the model: always construct the model programmatically and use save_weights and load_weights for the model state. However, this is such an edge case that I prefer the convenience of serializing the model, particularly when the original model is coming from Darknet.

The environment I used for running the code is fully specified in the environment.yml.

cWren0110 commented 7 years ago

Interesting... still no luck. Below is what I have tried

Are you running Linux or Mac? My next step is to commandeer a coworker's laptop to see if the OS is the cause, and if not I will reach out to the Keras dev team.

allanzelener commented 7 years ago

I'm on Ubuntu 16.04 LTS. Yes, this could easily be a Windows compatibility issue as well and related to how Python runs on Windows.

There are many known issues with Lambda layers in Keras, particularly with serialization. I had to fix one myself to get this working for me. Keras uses the marshal module to serialize the wrapped function.
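
To illustrate how the marshal step can fail, here is a rough sketch that reproduces the same kind of UnicodeDecodeError with the standard library alone. It rests on an assumption that has not been confirmed for this issue: that the marshalled code object embeds a source path containing a backslash followed by "u", which the raw_unicode_escape codec reads as the start of a \uXXXX escape. The path D:\proj\utils.py and all function names below are hypothetical, and the base64 variant is only one possible way to turn the bytes into text safely, not a description of what Keras does.

```python
import base64
import marshal


def dump_code_raw_unicode(func):
    """Mimic the failing line from the traceback: marshal, then decode as text."""
    return marshal.dumps(func.__code__).decode("raw_unicode_escape")


def dump_code_base64(func):
    """Alternative sketch: base64 maps arbitrary bytes to plain ASCII text."""
    return base64.b64encode(marshal.dumps(func.__code__)).decode("ascii")


def load_code_base64(text):
    """Inverse of dump_code_base64; returns the original code object."""
    return marshal.loads(base64.b64decode(text.encode("ascii")))


# Compile a lambda as if it lived in a file whose path contains "\u".
# The path is hypothetical and chosen only to trigger the decode error.
namespace = {}
exec(compile("double = lambda x: 2 * x", r"D:\proj\utils.py", "exec"), namespace)
double = namespace["double"]

try:
    dump_code_raw_unicode(double)
    raw_dump_failed = False
except UnicodeDecodeError:
    # The embedded file name contains "\utils", an invalid \uXXXX escape.
    raw_dump_failed = True

# The base64 round trip does not depend on the bytes inside the code object.
restored = load_code_base64(dump_code_base64(double))
```

If this assumption holds, it would also explain why the error appears on Windows, where paths use backslashes, and not on Linux.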

Two other things that may work here:

  1. Change the Lambda layer to a custom Keras layer. Not sure if this will serialize better but at least it may give a nicer interface.
  2. Submit a feature request to Keras to expose the TensorFlow function I'm using in the Keras backend and to add a built-in PixelShuffle layer. Theano support would need to be considered in this case.

baloodevil commented 7 years ago

See the 2nd answer in this SO article.