dyelax / Adversarial_Video_Generation

A TensorFlow Implementation of "Deep Multi-Scale Video Prediction Beyond Mean Square Error" by Mathieu, Couprie & LeCun.

MIT License

736 stars, 184 forks

Tensor name not found in checkpoint file #13

Open newuhe opened 7 years ago

newuhe commented 7 years ago

Hello, I'm trying to use your trained model to predict one frame on your dataset, but I ran into this error:

NotFoundError (see above for traceback): Tensor name "generator/scale_3/setup/Variable_5/optimizer" not found in checkpoint files ../Models/Adversarial/model.ckpt-500000
[[Node: save/RestoreV2_153 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_153/tensor_names, save/RestoreV2_153/shape_and_slices)]]
[[Node: save/RestoreV2_63/_147 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_651_save/RestoreV2_63", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

dyelax commented 7 years ago

Hey, sorry for the delay. Which version of TensorFlow are you using? This only works for up to version 0.12 currently.

fverdoja commented 7 years ago

After fixing the functions in my other issue (https://github.com/dyelax/Adversarial_Video_Generation/issues/15), I get the same output as OP. @dyelax do you think the problem arises only when using the pre-trained model? Do you think I can train the network again and then run the code?

fverdoja commented 7 years ago

I've just finished re-training the network on the Ms Pacman dataset and everything seems to work. If you want, I can share the new trained model, which should be compatible with current installations of tensorflow.

dyelax commented 7 years ago

@fverdoja Glad you got it working! Yes, would be great to have your trained model. Does it load with the current loading code in this project?

fverdoja commented 7 years ago

@dyelax The training went well all the way. I just tried to load the model, but sadly it gives the same error as OP. Maybe the way the model is saved is not correct anymore? I can upload the trained model anyway if you want, so you can look into the problem more closely than I could.

dyelax commented 7 years ago

Which version of TensorFlow are you using?

fverdoja commented 7 years ago

1.1.0 if I recall correctly.

dyelax commented 7 years ago

I haven't updated the repo to v1.1.0 yet. This only works for up to v0.12. Does the model loading work in the pull request you made to update the repo to 1.1.0?

fverdoja commented 7 years ago

Nope, with the code in my pull request training works, but loading doesn't.

fverdoja commented 7 years ago

Ok, one of my thesis students figured out what the problem was. I, and I imagine OP as well, was loading the model like this:

python avg_runner.py -l ../Save/Models/Default/model.ckpt-1000000.index

but the way the model is saved in TF 1+ requires you to pass the path to the load function without the extension. So when calling:

python avg_runner.py -l ../Save/Models/Default/model.ckpt-1000000

everything seems to be working.

So I think you can safely merge my pull request. Everything works, just be aware that the model path has to be passed without the file extension.
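
For anyone hitting the same error, here is a minimal sketch of how a script could guard against this mistake. TF 1.x checkpoints are written as several files that share one prefix (`.index`, `.meta`, and one or more `.data-XXXXX-of-YYYYY` shards), and the restore call expects that shared prefix. The helper name `checkpoint_prefix` is hypothetical and is not part of this repo or of TensorFlow; it only strips a trailing suffix before the path would be handed to the loader.

```python
import re

# Suffixes that TF 1.x appends to the checkpoint prefix when saving.
# Assumption: shard numbers are zero-padded to five digits, as in
# "model.ckpt-1000000.data-00000-of-00001".
_CKPT_SUFFIX = re.compile(r"\.(index|meta|data-\d{5}-of-\d{5})$")


def checkpoint_prefix(path):
    """Return `path` with a trailing .index/.meta/.data-* suffix removed.

    Hypothetical helper: paths that are already a bare prefix are
    returned unchanged.
    """
    return _CKPT_SUFFIX.sub("", path)


if __name__ == "__main__":
    # The path from the failing command above, with the extension attached.
    print(checkpoint_prefix("../Save/Models/Default/model.ckpt-1000000.index"))
```

The result could then be passed to `-l` (or to whatever restore call the script uses) in place of the raw file name, so that picking any one of the checkpoint files by tab completion still resolves to the same prefix.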

fverdoja commented 7 years ago

Here is a link to the trained model on TF1.1: https://drive.google.com/drive/folders/0B83QXMRRjnSaYzJmQS1TWkZYMkU?usp=sharing

newuhe commented 7 years ago

It really was a TensorFlow version problem, thanks for the help.