CQFIO / PhotographicImageSynthesis

Photographic Image Synthesis with Cascaded Refinement Networks
https://cqf.io/ImageSynthesis/

Error in saver.restore in demo_512p and demo_1024p? #2

Closed: Quasimondo closed this issue 7 years ago

Quasimondo commented 7 years ago

It seems that continuing a training session with demo_512p and demo_1024p does not work: after a previously trained model is restored, it appears to be immediately overwritten by a blank Saver on the last line. I think that last line should be moved before the ckpt check, as in demo_256p:

ckpt=tf.train.get_checkpoint_state("result_512p")
if ckpt:
    print('loaded '+ckpt.model_checkpoint_path)
    saver=tf.train.Saver(var_list=[var for var in tf.trainable_variables() if var.name.startswith('g_')])
    saver.restore(sess,ckpt.model_checkpoint_path)
else:
    ckpt_prev=tf.train.get_checkpoint_state("result_256p")
    saver=tf.train.Saver(var_list=[var for var in tf.trainable_variables() if var.name.startswith('g_') and not var.name.startswith('g_512')])
    print('loaded '+ckpt_prev.model_checkpoint_path)
    saver.restore(sess,ckpt_prev.model_checkpoint_path)
saver=tf.train.Saver(max_to_keep=1000)

CQFIO commented 7 years ago

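saver=tf.train.Saver(max_to_keep=1000) is not a blank Saver. With no var_list given, it takes the variable list to be all the variables. Creating it does not change the values already restored into the session. I will double-check the code by running it again.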
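
The point above can be illustrated with a toy model in plain Python. This is not TensorFlow: ToySession, ToySaver, and the variable names are hypothetical stand-ins, chosen only to show that restored values live in the session, so rebinding the name saver to a new object afterwards does not reset them.

```python
# Toy model in plain Python (NOT TensorFlow). All class and variable
# names below are hypothetical stand-ins for illustration only.

class ToySession:
    """Holds the live variable values, playing the role of the session."""
    def __init__(self, names):
        # Every variable starts at 0.0, standing in for fresh initialization.
        self.values = {name: 0.0 for name in names}


class ToySaver:
    """Records which variables to save or restore; stores no values itself."""
    def __init__(self, all_names, var_list=None):
        # With no var_list, cover all variables (the default described above).
        self.var_list = list(all_names) if var_list is None else list(var_list)

    def restore(self, sess, checkpoint):
        # Copy checkpoint values into the session for the covered variables.
        for name in self.var_list:
            sess.values[name] = checkpoint[name]


names = ["g_conv1", "g_512_conv1"]
checkpoint = {"g_conv1": 1.5, "g_512_conv1": 2.5}

sess = ToySession(names)
saver = ToySaver(names, var_list=[n for n in names if n.startswith("g_")])
saver.restore(sess, checkpoint)

# Rebinding `saver` creates a new object; the session values are untouched.
saver = ToySaver(names)

print(sess.values)
```

In this sketch the second ToySaver only changes which variables later saves would cover; the restored values stay in sess.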

Quasimondo commented 7 years ago

Unfortunately I am not very experienced with how TensorFlow handles checkpoints, but something strange seems to be happening in demo_512p (demo_256p works as expected). When I continue training from a result_512p checkpoint of a model that was trained for 20 epochs and had reached decent quality, the results look as if training went back to epoch 1: very blurry and crude.

Quasimondo commented 7 years ago

Oh, never mind - I found the error and it was my fault. I had renamed my model path but missed an identifier, so it was loading the wrong checkpoint.
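
One way to guard against this kind of slip is to define each checkpoint directory name once and print which directory is actually used. The following is a minimal sketch under stated assumptions, not code from the repository: pick_restore_dir is a hypothetical helper, and it assumes the checkpoint index that tf.train.get_checkpoint_state reads is a file named "checkpoint" inside the directory. The directory names are taken from the snippet quoted earlier in this thread.

```python
import os
import tempfile

# Directory names from the snippet in this thread, defined in one place so
# that a rename only has to happen here.
RESULT_DIR = "result_512p"
PREV_DIR = "result_256p"


def pick_restore_dir(root, result_dir=RESULT_DIR, prev_dir=PREV_DIR):
    """Hypothetical helper: return (path, resumed) for the first directory
    containing a file named 'checkpoint' (assumed index file name)."""
    for name, resumed in ((result_dir, True), (prev_dir, False)):
        path = os.path.join(root, name)
        if os.path.isfile(os.path.join(path, "checkpoint")):
            return path, resumed
    raise FileNotFoundError("no checkpoint index found under " + root)


# Demo in a throwaway directory: only the 256p directory has an index file.
root = tempfile.mkdtemp()
for name in (RESULT_DIR, PREV_DIR):
    os.makedirs(os.path.join(root, name))
open(os.path.join(root, PREV_DIR, "checkpoint"), "w").close()

chosen, resumed = pick_restore_dir(root)
print("restoring from", os.path.basename(chosen), "resumed:", resumed)
```

Printing the chosen directory before restoring makes a stale or misspelled identifier visible right away.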