Arturus / kaggle-web-traffic

1st place solution
MIT License

cudnn_gru ValueError when forward_split=True #27

Open EduardBermejoScrm opened 5 years ago

EduardBermejoScrm commented 5 years ago

First of all, thank you for upgrading your code and fixing all the recent issues!

When I run training with --no_forward_split, everything works. However, when I run train() with forward_split=True to evaluate, I get this error:

ValueError: Variable cudnn_gru_1/opaque_kernel does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=tf.AUTO_REUSE in VarScope?

Any idea how to fix this, or what is causing it?

Could it be related to the fact that we are now instantiating two models, train_model and forward_eval_model?

Thank you

svjack commented 5 years ago

This issue occurs because class Model is instantiated multiple times without variable scope control. You may be able to fix it with a tf.variable_scope that reuses variables (for example, in the __init__ function of class Model).

liumanfei commented 5 years ago

@EduardBermejoScrm Hello, I got the same error here. Have you fixed it? I use TensorFlow 1.11.0; could that be related?

liumanfei commented 5 years ago

svjack wrote: "This issue occurs because class Model is instantiated multiple times without variable scope control. You may be able to fix it with a tf.variable_scope that reuses variables (for example, in the __init__ function of class Model)."

The code already calls scope.reuse_variables() outside class Model, between the first and second Model instantiation in the train function. Could you please explain why that doesn't work?

svjack commented 5 years ago

That was my mistake; my answer above was wrong.

EduardBermejoScrm commented 5 years ago

Hey, yes, I fixed it. I posted a general question about this issue on Stack Overflow: link

The thing is that scope.reuse_variables() does not work as expected here, so I ended up wrapping both the train model and the eval model in a 'model' variable_scope so they can share variables. The changes in trainer.py are:

Before train_model in line 471, add this line to wrap the train model creation: with tf.variable_scope('model') as scope:

Before eval_stages = [] in line 474, add the same line to wrap the eval model in the same variable scope.

Finally, delete scope.reuse_variables() in line 472.

Hope this helps!
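For readers who want to see the effect in isolation, here is a minimal sketch of what is probably going on. It is not the repository's code: build_model is a hypothetical stand-in for class Model, and a default-named scope stands in for a layer that picks its own scope name, as the CudnnGRU layer appears to do. It also passes reuse=tf.AUTO_REUSE explicitly (the option the error message suggests), which is an assumption rather than something stated in the fix above. It uses tf.compat.v1 so that it also runs on TensorFlow 2.x.

```python
import tensorflow as tf

tf1 = tf.compat.v1  # TF 1.x graph-mode API, also shipped with TF 2.x


def build_model(units=4):
    """Hypothetical stand-in for the repository's Model class."""
    # A default-named scope mimics a layer that chooses its own scope name.
    # TensorFlow uniquifies default names: cudnn_gru, cudnn_gru_1, ...
    with tf1.variable_scope(None, default_name="cudnn_gru"):
        return tf1.get_variable("opaque_kernel", shape=[units])


# Case 1: both models are built in the root scope, with reuse switched on
# in between. The second build gets a new scope name, so the lookup fails.
error_message = ""
with tf.Graph().as_default():
    build_model()  # creates cudnn_gru/opaque_kernel
    tf1.get_variable_scope().reuse_variables()
    try:
        build_model()  # tries to reuse a variable under cudnn_gru_1
    except ValueError as err:
        error_message = str(err)

# Case 2: each model is built inside its own 'model' scope block.
# Leaving the block resets the default-name counters below 'model',
# so the second build resolves to the same variable name.
with tf.Graph().as_default():
    with tf1.variable_scope("model", reuse=tf1.AUTO_REUSE):
        train_kernel = build_model()
    with tf1.variable_scope("model", reuse=tf1.AUTO_REUSE):
        eval_kernel = build_model()

print(error_message.splitlines()[0] if error_message else "no error")
print(train_kernel.name, eval_kernel.name)
```

In the first case the second call looks for a variable under a freshly numbered scope, which matches the error reported in this issue. In the second case both calls end up under the same model/cudnn_gru prefix, which is why wrapping the train and eval models in identically named scopes lets them share weights.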