Arturus / kaggle-web-traffic

1st place solution

update to tensorflow 1.8 #25

Closed zheolong closed 5 years ago

zheolong commented 5 years ago

@Arturus Thanks for sharing. Do you have plans to update the TF version? I'm having trouble updating from 1.4 to 1.8 because the CudnnGRU API changed a lot.

HJFS commented 5 years ago

Hi @zheolong, I've run into the same problem. Have you solved it?

zheolong commented 5 years ago

@HJFS I tried, but failed. I spent almost two weeks refactoring and have given up for now. The differences between the two versions are discouraging.

zheolong commented 5 years ago

@HJFS If you make any progress, please let me know. Thanks.

eduardbermejo commented 5 years ago

Hello, I'm also trying to reuse the code with TensorFlow 1.11, but for a different task. Has either of you made any progress refactoring the code? If I succeed, I will let you know and share the code.

zheolong commented 5 years ago

@ERed Not yet, but feel free to give it a try. If you run into any problems, please let me know; I may have already encountered them.

Arturus commented 5 years ago

The model is now fixed to work with TF 1.10.

zheolong commented 5 years ago

@Arturus Great, thanks!

zheolong commented 5 years ago

@Arturus What are your CUDA and cuDNN versions?

Arturus commented 5 years ago

Any version suitable for TensorFlow 1.10. The model has no bindings to a specific CUDA version.

zheolong commented 5 years ago

@Arturus I get errors with the following versions:

$ conda list | egrep -h "tensor|cuda"
cudatoolkit               9.2                           0
cudnn                     7.2.1                 cuda9.2_0
tensorboard               1.10.0                   py36_0    conda-forge
tensorflow                1.10.0          gpu_py36hcebf108_0
tensorflow-base           1.10.0          gpu_py36had579c0_0
tensorflow-gpu            1.10.0               hf154084_0

The errors are:

WARNING:tensorflow:From /user/zheolong/kaggle-web-traffic/model.py:144: calling reduce_sum (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
Traceback (most recent call last):
  File "trainer.py", line 776, in <module>
    train(**param_dict)
  File "trainer.py", line 507, in train
    all_models = [create_model(scope, 0, None, seed=seed)]
  File "trainer.py", line 483, in create_model
    forward_eval_model = Model(forward_eval_pipe, hparams, is_train=False, seed=seed)
  File "/user/zheolong/kaggle-web-traffic/model.py", line 342, in __init__
    transpose_output=False)
  File "/user/zheolong/kaggle-web-traffic/model.py", line 65, in make_encoder
    rnn_out, (rnn_state,) = cuda_model(inputs=rnn_time_input)
  File "/user/zheolong/anaconda3/envs/py3.6_tf1.10/lib/python3.6/site-packages/tensorflow/python/layers/base.py", line 362, in __call__
    outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
  File "/user/zheolong/anaconda3/envs/py3.6_tf1.10/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 728, in __call__
    self.build(input_shapes)
  File "/user/zheolong/anaconda3/envs/py3.6_tf1.10/lib/python3.6/site-packages/tensorflow/contrib/cudnn_rnn/python/layers/cudnn_rnn.py", line 362, in build
    initializer=opaque_params_t, validate_shape=False)
  File "/user/zheolong/anaconda3/envs/py3.6_tf1.10/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 1467, in get_variable
    aggregation=aggregation)
  File "/user/zheolong/anaconda3/envs/py3.6_tf1.10/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 1217, in get_variable
    aggregation=aggregation)
  File "/user/zheolong/anaconda3/envs/py3.6_tf1.10/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 510, in get_variable
    return custom_getter(**custom_getter_kwargs)
  File "/user/zheolong/anaconda3/envs/py3.6_tf1.10/lib/python3.6/site-packages/tensorflow/contrib/cudnn_rnn/python/layers/cudnn_rnn.py", line 294, in _update_trainable_weights
    variable = getter(*args, **kwargs)
  File "/user/zheolong/anaconda3/envs/py3.6_tf1.10/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 481, in _true_getter
    aggregation=aggregation)
  File "/user/zheolong/anaconda3/envs/py3.6_tf1.10/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 866, in _get_single_variable
    "reuse=tf.AUTO_REUSE in VarScope?" % name)
ValueError: Variable cudnn_gru_1/opaque_kernel does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=tf.AUTO_REUSE in VarScope?

zheolong commented 5 years ago

@Arturus You wrote: "Any version suitable for Tensorflow 1.10. Model has no bindings to specific CUDA version."

Do you have any idea what causes this error?
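
A note on reading the traceback, offered as a guess rather than a confirmed diagnosis: the missing variable is `cudnn_gru_1/opaque_kernel`, and the failure happens while building `forward_eval_model`, which is a second `Model`. A trailing `_1` on a scope name often means the layer received an auto-generated, auto-incremented name, so the second model may be looking for a variable under `cudnn_gru_1` while the first one created it under `cudnn_gru`. Nobody in this thread has confirmed that. The sketch below uses only the standard library and only the error text quoted above; the helper names `missing_variable` and `auto_suffix` are made up for illustration.

```python
import re

# Final line of the traceback quoted in this thread.
ERROR = (
    "ValueError: Variable cudnn_gru_1/opaque_kernel does not exist, or was not "
    "created with tf.get_variable(). Did you mean to set reuse=tf.AUTO_REUSE in VarScope?"
)


def missing_variable(message):
    """Return (scope, variable) from a 'Variable ... does not exist' message, else None."""
    match = re.search(r"Variable (\S+) does not exist", message)
    if match is None:
        return None
    scope, _, variable = match.group(1).rpartition("/")
    return scope, variable


def auto_suffix(scope):
    """Split a trailing '_<number>' off a scope name; the number is None if absent."""
    match = re.fullmatch(r"(.*)_(\d+)", scope)
    if match is None:
        return scope, None
    return match.group(1), int(match.group(2))


scope, variable = missing_variable(ERROR)
base, index = auto_suffix(scope)
if index is not None:
    # Assumption: a numeric suffix hints at an auto-generated layer name.
    print(f"'{variable}' was looked up under '{scope}'; "
          f"check whether it was created under '{base}' instead")
```

If that reading is right, two things might be worth trying, both untested here: passing an explicit `name` when constructing the CudnnGRU layer so every model instance resolves to the same scope, or the `reuse=tf.AUTO_REUSE` setting that the error message itself suggests.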