fuxiAIlab / RL4RS

A Real-World Benchmark for Reinforcement Learning based Recommender System
Creative Commons Attribution Share Alike 4.0 International

Problems about TensorFlow version and killed error #2

Open Heth0531 opened 2 years ago

Heth0531 commented 2 years ago

I tried to reproduce run_batch_rl following the guidelines, but got the output below.

`WARNING:tensorflow:From /root/miniconda3/envs/rl4rs/lib/python3.6/site-packages/tensorflow_core/python/ops/rnn_cell_impl.py:575: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
WARNING:tensorflow:From /root/miniconda3/envs/rl4rs/lib/python3.6/site-packages/deepctr/contrib/rnn.py:257: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
WARNING:tensorflow:From /mnt/rl4rs_pro/RL4RS/RL4RS/script/rl4rs/nets/dien.py:43: The name tf.keras.backend.get_session is deprecated. Please use tf.compat.v1.keras.backend.get_session instead.

WARNING:tensorflow:From /mnt/rl4rs_pro/RL4RS/RL4RS/script/rl4rs/nets/dien.py:43: The name tf.global_variables_initializer is deprecated. Please use tf.compat.v1.global_variables_initializer instead.

WARNING:tensorflow:From /mnt/rl4rs_pro/RL4RS/RL4RS/script/rl4rs/env/base.py:124: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

WARNING:tensorflow:From /mnt/rl4rs_pro/RL4RS/RL4RS/script/rl4rs/env/base.py:125: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

WARNING:tensorflow:From /mnt/rl4rs_pro/RL4RS/RL4RS/script/rl4rs/env/base.py:129: The name tf.train.Saver is deprecated. Please use tf.compat.v1.train.Saver instead.

/mnt/rl4rs_pro/RL4RS/RL4RS/script/rl4rs/env/slate.py:279: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  complete_states = np.array(samples.get_complete_states())
run_batch_rl.sh: line 82: 180 Killed python -u batchrl_train.py $algo 'dataset_generate' "{'env':'SlateRecEnv-v0','iteminfo_file':'${rl4rs_dataset_dir}/item_info.csv','sample_file':'${rl4rs_dataset_dir}/rl4rs_dataset_a_shuf.csv','model_file':'${rl4rs_output_dir}/simulator_a_dien/model','trial_name':'a_all'}"`

First, there are several warnings that seem related to the TensorFlow version. I am using 1.15.0, which is also the version the environment file requires. I tried other versions such as 1.14.0 and 2.0.0, but those failed as well. Since these are warnings rather than errors, I don't know whether I actually need a different version. The second problem is that the run ends with the process reported as Killed and the script aborts.
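To make the distinction in the paragraph above concrete, here is a minimal sketch using only the Python standard library (it is not taken from RL4RS, and the function names are made up for illustration). It shows that a deprecation warning leaves the program running, whereas a process that receives SIGKILL ends immediately, which is what a shell reports as "Killed". It assumes a POSIX system.

```python
import signal
import subprocess
import sys
import warnings


def warning_lets_code_continue() -> bool:
    """Emit a deprecation warning and report whether execution continued."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        warnings.warn("old API name", DeprecationWarning)
    # Reaching this line shows the warning did not interrupt the program.
    return len(caught) == 1


def returncode_after_sigkill() -> int:
    """Start a sleeping child process, send it SIGKILL, return its exit code."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    child.send_signal(signal.SIGKILL)
    return child.wait()


if __name__ == "__main__":
    print("continued after warning:", warning_lets_code_continue())
    # On POSIX, Popen reports death by signal N as -N; a shell shows 128 + N.
    print("child return code:", returncode_after_sigkill())
```

If the same pattern applies here, the TensorFlow and NumPy messages would be harmless on their own, and the thing to investigate is what sent the kill signal to the Python process.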

asdqsczser commented 2 years ago

This looks like an out-of-memory (OOM) problem. You can check for it by watching memory usage with the 'top' command while the script runs. Alternatively, change 'epoch = 1000000 // batch_size' to 'epoch = 10000 // batch_size' in script/batchrl_trainer.py. Also, would you mind posting a copy of this question as a GitHub issue? Thanks.
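As a rough illustration of the suggestion above, the sketch below computes how many iterations each setting of `epoch` gives and reads the peak memory of the current process from inside Python, as a complement to watching 'top'. The `batch_size` of 64 and both helper names are hypothetical and not taken from the repository. Whether memory use grows with `epoch` in the actual script is an assumption; the sketch only shows the arithmetic and a standard-library way to measure peak memory on Unix.

```python
import resource
import sys


def epochs_for(total_samples: int, batch_size: int) -> int:
    """Number of loop iterations given by `total_samples // batch_size`."""
    return total_samples // batch_size


def peak_rss_mib() -> float:
    """Peak resident set size of this process in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


if __name__ == "__main__":
    batch_size = 64  # hypothetical value, not taken from the repository
    print("original:", epochs_for(1_000_000, batch_size), "iterations")
    print("reduced: ", epochs_for(10_000, batch_size), "iterations")
    print(f"peak RSS so far: {peak_rss_mib():.1f} MiB")
```

Printing `peak_rss_mib()` at a few points in the data-generation loop would show whether memory keeps climbing until the process is killed. The reduced setting cuts the iteration count by roughly a factor of 100.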

