WeChat-Big-Data-Challenge-2021 / WeChat_Big_Data_Challenge


Program gets stuck after "Done running local_init_op." #4

Open victory1128 opened 3 years ago

victory1128 commented 3 years ago

I am running

python baseline.py offline_train

It gets stuck, and the printed output is

I0529 02:03:15.461638 4552134080 estimator.py:1147] Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
I0529 02:03:15.462327 4552134080 basic_session_run_hooks.py:541] Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
I0529 02:03:17.626281 4552134080 monitored_session.py:240] Graph was finalized.
INFO:tensorflow:Running local_init_op.
I0529 02:03:18.371701 4552134080 session_manager.py:500] Running local_init_op.
INFO:tensorflow:Done running local_init_op.
I0529 02:03:18.524997 4552134080 session_manager.py:502] Done running local_init_op.

After a long wait, training eventually proceeds, and the time cost is 10664.87 s.

My system information is

Package information

What is the problem?

yaosheng42 commented 3 years ago

Could you share the exact parameters you ran with? For example, the sampling ratio.

victory1128 commented 3 years ago

> Could you share the exact parameters you ran with? For example, the sampling ratio.

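All parameters are the defaults, and I have not modified the code. The data is also the default data.

However, if I reduce the training data to 40,000 rows, it no longer gets stuck.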
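
For anyone who wants to reproduce the observation that the stall goes away with about 40,000 training rows, here is a minimal sketch of cutting a CSV training file down to a fixed number of rows. It is not taken from baseline.py: the function name `subsample_csv` and the file names in the usage comment are hypothetical, and it assumes the training data is a CSV file with a header row. It uses only the Python standard library and picks rows uniformly at random with reservoir sampling, so the whole file never has to fit in memory.

```python
import csv
import random


def subsample_csv(src_path, dst_path, n_rows, seed=0):
    """Copy the header plus a uniform random sample of up to n_rows data rows.

    Hypothetical helper for experimenting with training-set size;
    returns the number of data rows written.
    """
    rng = random.Random(seed)
    with open(src_path, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader)
        sample = []
        for i, row in enumerate(reader):
            if len(sample) < n_rows:
                # Fill the reservoir with the first n_rows rows.
                sample.append(row)
            else:
                # Replace an existing entry with probability n_rows / (i + 1).
                j = rng.randint(0, i)
                if j < n_rows:
                    sample[j] = row
    with open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        writer.writerows(sample)
    return len(sample)


# Example call (file names are assumptions, not paths from the repository):
# subsample_csv("train.csv", "train_40k.csv", 40_000)
```

Running the training script on samples of increasing size (for example 40,000, then 400,000 rows) and timing the gap after "Done running local_init_op." would show whether the delay grows with the amount of training data.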