Arturus / kaggle-web-traffic

1st place solution
MIT License

Can't generate submission as no EMA checkpoints saved #10

Closed: DBCerigo closed this issue 6 years ago

DBCerigo commented 6 years ago

Hi,

I'm trying to run the scripts as specified in the README, but I get the following error when generating the submission:

INFO:tensorflow:Restoring parameters from data/feeder.cpt
INFO:tensorflow:Restoring parameters from data/cpt/s32/cpt-1620
---------------------------------------------------------------------------
NotFoundError                             Traceback (most recent call last)
~/miniconda3/envs/basev1/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1322     try:
-> 1323       return fn(*args)
   1324     except errors.OpError as e:

~/miniconda3/envs/basev1/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run_fn(session, feed_dict, fetch_list, target_list, options, run_metadata)
   1301                                    feed_dict, fetch_list, target_list,
-> 1302                                    status, run_metadata)
   1303 

~/miniconda3/envs/basev1/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py in __exit__(self, type_arg, value_arg, traceback_arg)
    472             compat.as_text(c_api.TF_Message(self.status.status)),
--> 473             c_api.TF_GetCode(self.status.status))
    474     # Delete the underlying status object from memory otherwise it stays alive

NotFoundError: Key m_0/m_0/decoder_output_proj/kernel/ExponentialMovingAverage not found in checkpoint
     [[Node: eval_saver/RestoreV2_6 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_eval_saver/Const_0_0, eval_saver/RestoreV2_6/tensor_names, eval_saver/RestoreV2_6/shape_and_slices)]]
     [[Node: eval_saver/RestoreV2_1/_9 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_26_eval_saver/RestoreV2_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

It seems that the --no-eval flag prevents the EMA checkpoints from being saved. Could that be the case? (Specifically, the ema_eval_stages list is always empty unless do_eval is True.)
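For anyone wanting to confirm the same diagnosis, here is a minimal sketch of how one might check whether a checkpoint contains any EMA shadow variables at all. It assumes the shadow variables use the "/ExponentialMovingAverage" name suffix seen in the error above. The helper names (ema_keys, has_ema_weights, checkpoint_variable_names) are hypothetical and not part of the repository, and the TensorFlow call is only used if you actually read a checkpoint from disk.

```python
# Sketch only: helper names are hypothetical, not from the repository.
# Assumes EMA shadow variables carry the suffix seen in the NotFoundError.
EMA_SUFFIX = "/ExponentialMovingAverage"


def ema_keys(variable_names):
    """Return the sorted names that look like EMA shadow variables."""
    return sorted(n for n in variable_names if n.endswith(EMA_SUFFIX))


def has_ema_weights(variable_names):
    """True if at least one EMA shadow variable is present."""
    return len(ema_keys(variable_names)) > 0


def checkpoint_variable_names(checkpoint_prefix):
    """List the variable names stored in a TensorFlow checkpoint.

    Requires TensorFlow; tf.train.list_variables yields (name, shape) pairs.
    """
    import tensorflow as tf  # imported lazily so the helpers above work without it

    return [name for name, _shape in tf.train.list_variables(checkpoint_prefix)]


if __name__ == "__main__":
    # Path taken from the log above; adjust to your own checkpoint.
    names = checkpoint_variable_names("data/cpt/s32/cpt-1620")
    if has_ema_weights(names):
        print("EMA weights found:", len(ema_keys(names)))
    else:
        print("No EMA weights in this checkpoint")
```

If this reports no EMA weights for a checkpoint written with --no-eval, that would be consistent with the guess above, though it does not by itself show where in the training script the save is skipped.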

Thanks.