Error during training (following default usage described in the readme.md)

lk251 commented 6 years ago

Python 3.6.4, TensorFlow 1.4.0

After following the commands in the "Usage" section of the readme:

sh scripts/prepare_Cornell_Movie-Dialogs_Corpus.sh
python main.py --config cornell-movie-dialogs --mode train_and_evaluate

INFO:tensorflow:loss = 10.636566, step = 1
INFO:tensorflow:global_step/sec: 0.628262
INFO:tensorflow:loss = 5.791372, step = 101 (159.167 sec)
INFO:tensorflow:global_step/sec: 0.647979
INFO:tensorflow:loss = 5.864272, step = 201 (154.326 sec)
INFO:tensorflow:global_step/sec: 0.655491
INFO:tensorflow:loss = 6.230704, step = 301 (152.557 sec)
INFO:tensorflow:global_step/sec: 0.665124
INFO:tensorflow:loss = 5.784849, step = 401 (150.348 sec)
INFO:tensorflow:global_step/sec: 0.670939
INFO:tensorflow:loss = 6.017593, step = 501 (149.045 sec)
INFO:tensorflow:global_step/sec: 0.668164
INFO:tensorflow:loss = 5.5392556, step = 601 (149.664 sec)
INFO:tensorflow:global_step/sec: 0.645968
INFO:tensorflow:loss = 5.649448, step = 701 (154.806 sec)
INFO:tensorflow:global_step/sec: 0.646482
INFO:tensorflow:loss = 5.597257, step = 801 (154.683 sec)
INFO:tensorflow:global_step/sec: 0.674069
INFO:tensorflow:loss = 5.0681148, step = 901 (148.353 sec)
Traceback (most recent call last):
  File "/home/javier/repos/dialogue/conversation-tensorflow/venv/lib/python3.6/site-packages/tensorflow/python/framework/common_shapes.py", line 686, in _call_cpp_shape_fn_impl
    input_tensors_as_shapes, status)
  File "/home/javier/repos/dialogue/conversation-tensorflow/venv/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Shapes must be equal rank, but are 3 and 2 for 'decoder/decoder/while/Select_4' (op: 'Select') with input shapes: [?,5], [?,5,1024], [?,5,1024].

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 98, in <module>
    main(args.mode)
  File "main.py", line 71, in main
    hparams=params
  File "/home/javier/repos/dialogue/conversation-tensorflow/venv/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/learn_runner.py", line 218, in run
    return _execute_schedule(experiment, schedule)
  File "/home/javier/repos/dialogue/conversation-tensorflow/venv/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/learn_runner.py", line 46, in _execute_schedule
    return task()
  File "/home/javier/repos/dialogue/conversation-tensorflow/venv/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 625, in train_and_evaluate
    self.train(delay_secs=0)
  File "/home/javier/repos/dialogue/conversation-tensorflow/venv/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 367, in train
    hooks=self._train_monitors + extra_hooks)
  File "/home/javier/repos/dialogue/conversation-tensorflow/venv/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 807, in _call_train
    hooks=hooks)
  File "/home/javier/repos/dialogue/conversation-tensorflow/venv/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 302, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/home/javier/repos/dialogue/conversation-tensorflow/venv/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 783, in _train_model
    _, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
  File "/home/javier/repos/dialogue/conversation-tensorflow/venv/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 521, in run
    run_metadata=run_metadata)
  File "/home/javier/repos/dialogue/conversation-tensorflow/venv/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 892, in run
    run_metadata=run_metadata)
  File "/home/javier/repos/dialogue/conversation-tensorflow/venv/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 967, in run
    raise six.reraise(*original_exc_info)
  File "/home/javier/repos/dialogue/conversation-tensorflow/venv/lib/python3.6/site-packages/six.py", line 693, in reraise
    raise value
  File "/home/javier/repos/dialogue/conversation-tensorflow/venv/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 952, in run
    return self._sess.run(*args, **kwargs)
  File "/home/javier/repos/dialogue/conversation-tensorflow/venv/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1032, in run
    run_metadata=run_metadata))
  File "/home/javier/repos/dialogue/conversation-tensorflow/venv/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/monitors.py", line 1196, in after_run
    induce_stop = m.step_end(self._last_step, result)
  File "/home/javier/repos/dialogue/conversation-tensorflow/venv/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/monitors.py", line 356, in step_end
    return self.every_n_step_end(step, output)
  File "/home/javier/repos/dialogue/conversation-tensorflow/venv/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/monitors.py", line 694, in every_n_step_end
    validation_outputs = self._evaluate_estimator()
  File "/home/javier/repos/dialogue/conversation-tensorflow/venv/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/monitors.py", line 665, in _evaluate_estimator
    name=self.name)
  File "/home/javier/repos/dialogue/conversation-tensorflow/venv/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 355, in evaluate
    name=name)
  File "/home/javier/repos/dialogue/conversation-tensorflow/venv/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 810, in _evaluate_model
    features, labels, model_fn_lib.ModeKeys.EVAL, self.config)
  File "/home/javier/repos/dialogue/conversation-tensorflow/venv/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 694, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/home/javier/repos/dialogue/conversation-tensorflow/model.py", line 23, in model_fn
    self.build_graph()
  File "/home/javier/repos/dialogue/conversation-tensorflow/model.py", line 57, in build_graph
    decoder_inputs=self.decoder_inputs)
  File "/home/javier/repos/dialogue/conversation-tensorflow/seq2seq_attention/__init__.py", line 37, in build
    self._build_decoder()
  File "/home/javier/repos/dialogue/conversation-tensorflow/seq2seq_attention/__init__.py", line 112, in _build_decoder
    length_penalty_weight=Config.predict.length_penalty_weight)
  File "/home/javier/repos/dialogue/conversation-tensorflow/seq2seq_attention/decoder.py", line 172, in build
    embedding, start_tokens, end_token, length_penalty_weight)
  File "/home/javier/repos/dialogue/conversation-tensorflow/seq2seq_attention/decoder.py", line 215, in _beam_search_decoder
    maximum_iterations=self.maximum_iterations)
  File "/home/javier/repos/dialogue/conversation-tensorflow/venv/lib/python3.6/site-packages/tensorflow/contrib/seq2seq/python/ops/decoder.py", line 286, in dynamic_decode
    swap_memory=swap_memory)
  File "/home/javier/repos/dialogue/conversation-tensorflow/venv/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2816, in while_loop
    result = loop_context.BuildLoop(cond, body, loop_vars, shape_invariants)
  File "/home/javier/repos/dialogue/conversation-tensorflow/venv/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2640, in BuildLoop
    pred, body, original_loop_vars, loop_vars, shape_invariants)
  File "/home/javier/repos/dialogue/conversation-tensorflow/venv/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2590, in _BuildLoop
    body_result = body(*packed_vars_for_body)
  File "/home/javier/repos/dialogue/conversation-tensorflow/venv/lib/python3.6/site-packages/tensorflow/contrib/seq2seq/python/ops/decoder.py", line 269, in body
    _maybe_copy_state, decoder_state, state)
  File "/home/javier/repos/dialogue/conversation-tensorflow/venv/lib/python3.6/site-packages/tensorflow/python/util/nest.py", line 413, in map_structure
    structure[0], [func(*x) for x in entries])
  File "/home/javier/repos/dialogue/conversation-tensorflow/venv/lib/python3.6/site-packages/tensorflow/python/util/nest.py", line 413, in <listcomp>
    structure[0], [func(*x) for x in entries])
  File "/home/javier/repos/dialogue/conversation-tensorflow/venv/lib/python3.6/site-packages/tensorflow/contrib/seq2seq/python/ops/decoder.py", line 265, in _maybe_copy_state
    return new if pass_through else array_ops.where(finished, cur, new)
  File "/home/javier/repos/dialogue/conversation-tensorflow/venv/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 2441, in where
    return gen_math_ops._select(condition=condition, t=x, e=y, name=name)
  File "/home/javier/repos/dialogue/conversation-tensorflow/venv/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 3988, in _select
    "Select", condition=condition, t=t, e=e, name=name)
  File "/home/javier/repos/dialogue/conversation-tensorflow/venv/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/javier/repos/dialogue/conversation-tensorflow/venv/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2958, in create_op
    set_shapes_for_outputs(ret)
  File "/home/javier/repos/dialogue/conversation-tensorflow/venv/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2209, in set_shapes_for_outputs
    shapes = shape_func(op)
  File "/home/javier/repos/dialogue/conversation-tensorflow/venv/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2159, in call_with_requiring
    return call_cpp_shape_fn(op, require_shape_fn=True)
  File "/home/javier/repos/dialogue/conversation-tensorflow/venv/lib/python3.6/site-packages/tensorflow/python/framework/common_shapes.py", line 627, in call_cpp_shape_fn
    require_shape_fn)
  File "/home/javier/repos/dialogue/conversation-tensorflow/venv/lib/python3.6/site-packages/tensorflow/python/framework/common_shapes.py", line 691, in _call_cpp_shape_fn_impl
    raise ValueError(err.message)
ValueError: Shapes must be equal rank, but are 3 and 2 for 'decoder/decoder/while/Select_4' (op: 'Select') with input shapes: [?,5], [?,5,1024], [?,5,1024].
The learning is finished with conversation-tensorflow* Project using cornell-movie-dialogs config.

What might be wrong?

DongjunLee commented 6 years ago

Hi @lev-kusanagi, I think the error occurs during evaluation. I'll comment here when I've fixed it. Thank you for the issue.

lk251 commented 6 years ago

Thank you @DongjunLee !!

lk251 commented 6 years ago

@DongjunLee Could this be similar to the issue in #1 ?

If so, I'm not sure where to put this:

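decoder_initial_state = out_cell.zero_state(Config.train.batch_size, self.dtype)
# clone() returns a new state object, so the result has to be assigned back
decoder_initial_state = decoder_initial_state.clone(cell_state=self.encoder_final_state)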

If it's the solution...

DongjunLee commented 6 years ago

@lev-kusanagi your guess helped me a lot :)

The issue is the BeamSearchDecoder in evaluate mode.

ValueError: Shapes must be equal rank, but are 3 and 2 for 'decoder/decoder/while/Select_4' (op: 'Select') with input shapes: [?,5], [?,5,1024], [?,5,1024]. # 5 is beam_width.
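The shape rule behind this error can be sketched in plain Python. This is a rough model of the documented TF 1.x tf.where behavior (the condition must have the same shape as x, or be a rank-1 vector whose size matches the first dimension of x), not TensorFlow's actual shape function. The helper name select_shapes_ok is hypothetical, and None stands in for an unknown dimension ("?").

```python
def select_shapes_ok(cond_shape, value_shape):
    """Rough model of the TF 1.x tf.where(cond, x, y) shape rule (illustrative only)."""

    def dims_match(a, b):
        # None models an unknown dimension, which is compatible with anything.
        return a is None or b is None or a == b

    # Case 1: condition and values have the same rank and compatible dimensions.
    if len(cond_shape) == len(value_shape):
        return all(dims_match(a, b) for a, b in zip(cond_shape, value_shape))

    # Case 2: condition is a vector that selects whole rows of the values.
    if len(cond_shape) == 1:
        return len(value_shape) >= 1 and dims_match(cond_shape[0], value_shape[0])

    # Anything else (e.g. rank 2 against rank 3) is rejected.
    return False


# Shapes taken from the error message: finished is [?, 5], the state is [?, 5, 1024].
print(select_shapes_ok((None, 5), (None, 5, 1024)))  # False

# Without beam search, finished is [?] and the state is [?, 1024].
print(select_shapes_ok((None,), (None, 1024)))  # True
```

Under this model, the extra beam dimension is what makes the rank-2 "finished" tensor incompatible with the rank-3 state.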

So I changed it so that the beam search decoder is only used in predict mode. Pull the latest commit and try again.
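A minimal sketch of that kind of mode gate, in plain Python. The function name, the returned labels, and the choice of greedy decoding for evaluation are assumptions for illustration, not the code from the commit. The mode strings mirror the values of tf.estimator.ModeKeys.

```python
# Same string values as tf.estimator.ModeKeys.TRAIN / EVAL / PREDICT.
TRAIN, EVAL, PREDICT = "train", "eval", "infer"


def choose_decoder(mode, beam_width):
    """Pick a decoder kind for the given mode (hypothetical helper)."""
    if mode == PREDICT and beam_width > 0:
        # Beam search is only enabled when generating responses.
        return "beam_search"
    if mode == TRAIN:
        # Training feeds the ground-truth tokens to the decoder.
        return "training_helper"
    # Assumption: evaluation (and predict without beams) decodes greedily.
    return "greedy"


for mode in (TRAIN, EVAL, PREDICT):
    print(mode, "->", choose_decoder(mode, beam_width=5))
```

With a gate like this, the evaluation pass triggered during train_and_evaluate never builds the beam search graph.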

lk251 commented 6 years ago

Thanks @DongjunLee !

lk251 commented 6 years ago

@DongjunLee Evaluation seems to work (with main.py), but python chat.py --config cornell-movie-dialogs still produces the same error.

DongjunLee commented 6 years ago

Oh, I've fixed that now as well. The problem was impute_finished=True.

impute_finished: Python boolean. If True, then states for batch entries which are marked as finished get copied through and the corresponding outputs get zeroed out. This causes some slowdown at each time step, but ensures that the final state and outputs have the correct values and that backprop ignores time steps that were marked as finished.

I changed it to impute_finished=False.
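What the flag does to the decoder state at each step can be modeled in plain Python. This is a simplified sketch based on the _maybe_copy_state line in the traceback above, with lists standing in for tensors; step_state is a hypothetical name, not a TensorFlow function.

```python
def step_state(finished, cur_state, new_state, impute_finished):
    """Return the state carried to the next decode step (illustrative model)."""
    if not impute_finished:
        # No per-entry selection happens, so no Select op is built.
        return new_state
    # Finished entries keep their previous state, the others take the new one.
    return [cur if done else new
            for done, cur, new in zip(finished, cur_state, new_state)]


finished = [True, False]           # one flag per batch entry
cur_state = [[1, 1], [2, 2]]       # state before the step
new_state = [[9, 9], [8, 8]]       # state computed by the step

print(step_state(finished, cur_state, new_state, impute_finished=True))
# [[1, 1], [8, 8]]
print(step_state(finished, cur_state, new_state, impute_finished=False))
# [[9, 9], [8, 8]]
```

With impute_finished=False the selection step is skipped entirely, which is why the rank mismatch between the [?,5] finished flags and the [?,5,1024] state no longer comes up.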

lk251 commented 6 years ago

Awesome!! Once again, thanks.

DongjunLee / conversation-tensorflow

Error during training (following default usage described in the readme.md) #9