google-research / bert

TensorFlow code and pre-trained models for BERT
https://arxiv.org/abs/1810.04805
Apache License 2.0

[ERROR] TPUEstimator with BestExporter: ValueError: slice index 1 of dimension 0 out of bounds. #956

Open loretoparisi opened 5 years ago

loretoparisi commented 5 years ago

I'm fine-tuning BERT for NER (starting from this example) with TPUEstimator. This normally works fine. The problem appears when I add a BestExporter whose serving input receiver declares the features like this:

def serving_input_receiver_fn():
    with tf.variable_scope("foo"):
        feature_spec = {
            "input_ids": tf.FixedLenFeature([FLAGS.max_seq_length], tf.int64),
            "mask": tf.FixedLenFeature([FLAGS.max_seq_length], tf.int64),
            "segment_ids": tf.FixedLenFeature([FLAGS.max_seq_length], tf.int64),
            "label_ids": tf.FixedLenFeature([], tf.int64),
            "is_real_example": tf.FixedLenFeature([], tf.int64)
        }
        serialized_tf_example = tf.placeholder(dtype=tf.string,
                                             shape=[None],
                                             name='input_example_tensor')
        receiver_tensors = {'examples': serialized_tf_example}
        features = tf.parse_example(serialized_tf_example, feature_spec)
        return tf.estimator.export.ServingInputReceiver(features, receiver_tensors)

The feature definitions follow those described in https://github.com/google-research/bert/issues/146

I then set up the TPUEstimator as usual:

estimator = tf.contrib.tpu.TPUEstimator(
    use_tpu=FLAGS.use_tpu,
    model_fn=model_fn,
    config=run_config,
    train_batch_size=FLAGS.train_batch_size,
    eval_batch_size=FLAGS.eval_batch_size,
    predict_batch_size=FLAGS.predict_batch_size)

and the BestExporter, passing it the serving_input_receiver_fn defined above, together with the train and eval specs:

exporter = tf.estimator.BestExporter(
    name="best_exporter",
    serving_input_receiver_fn=serving_input_receiver_fn,
    exports_to_keep=5)
train_spec = tf.estimator.TrainSpec(
    input_fn=train_input_fn,
    max_steps=num_train_steps)
eval_spec = tf.estimator.EvalSpec(
    input_fn=eval_input_fn,
    steps=100,
    exporters=exporter,
    start_delay_secs=0,
    throttle_secs=5)

I then get this error:

ValueError: slice index 1 of dimension 0 out of bounds. for 'strided_slice_4' (op: 'StridedSlice') with input shapes: [1], [1], [1], [1] and with computed input tensors: input[1] = <1>, input[2] = <2>, input[3] = <1>.
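
One way to read this message, offered as a sketch rather than a confirmed diagnosis: the first input shape `[1]` says the vector being sliced has a single element, and the computed inputs (begin 1, end 2, stride 1) correspond to taking element `[1]` of it. That pattern is what you would see if something asked for dimension 1 of the shape of a rank-1 tensor. The plain-Python function below only mimics that bounds check; `check_scalar_index` is a hypothetical name and is not part of TensorFlow.

```python
def check_scalar_index(input_len, begin, end, stride):
    """Mimic the bounds check for x[begin] on a 1-D vector with input_len elements.

    Hypothetical helper for illustration only; it is not a TensorFlow API.
    """
    if stride != 1 or end != begin + 1:
        raise NotImplementedError("sketch only covers single-element indexing")
    if not -input_len <= begin < input_len:
        raise ValueError(f"slice index {begin} of dimension 0 out of bounds.")
    return begin


# Values taken from the reported message: input shape [1], begin=1, end=2, stride=1.
message = None
try:
    check_scalar_index(input_len=1, begin=1, end=2, stride=1)
except ValueError as err:
    message = str(err)

# A 2-element shape vector (the shape of a rank-2 tensor) accepts the same index.
ok_index = check_scalar_index(input_len=2, begin=1, end=2, stride=1)
```

If this reading is right, the thing to look for is a tensor that arrives with one dimension where the code expects `[batch, seq_len]`.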

The input features definition should be OK, since it looks like this:

class InputFeatures(object):
  """A single set of features of data."""

  def __init__(self,
               input_ids,
               mask,
               segment_ids,
               label_ids,
               is_real_example=True):
    self.input_ids = input_ids
    self.mask = mask
    self.segment_ids = segment_ids
    self.label_ids = label_ids

paulthemagno commented 5 years ago

This is the full error log:

I1206 11:50:20.705792 140609273997120 ag_logging.py:139] Whitelisted: <function tanh at 0x7fe1f5695400>: name starts with "tensorflow"
E1206 11:50:20.905941 140609273997120 error_handling.py:70] Error recorded from training_loop: slice index 1 of dimension 0 out of bounds. for 'strided_slice_4' (op: 'StridedSlice') with input shapes: [1], [1], [1], [1] and with computed input tensors: input[1] = <1>, input[2] = <2>, input[3] = <1>.
I1206 11:50:20.906070 140609273997120 error_handling.py:96] training_loop marked as finished
W1206 11:50:20.906140 140609273997120 error_handling.py:130] Reraising captured error
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 1864, in _create_c_op
    c_op = c_api.TF_FinishOperation(op_desc)
tensorflow.python.framework.errors_impl.InvalidArgumentError: slice index 1 of dimension 0 out of bounds. for 'strided_slice_4' (op: 'StridedSlice') with input shapes: [1], [1], [1], [1] and with computed input tensors: input[1] = <1>, input[2] = <2>, input[3] = <1>.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "BERT_NER_conll-exporter.py", line 923, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "BERT_NER_conll-exporter.py", line 779, in main
    tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/training.py", line 473, in train_and_evaluate
    return executor.run()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/training.py", line 613, in run
    return self.run_local()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/training.py", line 714, in run_local
    saving_listeners=saving_listeners)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2876, in train
    rendezvous.raise_errors()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/error_handling.py", line 131, in raise_errors
    six.reraise(typ, value, traceback)
  File "/usr/lib/python3/dist-packages/six.py", line 693, in reraise
    raise value
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2871, in train
    saving_listeners=saving_listeners)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 367, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1158, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1192, in _train_model_default
    saving_listeners)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1484, in _train_with_estimator_spec
    _, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 754, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 1252, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 1353, in run
    raise six.reraise(*original_exc_info)
  File "/usr/lib/python3/dist-packages/six.py", line 693, in reraise
    raise value
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 1338, in run
    return self._sess.run(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 1419, in run
    run_metadata=run_metadata))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/basic_session_run_hooks.py", line 594, in after_run
    if self._save(run_context.session, global_step):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/basic_session_run_hooks.py", line 619, in _save
    if l.after_save(session, step):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/training.py", line 519, in after_save
    self._evaluate(global_step_value)  # updates self.eval_result
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/training.py", line 539, in _evaluate
    self._evaluator.evaluate_and_export())
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/training.py", line 932, in evaluate_and_export
    is_the_final_export)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/training.py", line 965, in _export_eval_result
    is_the_final_export=is_the_final_export))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/exporter.py", line 303, in export
    is_the_final_export)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/exporter.py", line 120, in export
    checkpoint_path=checkpoint_path)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 732, in export_saved_model
    strip_default_attrs=True)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 856, in _export_all_saved_models
    strip_default_attrs=strip_default_attrs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2679, in _add_meta_graph_for_mode
    strip_default_attrs=strip_default_attrs))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 929, in _add_meta_graph_for_mode
    config=self.config)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2709, in _call_model_fn
    config)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1146, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2967, in _model_fn
    features, labels, is_export_mode=is_export_mode)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 1549, in call_without_tpu
    return self._call_model_fn(features, labels, is_export_mode=is_export_mode)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 1867, in _call_model_fn
    estimator_spec = self._model_fn(features=features, **kwargs)
  File "BERT_NER_conll-exporter.py", line 552, in model_fn
    use_one_hot_embeddings)
  File "BERT_NER_conll-exporter.py", line 528, in create_model
    loss, trans = crf_loss(logits,labels,mask,num_labels,mask2len)
  File "BERT_NER_conll-exporter.py", line 486, in crf_loss
    log_likelihood,transition = tf.contrib.crf.crf_log_likelihood(logits,labels,transition_params =trans ,sequence_lengths=mask2len)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/contrib/crf/python/ops/crf.py", line 257, in crf_log_likelihood
    transition_params)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/contrib/crf/python/ops/crf.py", line 116, in crf_sequence_score
    false_fn=_multi_seq_fn)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/layers/utils.py", line 202, in smart_cond
    pred, true_fn=true_fn, false_fn=false_fn, name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/smart_cond.py", line 56, in smart_cond
    return false_fn()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/contrib/crf/python/ops/crf.py", line 104, in _multi_seq_fn
    unary_scores = crf_unary_score(tag_indices, sequence_lengths, inputs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/contrib/crf/python/ops/crf.py", line 294, in crf_unary_score
    maxlen=array_ops.shape(tag_indices)[1],
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/array_ops.py", line 680, in _slice_helper
    name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/array_ops.py", line 846, in strided_slice
    shrink_axis_mask=shrink_axis_mask)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 9989, in strided_slice
    shrink_axis_mask=shrink_axis_mask, name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
    op_def=op_def)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 2027, in __init__
    control_input_ops)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 1867, in _create_c_op
    raise ValueError(str(e))
ValueError: slice index 1 of dimension 0 out of bounds. for 'strided_slice_4' (op: 'StridedSlice') with input shapes: [1], [1], [1], [1] and with computed input tensors: input[1] = <1>, input[2] = <2>, input[3] = <1>.
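
The last frames point at `maxlen=array_ops.shape(tag_indices)[1]` in `crf_unary_score`, which needs `tag_indices` to have at least two dimensions. A possible reading, not confirmed in this thread, is that during export the labels come from the `label_ids` feature that `serving_input_receiver_fn` declares with shape `[]`, so they reach the CRF as a rank-1 `[batch]` tensor. The sketch below does the shape bookkeeping in plain Python, without TensorFlow. `MAX_SEQ_LENGTH`, `batched_shape`, and `features_too_flat` are hypothetical names, and the assumption that `model_fn` feeds `features["label_ids"]` to `crf_log_likelihood` is inferred from the traceback, not shown in the posted code.

```python
MAX_SEQ_LENGTH = 128  # hypothetical stand-in for FLAGS.max_seq_length

# Per-example shapes as declared in the posted serving_input_receiver_fn.
reported_spec = {
    "input_ids": [MAX_SEQ_LENGTH],
    "mask": [MAX_SEQ_LENGTH],
    "segment_ids": [MAX_SEQ_LENGTH],
    "label_ids": [],
    "is_real_example": [],
}


def batched_shape(per_example_shape):
    """Shape of a FixedLenFeature after parsing a batch: [batch] + per-example shape."""
    return [None] + list(per_example_shape)


def features_too_flat(spec, names, min_rank=2):
    """Return the named features whose batched rank is below min_rank."""
    return [name for name in names if len(batched_shape(spec[name])) < min_rank]


# Features the CRF path is assumed to index along dimension 1.
sequence_features = ["input_ids", "label_ids"]

flat = features_too_flat(reported_spec, sequence_features)

# Assumption: token-level NER labels have one id per position, like input_ids.
adjusted_spec = dict(reported_spec, label_ids=[MAX_SEQ_LENGTH])
flat_after = features_too_flat(adjusted_spec, sequence_features)
```

Under these assumptions only `label_ids` comes out with too few dimensions, and declaring it with `[FLAGS.max_seq_length]` would give it the same `[batch, seq_len]` layout as `input_ids`. Whether labels belong in a serving signature at all is a separate question that this sketch does not address.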