google-research / bert

TensorFlow code and pre-trained models for BERT
https://arxiv.org/abs/1810.04805
Apache License 2.0

Bert:tensorflow:Error recorded from training_loop: Read less bytes than requested #1387

Open MalaJeans opened 1 year ago

MalaJeans commented 1 year ago

When I run the BERT example, the following error occurs:

ERROR:tensorflow:Error recorded from training_loop: Read less bytes than requested [[node checkpoint_initializer_133 (defined at run_classifier.py:661) ]]

Original stack trace for 'checkpoint_initializer_133':
  File "run_classifier.py", line 981, in <module>
    tf.app.run()
  File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "run_classifier.py", line 880, in main
    estimator.train(input_fn=train_input_fn, max_steps=num_train_steps)
  File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2871, in train
    saving_listeners=saving_listeners)
  File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 367, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1158, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1188, in _train_model_default
    features, labels, ModeKeys.TRAIN, self.config)
  File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2709, in _call_model_fn
    config)
  File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1146, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2967, in _model_fn
    features, labels, is_export_mode=is_export_mode)
  File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 1549, in call_without_tpu
    return self._call_model_fn(features, labels, is_export_mode=is_export_mode)
  File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 1867, in _call_model_fn
    estimator_spec = self._model_fn(features=features, **kwargs)
  File "run_classifier.py", line 661, in model_fn
    tf.train.init_from_checkpoint(init_checkpoint, assignment_map)
  File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/training/checkpoint_utils.py", line 291, in init_from_checkpoint
    init_from_checkpoint_fn)
  File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/distribute/distribute_lib.py", line 1684, in merge_call
    return self._merge_call(merge_fn, args, kwargs)
  File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/distribute/distribute_lib.py", line 1691, in _merge_call
    return merge_fn(self._strategy, *args, **kwargs)
  File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/training/checkpoint_utils.py", line 286, in <lambda>
    ckpt_dir_or_file, assignment_map)
  File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/training/checkpoint_utils.py", line 334, in _init_from_checkpoint
    _set_variable_or_list_initializer(var, ckpt_file, tensor_name_in_ckpt)
  File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/training/checkpoint_utils.py", line 458, in _set_variable_or_list_initializer
    _set_checkpoint_initializer(variable_or_list, ckpt_file, tensor_name, "")
  File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/training/checkpoint_utils.py", line 412, in _set_checkpoint_initializer
    ckpt_file, [tensor_name], [slice_spec], [base_type], name=name)[0]
  File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1696, in restore_v2
    name=name)
  File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
    op_def=op_def)
  File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2005, in __init__
    self._traceback = tf_stack.extract_stack()

E0626 16:26:39.769550 139724583494016 error_handling.py:70] Error recorded from training_loop: Read less bytes than requested [[node checkpoint_initializer_133 (defined at run_classifier.py:661) ]]

INFO:tensorflow:training_loop marked as finished
I0626 16:26:39.769962 139724583494016 error_handling.py:96] training_loop marked as finished
WARNING:tensorflow:Reraising captured error
W0626 16:26:39.770006 139724583494016 error_handling.py:130] Reraising captured error
Traceback (most recent call last):
  File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
    return fn(*args)
  File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.OutOfRangeError: Read less bytes than requested [[{{node checkpoint_initializer_133}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "run_classifier.py", line 981, in <module>
    tf.app.run()
  File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "run_classifier.py", line 880, in main
    estimator.train(input_fn=train_input_fn, max_steps=num_train_steps)
  File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2876, in train
    rendezvous.raise_errors()
  File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/error_handling.py", line 131, in raise_errors
    six.reraise(typ, value, traceback)
  File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/six.py", line 719, in reraise
    raise value
  File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2871, in train
    saving_listeners=saving_listeners)
  File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 367, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1158, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1192, in _train_model_default
    saving_listeners)
  File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1480, in _train_with_estimator_spec
    log_step_count_steps=log_step_count_steps) as mon_sess:
  File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 584, in MonitoredTrainingSession
    stop_grace_period_secs=stop_grace_period_secs)
  File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1007, in __init__
    stop_grace_period_secs=stop_grace_period_secs)
  File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 725, in __init__
    self._sess = _RecoverableSession(self._coordinated_creator)
  File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1200, in __init__
    _WrappedSession.__init__(self, self._create_session())
  File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 1205, in _create_session
    return self._sess_creator.create_session()
  File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 871, in create_session
    self.tf_sess = self._session_creator.create_session()
  File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/training/monitored_session.py", line 647, in create_session
    init_fn=self._scaffold.init_fn)
  File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/training/session_manager.py", line 296, in prepare_session
    sess.run(init_op, feed_dict=init_feed_dict)
  File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 950, in run
    run_metadata_ptr)
  File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
    run_metadata)
  File "/home/gy/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: Read less bytes than requested [[node checkpoint_initializer_133 (defined at run_classifier.py:661) ]]

My running parameters are as follows:

python run_classifier.py \
  --task_name=MRPC \
  --do_train=true \
  --do_eval=true \
  --data_dir=$GLUE_DIR/MRPC \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
  --max_seq_length=128 \
  --train_batch_size=32 \
  --learning_rate=2e-5 \
  --num_train_epochs=3.0 \
  --output_dir=/tmp/mrpc_output/

Is there any solution?
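
Since the failure happens while restoring `--init_checkpoint`, it may help to confirm that every file the command points at exists and is non-empty before starting a run. The snippet below is only a rough sketch. The function name `missing_bert_inputs` is made up for illustration, and the file names (`bert_model.ckpt.index`, `bert_model.ckpt.data-00000-of-00001`, `train.tsv`, `dev.tsv`) are assumptions about how the pre-trained model and the MRPC data are usually laid out, so adjust them to what is actually on disk.

```python
import os


def missing_bert_inputs(bert_base_dir, data_dir):
    """Return the expected input paths that are missing or empty.

    The file names below are assumptions about a typical layout,
    not something this thread confirms.
    """
    expected = [
        os.path.join(bert_base_dir, "vocab.txt"),
        os.path.join(bert_base_dir, "bert_config.json"),
        # A checkpoint prefix such as bert_model.ckpt is not a file itself;
        # it is normally stored as an .index file plus .data shard files.
        os.path.join(bert_base_dir, "bert_model.ckpt.index"),
        os.path.join(bert_base_dir, "bert_model.ckpt.data-00000-of-00001"),
        os.path.join(data_dir, "train.tsv"),
        os.path.join(data_dir, "dev.tsv"),
    ]
    return [
        path
        for path in expected
        if not os.path.isfile(path) or os.path.getsize(path) == 0
    ]


if __name__ == "__main__":
    # Reads the same environment variables used in the command above.
    bert_dir = os.environ.get("BERT_BASE_DIR", "")
    glue_dir = os.environ.get("GLUE_DIR", "")
    for path in missing_bert_inputs(bert_dir, os.path.join(glue_dir, "MRPC")):
        print("missing or empty:", path)
```

An empty result only shows that the files are present. It does not prove that the checkpoint contents are complete.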

drosenbluth commented 1 year ago

How are you?

MalaJeans commented 1 year ago

> How are you?

I downloaded the data set again and modified the running parameters. It works now.
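
For anyone who lands here with the same message: the thread does not say exactly what was wrong, but a fresh download fixing it suggests an incomplete or damaged file may have been involved. If the files came as a `.zip` archive (an assumption, the thread does not say so), a quick integrity check with Python's standard `zipfile` module could look like the sketch below. The function name `archive_is_intact` is hypothetical.

```python
import zipfile
import zlib


def archive_is_intact(zip_path):
    """Return (ok, detail) for a downloaded .zip archive."""
    try:
        with zipfile.ZipFile(zip_path) as archive:
            # testzip() reads every member and returns the name of the first
            # one that fails its CRC or header check, or None if all pass.
            bad_member = archive.testzip()
    except (zipfile.BadZipFile, zlib.error, OSError) as err:
        # A truncated download typically fails here, before any member is read.
        return False, str(err)
    if bad_member is not None:
        return False, "corrupt member: " + bad_member
    return True, "ok"
```

Calling `archive_is_intact("/path/to/model.zip")` (the path is a placeholder) before unpacking gives an early hint that the download should be repeated.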