When I run run_pretraining.py on the GPU, I get the following error, but run_pretraining_test.py is normal.

wutong4012 commented 4 years ago

Traceback (most recent call last):
  File "run_pretraining.py", line 577, in <module>
    tf.app.run()
  File "/home/wutong/anaconda3/envs/wt/lib/python3.7/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/home/wutong/anaconda3/envs/wt/lib/python3.7/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/home/wutong/anaconda3/envs/wt/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "run_pretraining.py", line 534, in main
    estimator.train(input_fn=train_input_fn, max_steps=FLAGS.num_train_steps)
  File "/home/wutong/anaconda3/envs/wt/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3035, in train
    rendezvous.raise_errors()
  File "/home/wutong/anaconda3/envs/wt/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/tpu/error_handling.py", line 136, in raise_errors
    six.reraise(typ, value, traceback)
  File "/home/wutong/anaconda3/envs/wt/lib/python3.7/site-packages/six.py", line 703, in reraise
    raise value
  File "/home/wutong/anaconda3/envs/wt/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3030, in train
    saving_listeners=saving_listeners)
  File "/home/wutong/anaconda3/envs/wt/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 370, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/home/wutong/anaconda3/envs/wt/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1161, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/home/wutong/anaconda3/envs/wt/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1195, in _train_model_default
    saving_listeners)
  File "/home/wutong/anaconda3/envs/wt/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1494, in _train_with_estimator_spec
    _, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
  File "/home/wutong/anaconda3/envs/wt/lib/python3.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 754, in run
    run_metadata=run_metadata)
  File "/home/wutong/anaconda3/envs/wt/lib/python3.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 1259, in run
    run_metadata=run_metadata)
  File "/home/wutong/anaconda3/envs/wt/lib/python3.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 1360, in run
    raise six.reraise(*original_exc_info)
  File "/home/wutong/anaconda3/envs/wt/lib/python3.7/site-packages/six.py", line 703, in reraise
    raise value
  File "/home/wutong/anaconda3/envs/wt/lib/python3.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 1345, in run
    return self._sess.run(*args, **kwargs)
  File "/home/wutong/anaconda3/envs/wt/lib/python3.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 1418, in run
    run_metadata=run_metadata)
  File "/home/wutong/anaconda3/envs/wt/lib/python3.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 1176, in run
    return self._sess.run(*args, **kwargs)
  File "/home/wutong/anaconda3/envs/wt/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/home/wutong/anaconda3/envs/wt/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/wutong/anaconda3/envs/wt/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/home/wutong/anaconda3/envs/wt/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: {{function_node tf_data_experimental_map_andbatch_54}} Key: masked_lm_weights. Can't parse serialized Example.
     [[{{node ParseSingleExample/ParseSingleExample}}]]
     [[IteratorGetNext]]
     [[IteratorGetNext/_787]]
  (1) Invalid argument: {{function_node tf_data_experimental_map_andbatch_54}} Key: masked_lm_weights. Can't parse serialized Example.
     [[{{node ParseSingleExample/ParseSingleExample}}]]
     [[IteratorGetNext]]

matthewygf commented 4 years ago

Check out this link: https://github.com/google-research/bert/issues/283#issuecomment-449918585

illuminascent commented 4 years ago

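This could also happen if your training samples do not each contain exactly 20 masked-token slots (20 being the default max_predictions_per_seq). Make sure the max_predictions_per_seq you specify when training matches the value that was used to create the pretraining data.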
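
To make the mismatch concrete, here is a minimal pure-Python sketch of the kind of fixed-length check that fails in the traceback above. It is only an illustration, not the TensorFlow parsing code: the helper name find_length_mismatches is hypothetical, the example is assumed to already be decoded into a plain dict of lists, and the feature keys other than masked_lm_weights (which appears in the error) are assumptions based on the usual BERT-style pretraining features. The value 40 is also just a made-up number for the demonstration.

```python
# Hypothetical helper: mimics the length check a fixed-length feature spec
# performs. This is NOT the TensorFlow API, just a plain-Python illustration.
def find_length_mismatches(example, max_predictions_per_seq):
    """Return {key: (actual_len, expected_len)} for every masked-LM feature
    whose stored length differs from max_predictions_per_seq."""
    # Assumed feature keys; only masked_lm_weights is confirmed by the error.
    masked_lm_keys = (
        "masked_lm_positions",
        "masked_lm_ids",
        "masked_lm_weights",
    )
    mismatches = {}
    for key in masked_lm_keys:
        if key in example and len(example[key]) != max_predictions_per_seq:
            mismatches[key] = (len(example[key]), max_predictions_per_seq)
    return mismatches


if __name__ == "__main__":
    # Pretend the data was created with 40 prediction slots per sequence
    # (hypothetical value), while training uses the default of 20.
    decoded_example = {
        "masked_lm_positions": [0] * 40,
        "masked_lm_ids": [0] * 40,
        "masked_lm_weights": [1.0] * 40,
    }
    for key, (actual, expected) in find_length_mismatches(decoded_example, 20).items():
        print(f"{key}: stored length {actual}, training expects {expected}")
```

If a check like this reports a mismatch on your own data, either regenerate the pretraining data with the value you train with, or pass the value the data was created with as max_predictions_per_seq when training.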

google-research / albert

When I run run_pretraining.py on the GPU, I get the following error, but run_pretraining_test.py is normal. #172