google-research / bert

TensorFlow code and pre-trained models for BERT
https://arxiv.org/abs/1810.04805
Apache License 2.0

An error occurs when I run run_classifier.py #382

Open. TahoeWang opened this issue 5 years ago

TahoeWang commented 5 years ago

INFO:tensorflow: Running training
INFO:tensorflow: Num examples = 84963
INFO:tensorflow: Batch size = 16
INFO:tensorflow: Num steps = 15930
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Running train on CPU
INFO:tensorflow: Features
INFO:tensorflow: name = input_ids, shape = (16, 64)
INFO:tensorflow: name = input_mask, shape = (16, 64)
INFO:tensorflow: name = label_ids, shape = (16,)
INFO:tensorflow: name = segment_ids, shape = (16, 64)
INFO:tensorflow: Trainable Variables
INFO:tensorflow: name = bert/embeddings/word_embeddings:0, shape = (21128, 768), INIT_FROM_CKPT
INFO:tensorflow: name = bert/embeddings/token_type_embeddings:0, shape = (2, 768), INIT_FROM_CKPT
INFO:tensorflow: name = bert/embeddings/position_embeddings:0, shape = (512, 768), INIT_FROM_CKPT
INFO:tensorflow: name = bert/embeddings/LayerNorm/beta:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = bert/embeddings/LayerNorm/gamma:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = bert/encoder/layer_0/attention/self/query/kernel:0, shape = (768, 768), INIT_FROM_CKPT
INFO:tensorflow: name = bert/encoder/layer_0/attention/self/query/bias:0, shape = (768,), ...........
INIT_FROM_CKPT
INFO:tensorflow: name = bert/encoder/layer_11/attention/self/value/bias:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = bert/encoder/layer_11/attention/output/dense/kernel:0, shape = (768, 768), INIT_FROM_CKPT
INFO:tensorflow: name = bert/encoder/layer_11/attention/output/dense/bias:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = bert/encoder/layer_11/attention/output/LayerNorm/beta:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = bert/encoder/layer_11/attention/output/LayerNorm/gamma:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = bert/encoder/layer_11/intermediate/dense/kernel:0, shape = (768, 3072), INIT_FROM_CKPT
INFO:tensorflow: name = bert/encoder/layer_11/intermediate/dense/bias:0, shape = (3072,), INIT_FROM_CKPT
INFO:tensorflow: name = bert/encoder/layer_11/output/dense/kernel:0, shape = (3072, 768), INIT_FROM_CKPT
INFO:tensorflow: name = bert/encoder/layer_11/output/dense/bias:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = bert/encoder/layer_11/output/LayerNorm/beta:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = bert/encoder/layer_11/output/LayerNorm/gamma:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = bert/pooler/dense/kernel:0, shape = (768, 768), INIT_FROM_CKPT
INFO:tensorflow: name = bert/pooler/dense/bias:0, shape = (768,), INIT_FROM_CKPT
INFO:tensorflow: name = output_weights:0, shape = (2, 768)
INFO:tensorflow: name = output_bias:0, shape = (2,)
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into output/model.ckpt.
Traceback (most recent call last):

  File "E:\GitHub\google\bert-master\run_classifier.py", line 961, in <module>
    tf.app.run()
  File "D:\ProgramFiles\Anaconda3\lib\site-packages\tensorflow\python\platform\app.py", line 125, in run
    _sys.exit(main(argv))
  File "E:\GitHub\google\bert-master\run_classifier.py", line 884, in main
    estimator.train(input_fn=train_input_fn, max_steps=num_train_steps)
  File "D:\ProgramFiles\Anaconda3\lib\site-packages\tensorflow\python\estimator\estimator.py", line 370, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "D:\ProgramFiles\Anaconda3\lib\site-packages\tensorflow\python\estimator\estimator.py", line 1139, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "D:\ProgramFiles\Anaconda3\lib\site-packages\tensorflow\python\estimator\estimator.py", line 1167, in _train_model_default
    saving_listeners)
  File "D:\ProgramFiles\Anaconda3\lib\site-packages\tensorflow\python\estimator\estimator.py", line 1445, in _train_with_estimatorspec
    _, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
  File "D:\ProgramFiles\Anaconda3\lib\site-packages\tensorflow\python\training\monitored_session.py", line 583, in run
    run_metadata=run_metadata)
  File "D:\ProgramFiles\Anaconda3\lib\site-packages\tensorflow\python\training\monitored_session.py", line 1059, in run
    run_metadata=run_metadata)
  File "D:\ProgramFiles\Anaconda3\lib\site-packages\tensorflow\python\training\monitored_session.py", line 1150, in run
    raise six.reraise(*original_exc_info)
  File "D:\ProgramFiles\Anaconda3\lib\site-packages\six.py", line 693, in reraise
    raise value
  File "D:\ProgramFiles\Anaconda3\lib\site-packages\tensorflow\python\training\monitored_session.py", line 1135, in run
    return self._sess.run(*args, **kwargs)
  File "D:\ProgramFiles\Anaconda3\lib\site-packages\tensorflow\python\training\monitored_session.py", line 1207, in run
    run_metadata=run_metadata)
  File "D:\ProgramFiles\Anaconda3\lib\site-packages\tensorflow\python\training\monitored_session.py", line 987, in run
    return self._sess.run(*args, **kwargs)
  File "D:\ProgramFiles\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 885, in run
    run_metadata_ptr)
  File "D:\ProgramFiles\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1108, in _run
    feed_dict_tensor, options, run_metadata)
  File "D:\ProgramFiles\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1280, in _do_run
    run_metadata)
  File "D:\ProgramFiles\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1299, in _do_call
    raise type(e)(node_def, op, message)

NotFoundError: No registered '_CopyFromGpuToHost' OpKernel for CPU devices compatible with node swap_out_gradients/bert/encoder/layer_0/output/dense/MatMul_grad/MatMul_1_0 = _CopyFromGpuToHost[T=DT_FLOAT, _class=["loc@gradients/bert/encoder/layer_0/output/dense/MatMul_grad/MatMul_1_0"], _device="/job:localhost/replica:0/task:0/device:CPU:0"](bert/encoder/layer_0/intermediate/dense/mul_1/_4059). Registered: device='GPU'

 [[Node: swap_out_gradients/bert/encoder/layer_0/output/dense/MatMul_grad/MatMul_1_0 = _CopyFromGpuToHost[T=DT_FLOAT, _class=["loc@gradients/bert/encoder/layer_0/output/dense/MatMul_grad/MatMul_1_0"], _device="/job:localhost/replica:0/task:0/device:CPU:0"](bert/encoder/layer_0/intermediate/dense/mul_1/_4059)]]

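The `Num steps = 15930` line in the log follows from the other two logged numbers. run_classifier.py computes the step count as `int(len(train_examples) / train_batch_size * num_train_epochs)`. The sketch below mirrors that formula and assumes `num_train_epochs` was left at its default of 3.0 (an assumption, since the issue does not show the command line), which reproduces the logged value. It also shows what a smaller batch does to the length of the run.

```python
def num_train_steps(num_examples: int, batch_size: int, num_epochs: float) -> int:
    """Mirror of the step-count formula used in run_classifier.py."""
    return int(num_examples / batch_size * num_epochs)


if __name__ == "__main__":
    examples = 84963  # "Num examples" from the log above
    # num_epochs=3.0 is an assumed value (the script's default); it matches the log.
    for batch in (16, 8):
        print(batch, num_train_steps(examples, batch, 3.0))  # 16 -> 15930, 8 -> 31861
```

Halving the batch size roughly doubles the number of optimizer steps for the same number of epochs, so a smaller batch trades per-step memory for a longer run.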
SuhasSheshadri commented 5 years ago

Try reducing the batch size. This worked for me.
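A minimal sketch of rerunning training with a smaller batch, as suggested above. The flag names `--train_batch_size`, `--max_seq_length`, `--do_train` and the others are run_classifier.py flags; the task name and every path are hypothetical placeholders, and `--output_dir=output` is chosen only to match the checkpoint path in the log. At a fixed `max_seq_length`, the number of tokens processed per step (batch size × sequence length) is a rough proxy for per-step activation memory, so halving the batch roughly halves it.

```python
import shlex
from typing import List


def build_train_command(batch_size: int, max_seq_length: int = 64) -> List[str]:
    """Assemble a run_classifier.py training command.

    Only the flag names come from run_classifier.py; the task name and
    all paths are hypothetical placeholders to be replaced.
    """
    return [
        "python", "run_classifier.py",
        "--task_name=YOUR_TASK",                        # hypothetical
        "--do_train=true",
        "--data_dir=path/to/data",                      # hypothetical
        "--vocab_file=path/to/vocab.txt",               # hypothetical
        "--bert_config_file=path/to/bert_config.json",  # hypothetical
        "--init_checkpoint=path/to/bert_model.ckpt",    # hypothetical
        f"--max_seq_length={max_seq_length}",
        f"--train_batch_size={batch_size}",
        "--output_dir=output",
    ]


def tokens_per_step(batch_size: int, max_seq_length: int = 64) -> int:
    # Rough proxy for per-step activation memory at a fixed sequence length.
    return batch_size * max_seq_length


if __name__ == "__main__":
    for batch in (16, 8):
        print(batch, tokens_per_step(batch))  # 16 -> 1024, 8 -> 512
        print(shlex.join(build_train_command(batch)))
```

The command is only printed, not executed, so the placeholders can be filled in before running it.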