Bert_model.ckpt not found with run_squad.py on TPU

dhruvluci commented 6 years ago

After running the following command for about 5 minutes on a Cloud TPU, I get this error: Unsuccessful TensorSliceReader constructor: Failed to get matching files

The command is as follows:

python run_squad.py \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
  --do_train=True \
  --train_file=$SQUAD_DIR/train-v1.1.json \
  --do_predict=True \
  --predict_file=$SQUAD_DIR/dev-v1.1.json \
  --train_batch_size=24 \
  --learning_rate=3e-5 \
  --num_train_epochs=2.0 \
  --max_seq_length=384 \
  --doc_stride=128 \
  --output_dir=gs://data_for_squad1/Squad1/ \
  --use_tpu=True \
  --tpu_name=$TPU_NAME

BERT_BASE_DIR (./largebert) contains the following files:

bert_config.json
bert_model.ckpt.data-00000-of-00001
bert_model.ckpt.index
bert_model.ckpt.meta
vocab.txt

Here is the detailed Traceback:

    self._traceback = tf_stack.extract_stack()
    _train_on_tpu_system(ctx, model_fn_wrapper, dequeue_fn))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2830, in _train_on_tpu_system
    scaffold = _get_scaffold(captured_scaffold_fn)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2953, in _get_scaffold
    scaffold = scaffold_fn()
  File "run_squad.py", line 584, in tpu_scaffold
    tf.train.init_from_checkpoint(init_checkpoint, assignment_map)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/checkpoint_utils.py", line 187, in init_from_checkpoint
    _init_from_checkpoint, ckpt_dir_or_file, assignment_map)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/distribute.py", line 1053, in merge_call
    return self._merge_call(merge_fn, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/distribute.py", line 1061, in _merge_call
    return merge_fn(self._distribution_strategy, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/checkpoint_utils.py", line 231, in _init_from_checkpoint
    _set_variable_or_list_initializer(var, ckpt_file, tensor_name_in_ckpt)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/checkpoint_utils.py", line 355, in _set_variable_or_list_initializer
    _set_checkpoint_initializer(variable_or_list, ckpt_file, tensor_name, "")
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/checkpoint_utils.py", line 309, in _set_checkpoint_initializer
    ckpt_file, [tensor_name], [slice_spec], [base_type], name=name)[0]
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_io_ops.py", line 1466, in restore_v2
    shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Unsuccessful TensorSliceReader constructor: Failed to get matching files on ./largebert/bert_model.ckpt: Unimplemented: File system scheme '[local]' not implemented (file: './largebert/bert_model.ckpt')
  [[node checkpoint_initializer_370 (defined at run_squad.py:584) = RestoreV2[dtypes=[DT_FLOAT], _device="/job:worker/replica:0/task:0/device:CPU:0"](checkpoint_initializer/prefix, checkpoint_initializer_370/tensor_names, checkpoint_initializer/shape_and_slices)]]

Been trying to troubleshoot for a while, not sure where the problem lies. Any help would be appreciated.

webstruck commented 6 years ago

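The error says that the local file system scheme is not supported: the checkpoint restore op runs on the TPU worker (/job:worker in your traceback), which cannot read files from your VM's local disk. Your config, vocab and init_checkpoint should also point to your Google Cloud Storage bucket.

For example: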

python bert/run_squad.py \
  --vocab_file=gs://{your bucket name}/vocab.txt \
  --bert_config_file=gs://{your bucket name}/bert_config.json \
  --init_checkpoint=gs://{your bucket name}/bert_model.ckpt \
  --do_train=False \
  --train_file=train-v1.1.json \
  --do_predict=True \
  --predict_file=dev-v1.1.json \
  --train_batch_size=24 \
  --learning_rate=3e-5 \
  --num_train_epochs=2.0 \
  --max_seq_length=384 \
  --doc_stride=128 \
  --output_dir=gs://{your bucket name}/squad_large/ \
  --use_tpu=True \
  --tpu_name=grpc://{tpu_name}
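
As a rough illustration of the advice above, the check below reports any flag whose value would still resolve on the local disk. It is a minimal sketch: the flag names come from the commands in this thread, while the helper `non_gcs_flags`, the `REMOTE_FLAGS` tuple, and the bucket name `my-bucket` are hypothetical. Whether `vocab_file` and `bert_config_file` strictly need to live in the bucket is not settled in this thread; they are included only because the comment above lists them.

```python
# Flags that, per the comment above, should point at a gs:// location
# when training on a Cloud TPU. This tuple is an assumption for illustration.
REMOTE_FLAGS = ("vocab_file", "bert_config_file", "init_checkpoint", "output_dir")


def non_gcs_flags(flags, required=REMOTE_FLAGS):
    """Return the names of required flags whose value is not a gs:// path."""
    return [
        name
        for name in required
        if not str(flags.get(name, "")).startswith("gs://")
    ]


# Example mirroring the original report: only the checkpoint is local.
flags = {
    "vocab_file": "gs://my-bucket/vocab.txt",          # hypothetical bucket
    "bert_config_file": "gs://my-bucket/bert_config.json",
    "init_checkpoint": "./largebert/bert_model.ckpt",  # local path from the report
    "output_dir": "gs://my-bucket/squad_large/",
}

print(non_gcs_flags(flags))  # prints ['init_checkpoint']
```

Running a check like this before launching avoids waiting several minutes for the job to fail at checkpoint restore time.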

chenshaolong commented 5 years ago

When I ran the command below in a VM instance with a TPU

python /home/schen/bert/run_squad.py \
  --vocab_file=gs://{bucket_name}/uncased_L-12_H-768_A-12/vocab.txt \
  --bert_config_file=gs://{bucket_name}/uncased_L-12_H-768_A-12/bert_config.json \
  --init_checkpoint=gs://{bucket_name}/uncased_L-12_H-768_A-12/bert_model.ckpt \
  --do_train=True \
  --do_predict=True \
  --train_file=/home/schen/squad/train-v1.1.json \
  --predict_file=/home/schen/squad/dev-v1.1.json \
  --train_batch_size=32 \
  --learning_rate=3e-5 \
  --num_train_epochs=2.0 \
  --max_seq_length=384 \
  --doc_stride=128 \
  --output_dir=gs://{bucket_name}/squad_base/ \
  --use_tpu=True \
  --tpu_name=ai

or replaced the last flag with either

--tpu_name=grpc://ai

or

--tpu_name=grpc://{tpu_ip}:8470

I got the following error:

INFO:tensorflow:Error recorded from training_loop: Unsuccessful TensorSliceReader constructor: Failed to get matching files on gs://object.propel.ai/bert/uncased_L-12_H-768_A-12/bert_model.ckpt: Permission denied: Error executing an HTTP request: HTTP response code 403 with body '{ "error": { "errors": [ { "domain": "global", "reason": "forbidden", "message": "my_account_email does not have storage.objects.list access to object.propel.ai." } ], "code": 403, "message": "my_account_email does not have storage.objects.list access to object.propel.ai." } } ' when reading gs://object.propel.ai/bert/uncased_L-12_H-768_A-12 [[{{node checkpoint_initializer_139}} = RestoreV2[dtypes=[DT_FLOAT], _device="/job:worker/replica:0/task:0/device:CPU:0"](checkpoint_initializer/prefix, checkpoint_initializer_139/tensor_names, checkpoint_initializer/shape_and_slices)]]

Caused by op u'checkpoint_initializer_139', defined at:
  File "/home/schen/bert/run_squad.py", line 1283, in <module>
    tf.app.run()
  File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "/home/schen/bert/run_squad.py", line 1215, in main
    estimator.train(input_fn=train_input_fn, max_steps=num_train_steps)
  File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2394, in train
    saving_listeners=saving_listeners
  File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 356, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 1181, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 1211, in _train_model_default
    features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
  File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2186, in _call_model_fn
    features, labels, mode, config)
  File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 1169, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2493, in _model_fn
    _train_on_tpu_system(ctx, model_fn_wrapper, dequeue_fn))
  File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2821, in _train_on_tpu_system
    scaffold = _get_scaffold(captured_scaffold_fn)
  File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2944, in _get_scaffold
    scaffold = scaffold_fn()
  File "/home/schen/bert/run_squad.py", line 627, in tpu_scaffold
    tf.train.init_from_checkpoint(init_checkpoint, assignment_map)
  File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/training/checkpoint_utils.py", line 187, in init_from_checkpoint
    _init_from_checkpoint, ckpt_dir_or_file, assignment_map)
  File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/training/distribute.py", line 1040, in merge_call
    return self._merge_call(merge_fn, *args, **kwargs)
  File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/training/distribute.py", line 1048, in _merge_call
    return merge_fn(self._distribution_strategy, *args, **kwargs)
  File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/training/checkpoint_utils.py", line 231, in _init_from_checkpoint
    _set_variable_or_list_initializer(var, ckpt_file, tensor_name_in_ckpt)
  File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/training/checkpoint_utils.py", line 355, in _set_variable_or_list_initializer
    _set_checkpoint_initializer(variable_or_list, ckpt_file, tensor_name, "")
  File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/training/checkpoint_utils.py", line 309, in _set_checkpoint_initializer
    ckpt_file, [tensor_name], [slice_spec], [base_type], name=name)[0]
  File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1466, in restore_v2
    shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
  File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3272, in create_op
    op_def=op_def)
  File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1768, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Unsuccessful TensorSliceReader constructor: Failed to get matching files on gs://object.propel.ai/bert/uncased_L-12_H-768_A-12/bert_model.ckpt: Permission denied: Error executing an HTTP request: HTTP response code 403 with body '{ "error": { "errors": [ { "domain": "global", "reason": "forbidden", "message": "my_account_email does not have storage.objects.list access to object.propel.ai." } ], "code": 403, "message": "my_account_email does not have storage.objects.list access to object.propel.ai." } } ' when reading gs://object.propel.ai/bert/uncased_L-12_H-768_A-12 [[{{node checkpoint_initializer_139}} = RestoreV2[dtypes=[DT_FLOAT], _device="/job:worker/replica:0/task:0/device:CPU:0"](checkpoint_initializer/prefix, checkpoint_initializer_139/tensor_names, checkpoint_initializer/shape_and_slices)]]

INFO:tensorflow:training_loop marked as finished
WARNING:tensorflow:Reraising captured error
Traceback (most recent call last):
  File "/home/schen/bert/run_squad.py", line 1283, in <module>
    tf.app.run()
  File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "/home/schen/bert/run_squad.py", line 1215, in main
    estimator.train(input_fn=train_input_fn, max_steps=num_train_steps)
  File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2400, in train
    rendezvous.raise_errors()
  File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/contrib/tpu/python/tpu/error_handling.py", line 128, in raise_errors
    six.reraise(typ, value, traceback)
  File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2394, in train
    saving_listeners=saving_listeners
  File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 356, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 1181, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 1215, in _train_model_default
    saving_listeners)
  File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 1406, in _train_with_estimator_spec
    log_step_count_steps=self._config.log_step_count_steps) as mon_sess:
  File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 504, in MonitoredTrainingSession
    stop_grace_period_secs=stop_grace_period_secs)
  File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 921, in __init__
    stop_grace_period_secs=stop_grace_period_secs)
  File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 643, in __init__
    self._sess = _RecoverableSession(self._coordinated_creator)
  File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 1107, in __init__
    _WrappedSession.__init__(self, self._create_session())
  File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 1112, in _create_session
    return self._sess_creator.create_session()
  File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 800, in create_session
    self.tf_sess = self._session_creator.create_session()
  File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 566, in create_session
    init_fn=self._scaffold.init_fn)
  File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/training/session_manager.py", line 287, in prepare_session
    sess.run(init_op, feed_dict=init_feed_dict)
  File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 887, in run
    run_metadata_ptr)
  File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1110, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1286, in _do_run
    run_metadata)
  File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1308, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Unsuccessful TensorSliceReader constructor: Failed to get matching files on gs://object.propel.ai/bert/uncased_L-12_H-768_A-12/bert_model.ckpt: Permission denied: Error executing an HTTP request: HTTP response code 403 with body '{ "error": { "errors": [ { "domain": "global", "reason": "forbidden", "message": "my_account_email does not have storage.objects.list access to object.propel.ai." } ], "code": 403, "message": "my_account_email does not have storage.objects.list access to object.propel.ai." } } ' when reading gs://object.propel.ai/bert/uncased_L-12_H-768_A-12
  [[{{node checkpoint_initializer_139}} = RestoreV2[dtypes=[DT_FLOAT], _device="/job:worker/replica:0/task:0/device:CPU:0"](checkpoint_initializer/prefix, checkpoint_initializer_139/tensor_names, checkpoint_initializer/shape_and_slices)]]


System information

What is the top-level directory of the model you are using: google-research/bert

Here is the link: https://github.com/google-research/bert

Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): my laptop is macOS High Sierra (version 10.13.6). The VM instance is Linux 4.9.0-8-amd64 #1 SMP Debian 4.9.144-3.1 (2019-02-19) x86_6
TensorFlow installed from (source or binary): python -m pip install tensorflow==1.11
TensorFlow version (use command below): 1.11.0, after running python -c "import tensorflow as tf; print(tf.__version__)"

If I remove the last two flags and do not run on the TPU, it works properly. However, I really want to use the TPU to speed up the computation. I have been stuck on this TPU issue for a long time. When I ran another demo script, bert/run_classifier.py, I got the same error. It's really frustrating. Any help would be appreciated!

chenshaolong commented 5 years ago

Getting the admin to authorize permissions for both my VM account and TPU account solved the issue.
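
To make that fix a little more concrete, here is a minimal sketch that pulls the denied account and bucket out of a 403 message like the ones quoted in this thread and builds a `gsutil iam ch` command a bucket admin could run. The helpers `parse_denied` and `grant_read_command` are hypothetical, and the `objectViewer` role is an assumption: it covers listing and reading objects, but a bucket used as `--output_dir` would also need write access. This only applies to buckets you or your admin control.

```python
import re

# Matches "<account> does not have <permission> access to <bucket>."
# as it appears in the 403 bodies quoted in this thread.
_DENIED = re.compile(
    r"([\w.@-]+) does not have ([\w.]+) access to ([\w.-]+)"
)


def parse_denied(message):
    """Return (account, permission, bucket), or None if nothing matches."""
    match = _DENIED.search(message)
    if match is None:
        return None
    account, permission, bucket = match.groups()
    # The message ends the bucket name with a sentence period; drop it.
    return account, permission, bucket.rstrip(".")


def grant_read_command(account, bucket, role="objectViewer"):
    """Build (not run) a gsutil command granting `role` on the bucket."""
    # Assumption: service accounts are recognised by their e-mail domain.
    kind = "serviceAccount" if account.endswith("gserviceaccount.com") else "user"
    return ["gsutil", "iam", "ch", f"{kind}:{account}:{role}", f"gs://{bucket}"]


message = (
    "519163749326-compute@developer.gserviceaccount.com does not have "
    "storage.objects.list access to bert_models."
)
account, permission, bucket = parse_denied(message)
print(" ".join(grant_read_command(account, bucket)))
```

The command is only printed so that it can be reviewed first. The same grant would need to be repeated for each account involved, which in the case above meant both the VM account and the TPU account.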

damnko commented 5 years ago

@webstruck So I cannot reference the bert model with gs://bert_models/2018_10_18/uncased_L-12_H-768_A-12 ?

The error I am having is

tensorflow.python.framework.errors_impl.InvalidArgumentError: Unsuccessful TensorSliceReader constructor: Failed to get matching files on gs://bert_models/2018_10_18/uncased_L-12_H-768_A-12/bert_model.ckpt: Permission denied: Error executing an HTTP request: HTTP response code 403 with body '{
 "error": {
  "errors": [
   {
    "domain": "global",
    "reason": "forbidden",
    "message": "519163749326-compute@developer.gserviceaccount.com does not have storage.objects.list access to bert_models."
   }
  ],
  "code": 403,
  "message": "519163749326-compute@developer.gserviceaccount.com does not have storage.objects.list access to bert_models."
 }
}
'
         when reading gs://bert_models/2018_10_18/uncased_L-12_H-768_A-12

which is generated by

export BERT_BASE_DIR=gs://bert_models/2018_10_18/uncased_L-12_H-768_A-12
export SQUAD_11_EN_DIR=gs://<my_bucket>/squad1.1
export TPU_NAME=<my_tpu>

python run_squad.py \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
  --do_train=True \
  --train_file=$SQUAD_11_EN_DIR/train-v1.1.json \
  --do_predict=True \
  --predict_file=$SQUAD_11_EN_DIR/dev-v1.1.json \
  --train_batch_size=8 \
  --learning_rate=3e-5 \
  --num_train_epochs=2.0 \
  --max_seq_length=384 \
  --doc_stride=128 \
  --output_dir=gs://bert_deep_finder/output/ \
  --use_tpu=True \
  --tpu_name=$TPU_NAME

So, should I upload the model into my bucket? I cannot use the one in gs://bert_models/2018_10_18/uncased_L-12_H-768_A-12?

Thanks

chenshaolong commented 5 years ago

Yes. Download the model, upload it to your own Google Cloud Storage bucket, and then either set the path as an environment variable or just use the absolute path.
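
The upload step could look roughly like the sketch below, which builds one `gsutil cp` command per checkpoint file and the matching `--init_checkpoint` flag. The file names are taken from the directory listing in the first post; the helpers `upload_commands` and `init_checkpoint_flag`, the local directory, and the bucket path `gs://my-bucket/bert` are hypothetical placeholders.

```python
import posixpath

# File names as listed for BERT_BASE_DIR in the first post of this thread.
CHECKPOINT_FILES = (
    "bert_config.json",
    "bert_model.ckpt.data-00000-of-00001",
    "bert_model.ckpt.index",
    "bert_model.ckpt.meta",
    "vocab.txt",
)


def upload_commands(local_dir, bucket_dir):
    """Build (not run) one `gsutil cp` command per model file."""
    dest = bucket_dir.rstrip("/") + "/"
    return [
        ["gsutil", "cp", posixpath.join(local_dir, name), dest]
        for name in CHECKPOINT_FILES
    ]


def init_checkpoint_flag(bucket_dir):
    """Return the flag pointing at the uploaded checkpoint prefix."""
    # bert_model.ckpt is a prefix shared by the .index/.data/.meta files,
    # not a file of its own.
    return "--init_checkpoint=" + bucket_dir.rstrip("/") + "/bert_model.ckpt"


for command in upload_commands("./largebert", "gs://my-bucket/bert"):
    print(" ".join(command))
print(init_checkpoint_flag("gs://my-bucket/bert"))
```

Because `--init_checkpoint` names a prefix, all three `bert_model.ckpt.*` files need to end up in the same bucket directory for the restore to find them.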

damnko commented 5 years ago

Sorry for asking, but then how can I use the models that are already online in the bert_models/ storage bucket? I suppose there must be a way since it's mentioned in the Fine-tuning with Cloud TPUs section of the repo.

Edit: Could it be that my Cloud TPU is not in the same region as the bert_models/ bucket?
