ConnorJL / GPT2

An implementation of training for GPT2, supports TPUs
MIT License
1.42k stars, 334 forks

when reading metadata of gs://openwebtext/stuff/encoder/encoder.json #18

Closed: makamkkumar closed this issue 4 years ago

makamkkumar commented 4 years ago

I get the following error when running this command:

$ python3 main.py --model 345M.json --predicttext "Hello World. Hello there! My name"

The output is below:

{'n_head': 16, 'encoder_path': 'gs://openwebtext/stuff/encoder', 'n_vocab': 50257, 'embed_dropout': 0.1, 'lr': 0.00025, 'warmup_steps': 2000, 'weight_decay': 0.01, 'beta1': 0.9, 'beta2': 0.98, 'epsilon': 1e-09, 'opt_name': 'adam', 'train_batch_size': 8, 'attn_dropout': 0.1, 'train_steps': 10000, 'eval_steps': 10, 'max_steps': 500000, 'data_path': 'gs://connors-datasets/openwebtext/', 'res_dropout': 0.1, 'predict_batch_size': 8, 'eval_batch_size': 8, 'iterations': 500, 'n_embd': 1024, 'input': 'openwebtext', 'model': 'GPT2', 'model_path': 'gs://connors-models/GPT2-345M', 'n_ctx': 1024, 'predict_path': 'logs/predictions.txt', 'n_layer': 24, 'scale_by_depth': True, 'scale_by_in': True, 'use_tpu': False, 'precision': 'float32'}

2019-10-21 12:38:38.103626: I tensorflow/core/platform/cloud/retrying_utils.cc:73] The operation failed and will be automatically retried in 0.159809 seconds (attempt 1 out of 10), caused by: Unavailable: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Couldn't resolve host 'metadata'
2019-10-21 12:38:38.272828: I tensorflow/core/platform/cloud/retrying_utils.cc:73] The operation failed and will be automatically retried in 0.053047 seconds (attempt 2 out of 10), caused by: Unavailable: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Couldn't resolve host 'metadata'
2019-10-21 12:38:38.370688: I tensorflow/core/platform/cloud/retrying_utils.cc:73] The operation failed and will be automatically retried in 0.050504 seconds (attempt 3 out of 10), caused by: Unavailable: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Couldn't resolve host 'metadata'
2019-10-21 12:38:38.433094: I tensorflow/core/platform/cloud/retrying_utils.cc:73] The operation failed and will be automatically retried in 0.564422 seconds (attempt 4 out of 10), caused by: Unavailable: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Couldn't resolve host 'metadata'
2019-10-21 12:38:39.022315: I tensorflow/core/platform/cloud/retrying_utils.cc:73] The operation failed and will be automatically retried in 0.256678 seconds (attempt 5 out of 10), caused by: Unavailable: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Couldn't resolve host 'metadata'
2019-10-21 12:38:39.300586: I tensorflow/core/platform/cloud/retrying_utils.cc:73] The operation failed and will be automatically retried in 1.24113 seconds (attempt 6 out of 10), caused by: Unavailable: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Couldn't resolve host 'metadata'
2019-10-21 12:38:40.675821: I tensorflow/core/platform/cloud/retrying_utils.cc:73] The operation failed and will be automatically retried in 1.13431 seconds (attempt 7 out of 10), caused by: Unavailable: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Couldn't resolve host 'metadata'
2019-10-21 12:38:41.867547: I tensorflow/core/platform/cloud/retrying_utils.cc:73] The operation failed and will be automatically retried in 1.20263 seconds (attempt 8 out of 10), caused by: Unavailable: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Couldn't resolve host 'metadata'
2019-10-21 12:38:43.087045: I tensorflow/core/platform/cloud/retrying_utils.cc:73] The operation failed and will be automatically retried in 1.05564 seconds (attempt 9 out of 10), caused by: Unavailable: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Couldn't resolve host 'metadata'
2019-10-21 12:38:44.151391: I tensorflow/core/platform/cloud/retrying_utils.cc:73] The operation failed and will be automatically retried in 1.43831 seconds (attempt 10 out of 10), caused by: Unavailable: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Couldn't resolve host 'metadata'
2019-10-21 12:38:45.596157: W tensorflow/core/platform/cloud/google_auth_provider.cc:157] All attempts to get a Google authentication bearer token failed, returning an empty token. Retrieving token from files failed with "Not found: Could not locate the credentials file.". Retrieving token from GCE failed with "Aborted: All 10 retry attempts failed. The last failure: Unavailable: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Couldn't resolve host 'metadata'".

Traceback (most recent call last):
  File "main.py", line 118, in <module>
    enc = encoder.get_encoder(params["encoder_path"])
  File "/home/kiran1/KiranResearch/TextSummerization/GPT2/models/gpt2/encoder.py", line 111, in get_encoder
    encoder = json.load(f)
  File "/home/kiran1/anaconda3/envs/tf_gpu/lib/python3.6/json/__init__.py", line 296, in load
    return loads(fp.read(),
  File "/home/kiran1/anaconda3/envs/tf_gpu/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py", line 128, in read
    length = self.size() - self.tell()
  File "/home/kiran1/anaconda3/envs/tf_gpu/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py", line 104, in size
    return stat(self.__name).length
  File "/home/kiran1/anaconda3/envs/tf_gpu/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py", line 735, in stat
    return stat_v2(filename)
  File "/home/kiran1/anaconda3/envs/tf_gpu/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py", line 754, in stat_v2
    return file_statistics
  File "/home/kiran1/anaconda3/envs/tf_gpu/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.PermissionDeniedError: Error executing an HTTP request: HTTP response code 401 with body '{ "error": { "code": 401, "message": "Anonymous caller does not have storage.objects.get access to openwebtext/stuff/encoder/encoder.json.", "errors": [ { "message": "Anonymous caller does not have storage.objects.get access to openwebtext/stuff/encoder/encoder.json.", "domain": "global", "reason": "required", "locationType": "header", "location": "Authorization" } ] } } ' when reading metadata of gs://openwebtext/stuff/encoder/encoder.json

ConnorJL commented 4 years ago

The paths in the config still point to my (private) Google Cloud Storage bucket. You need to download the encoder and the model and put them somewhere you have access to.
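
For illustration, a minimal sketch of repointing the config, assuming the model config (345M.json in the command above) is plain JSON containing the `encoder_path` and `model_path` keys that appear in the printed parameters. The function name, the output filename, and the local directories are hypothetical and not part of the repository.

```python
import json
from pathlib import Path


def point_config_at_local_files(config_path, encoder_dir, model_dir, out_path):
    """Write a copy of a model config whose two storage paths are replaced.

    Assumes the config is a flat JSON object with "encoder_path" and
    "model_path" keys, as in the parameters printed in the log above.
    """
    params = json.loads(Path(config_path).read_text())

    # Replace the bucket paths with locations the current user can read.
    params["encoder_path"] = str(encoder_dir)
    params["model_path"] = str(model_dir)

    # Save under a new name so the original config stays untouched.
    Path(out_path).write_text(json.dumps(params, indent=2))
    return params


if __name__ == "__main__":
    # Hypothetical local directories holding the downloaded files.
    point_config_at_local_files(
        "345M.json", "local/encoder", "local/GPT2-345M", "345M-local.json"
    )
```

The rewritten file would then be passed to `--model` in place of the original config. All other keys are carried over unchanged.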