steverichey closed this issue 4 years ago
I'm getting a possibly related error on the next step as well, if it helps.
❯ sh language/realm/local_launcher.sh gen-train
INFO:tensorflow:Create preprocessing server at [::]:8888
I0825 11:36:44.796554 4509986240 preprocessing.py:71] Create preprocessing server at [::]:8888
2020-08-25 11:36:54.371744: W tensorflow/core/platform/cloud/google_auth_provider.cc:178] All attempts to get a Google authentication bearer token failed, returning an empty token. Retrieving token from files failed with "Not found: Could not locate the credentials file.". Retrieving token from GCE failed with "Aborted: All 10 retry attempts failed. The last failure: Unavailable: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Couldn't resolve host 'metadata'".
I0825 11:36:54.873995 4509986240 example_generator.py:365] Loaded featurizer.
2020-08-25 11:36:54.874521: I tensorflow/core/platform/cpu_feature_guard.cc:145] This TensorFlow binary is optimized with Intel(R) MKL-DNN to use the following CPU instructions in performance critical operations: SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in non-MKL-DNN operations, rebuild TensorFlow with the appropriate compiler flags.
2020-08-25 11:36:54.874891: I tensorflow/core/common_runtime/process_util.cc:115] Creating new thread pool with default inter op setting: 12. Tune using inter_op_parallelism_threads for best performance.
WARNING:tensorflow:From [redacted]/realm/language/realm/retrieval.py:188: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and:
`tf.data.TFRecordDataset(path)`
W0825 11:36:55.015143 4509986240 deprecation.py:323] From [redacted]/realm/language/realm/retrieval.py:188: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and:
`tf.data.TFRecordDataset(path)`
WARNING:tensorflow:From [redacted]/realm/language/realm/retrieval.py:188: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and:
`tf.data.TFRecordDataset(path)`
W0825 11:36:55.016097 4509986240 deprecation.py:323] From [redacted]/realm/language/realm/retrieval.py:188: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and:
`tf.data.TFRecordDataset(path)`
WARNING:tensorflow:From [redacted]/realm/language/realm/retrieval.py:188: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and:
`tf.data.TFRecordDataset(path)`
W0825 11:36:55.017080 4509986240 deprecation.py:323] From [redacted]/realm/language/realm/retrieval.py:188: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and:
`tf.data.TFRecordDataset(path)`
WARNING:tensorflow:From [redacted]/realm/language/realm/retrieval.py:188: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and:
`tf.data.TFRecordDataset(path)`
W0825 11:36:55.018254 4509986240 deprecation.py:323] From [redacted]/realm/language/realm/retrieval.py:188: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and:
`tf.data.TFRecordDataset(path)`
WARNING:tensorflow:From [redacted]/realm/language/realm/retrieval.py:188: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and:
`tf.data.TFRecordDataset(path)`
W0825 11:36:55.018979 4509986240 deprecation.py:323] From [redacted]/realm/language/realm/retrieval.py:188: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and:
`tf.data.TFRecordDataset(path)`
WARNING:tensorflow:From [redacted]/realm/language/realm/retrieval.py:188: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and:
`tf.data.TFRecordDataset(path)`
WARNING:tensorflow:From [redacted]/realm/language/realm/retrieval.py:188: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and:
`tf.data.TFRecordDataset(path)`
W0825 11:36:55.021220 4509986240 deprecation.py:323] From [redacted]/realm/language/realm/retrieval.py:188: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and:
`tf.data.TFRecordDataset(path)`
W0825 11:36:55.021611 4509986240 deprecation.py:323] From [redacted]/realm/language/realm/retrieval.py:188: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and:
`tf.data.TFRecordDataset(path)`
WARNING:tensorflow:From [redacted]/realm/language/realm/retrieval.py:188: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and:
`tf.data.TFRecordDataset(path)`
W0825 11:36:55.023555 4509986240 deprecation.py:323] From [redacted]/realm/language/realm/retrieval.py:188: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and:
`tf.data.TFRecordDataset(path)`
WARNING:tensorflow:From [redacted]/realm/language/realm/retrieval.py:188: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and:
`tf.data.TFRecordDataset(path)`
W0825 11:36:55.024527 4509986240 deprecation.py:323] From [redacted]/realm/language/realm/retrieval.py:188: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and:
`tf.data.TFRecordDataset(path)`
WARNING:tensorflow:From [redacted]/realm/language/realm/retrieval.py:188: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and:
`tf.data.TFRecordDataset(path)`
W0825 11:36:55.025415 4509986240 deprecation.py:323] From [redacted]/realm/language/realm/retrieval.py:188: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and:
`tf.data.TFRecordDataset(path)`
I0825 11:36:56.178431 4509986240 retrieval.py:241] Loaded 1 of 10 document shards.
I0825 11:36:56.268060 4509986240 retrieval.py:241] Loaded 2 of 10 document shards.
I0825 11:36:56.355141 4509986240 retrieval.py:241] Loaded 3 of 10 document shards.
I0825 11:36:56.445337 4509986240 retrieval.py:241] Loaded 4 of 10 document shards.
I0825 11:36:56.450142 4509986240 retrieval.py:241] Loaded 5 of 10 document shards.
I0825 11:36:56.661361 4509986240 retrieval.py:241] Loaded 6 of 10 document shards.
I0825 11:36:56.671973 4509986240 retrieval.py:241] Loaded 7 of 10 document shards.
I0825 11:36:56.719235 4509986240 retrieval.py:241] Loaded 8 of 10 document shards.
I0825 11:36:56.965728 4509986240 retrieval.py:241] Loaded 9 of 10 document shards.
I0825 11:36:57.381914 4509986240 retrieval.py:241] Loaded 10 of 10 document shards.
I0825 11:36:57.382042 4509986240 retrieval.py:246] Combining data from all document shards.
I0825 11:36:57.382192 4509986240 retrieval.py:251] Finished loading all shards.
WARNING:tensorflow:From [redacted]/opt/anaconda3/envs/realm/lib/python3.7/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1781: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
W0825 11:36:58.936383 4509986240 deprecation.py:506] From [redacted]/opt/anaconda3/envs/realm/lib/python3.7/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1781: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
I0825 11:37:01.653182 4509986240 retrieval.py:265] Loaded query embedder.
I0825 11:37:02.930503 4509986240 retrieval.py:107] Loaded document embeddings.
I0825 11:37:02.939476 4509986240 example_generator.py:379] Loaded retriever from gs://realm-data/realm-data-small/small-ict
INFO:tensorflow:Reading data from these shards:
I0825 11:37:03.050762 4509986240 example_generator.py:326] Reading data from these shards:
INFO:tensorflow:gs://realm-data/realm-data-small/pretrain_corpus_small/wikipedia_annotated_with_dates_public-00016-of-00020.tfrecord.gz
I0825 11:37:03.050952 4509986240 example_generator.py:328] gs://realm-data/realm-data-small/pretrain_corpus_small/wikipedia_annotated_with_dates_public-00016-of-00020.tfrecord.gz
WARNING:tensorflow:From [redacted]/realm/language/realm/example_generator.py:343: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_determinstic`.
W0825 11:37:03.052146 4509986240 deprecation.py:323] From [redacted]/realm/language/realm/example_generator.py:343: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_determinstic`.
OMP: Error #15: Initializing libiomp5.dylib, but found libiomp5.dylib already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.
Fatal Python error: Aborted
Thread 0x0000700005f0f000 (most recent call first):
File "[redacted]/opt/anaconda3/envs/realm/lib/python3.7/site-packages/grpc/_server.py", line 871 in _serve
File "[redacted]/opt/anaconda3/envs/realm/lib/python3.7/threading.py", line 870 in run
File "[redacted]/opt/anaconda3/envs/realm/lib/python3.7/threading.py", line 926 in _bootstrap_inner
File "[redacted]/opt/anaconda3/envs/realm/lib/python3.7/threading.py", line 890 in _bootstrap
Thread 0x000000010cd0edc0 (most recent call first):
File "[redacted]/opt/anaconda3/envs/realm/lib/python3.7/site-packages/tensorflow_core/python/eager/execute.py", line 61 in quick_execute
File "[redacted]/opt/anaconda3/envs/realm/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 511 in call
File "[redacted]/opt/anaconda3/envs/realm/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 1224 in _call_flat
File "[redacted]/opt/anaconda3/envs/realm/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 1121 in _call_impl
File "[redacted]/opt/anaconda3/envs/realm/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 1081 in __call__
File "[redacted]/realm/language/realm/retrieval.py", line 288 in embed
File "[redacted]/realm/language/realm/retrieval.py", line 116 in retrieve
File "[redacted]/realm/language/realm/profile.py", line 40 in wrapped
File "[redacted]/realm/language/realm/example_generator.py", line 144 in generate_queries_and_candidates
File "[redacted]/realm/language/realm/example_generator.py", line 172 in generate_realm_examples
File "[redacted]/realm/language/realm/example_generator.py", line 423 in generate_realm_examples_with_model_refresh
File "[redacted]/realm/language/realm/example_generator.py", line 455 in <genexpr>
File "[redacted]/realm/language/realm/preprocessing.py", line 91 in push_examples
File "[redacted]/realm/language/realm/example_generator.py", line 458 in main
File "[redacted]/opt/anaconda3/envs/realm/lib/python3.7/site-packages/absl/app.py", line 250 in _run_main
File "[redacted]/opt/anaconda3/envs/realm/lib/python3.7/site-packages/absl/app.py", line 299 in run
File "[redacted]/realm/language/realm/example_generator.py", line 463 in run_main
File "[redacted]/realm/language/realm/example_generator.py", line 468 in <module>
File "[redacted]/opt/anaconda3/envs/realm/lib/python3.7/runpy.py", line 85 in _run_code
File "[redacted]/opt/anaconda3/envs/realm/lib/python3.7/runpy.py", line 193 in _run_module_as_main
language/realm/local_launcher.sh: line 72: 17768 Abort trap: 6 python -m language.realm.example_generator --vocab_path="${VOCAB_PATH}" --do_lower_case --query_seq_len=96 --candidate_seq_len=288 --num_candidates=8 --max_masks=10 --num_shards_per_mips_refresh=1 --pretrain_corpus_path="${PRETRAIN_CORPUS_PATH}" --retrieval_corpus_path="${RETRIEVAL_CORPUS_PATH}" --initial_embedder_module="${ICT_MODULE_PATH}" --model_dir="${MODEL_DIR}" $extra_flags
Hi! Do the printouts from the log files (`out/log/`) also get scrambled? If not, could you re-paste the log for the refresh run?
The error message in the second run might be an issue specific to macOS + Conda (https://stackoverflow.com/q/53014306). Could you try following the instructions there?
Thank you for your quick response. Sadly, the log files are also scrambled.
I did `conda install nomkl` per the link, and that did seem to resolve the issue I was having with that step.
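For anyone else hitting the same `OMP: Error #15`, here is a rough sketch of that fix plus a quick check for duplicate OpenMP runtimes. The environment name `realm` is an assumption taken from the `envs/realm` paths in the traceback, and the helper names `install_nomkl` and `list_openmp_runtimes` are made up for illustration. Installing `nomkl` may also require reinstalling packages that were previously built against MKL.

```shell
# Assumption: the conda environment is called "realm", as the
# ".../anaconda3/envs/realm/..." paths in the traceback suggest.
CONDA_ENV_NAME="realm"

# Hypothetical helper: install the nomkl metapackage into that environment
# so conda prefers non-MKL builds of the numerical packages.
install_nomkl() {
  conda install --yes --name "${CONDA_ENV_NAME}" nomkl
}

# Hypothetical helper: list every copy of the Intel OpenMP runtime under a
# prefix (first argument, else $CONDA_PREFIX, else the current directory).
# More than one hit is consistent with the "already initialized" error.
list_openmp_runtimes() {
  find "${1:-${CONDA_PREFIX:-.}}" -name 'libiomp5*' 2>/dev/null
}
```

With the environment activated, running `list_openmp_runtimes` before and after `install_nomkl` should show whether more than one copy of `libiomp5` is present.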
Hi @steverichey, the scrambled text is caused by our use of the multiprocessing library: our code launches multiple Python processes to read in the data faster, and when they fail, their STDERR messages get interleaved.
As for the actual cause of the error, it does seem like an authentication issue. As a sanity check, could you use `gsutil` to download the `DATA_DIR` specified in `local_launcher.sh` to a local directory, and point at that instead?
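A minimal sketch of that sanity check, under some assumptions: the bucket path is inferred from the `gs://realm-data/realm-data-small/...` paths in the log above (check it against the `DATA_DIR` actually set in your copy of `local_launcher.sh`), and the local destination and the function name `download_realm_data` are hypothetical.

```shell
# Assumption: remote data directory inferred from the gs:// paths in the log.
REMOTE_DATA_DIR="gs://realm-data/realm-data-small"

# Hypothetical local destination; any writable directory works.
LOCAL_PARENT_DIR="${LOCAL_PARENT_DIR:-${HOME}/realm-data}"

# Hypothetical helper: copy the whole remote directory to local disk.
# -m runs the transfer in parallel; cp -r copies recursively.
download_realm_data() {
  mkdir -p "${LOCAL_PARENT_DIR}"
  gsutil -m cp -r "${REMOTE_DATA_DIR}" "${LOCAL_PARENT_DIR}/"
}

# Local path to use for DATA_DIR in local_launcher.sh after the download.
LOCAL_DATA_DIR="${LOCAL_PARENT_DIR}/$(basename "${REMOTE_DATA_DIR}")"
echo "DATA_DIR=\"${LOCAL_DATA_DIR}\""
```

After calling `download_realm_data`, set `DATA_DIR` in `local_launcher.sh` to the printed path so the launcher reads from local files instead of the bucket.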
Yep, I've seen similar issues with multiprocessing in the past. Fortunately, using the local files did seem to help. I had to delete a temporary folder to get one of the `local_launcher.sh` calls to work, but I do believe I'm currently training. Thanks all!
I'm using a recent MacBook Pro and have followed the instructions to use a conda environment. However, I get the following errors when I attempt to start the document index refresher (step 1 in Running the code in the REALM readme). I've included the gcloud login line to show that I am logged in, despite the "All attempts to get a Google authentication bearer token failed" message. I'd imagine that the TensorFlow warnings are just because this project uses an older version of TensorFlow, but they are included for completeness' sake. Note that the text becomes scrambled at a certain point; I have no idea what that means. I am using Apple zsh in macOS Catalina (10.15.6).
Any assistance is greatly appreciated. I would like to try REALM locally before we spin up expensive GPU resources to perform full training.