google-research / electra

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

problem encountered in reproducing Electra-Large #63

Closed: spectrometerH closed this issue 4 years ago

spectrometerH commented 4 years ago

I use

python3 run_finetuning.py --data-dir ~ --model-name electra_large --hparams '{"model_size": "large", "task_names": ["cola"], "num_train_epochs": 10, "do_train": true, "do_eval": true, "write_test_outputs": true}'

to reproduce ELECTRA-Large's results on GLUE (CoLA), but the loss stays around 20 throughout fine-tuning.

When I fine-tune ELECTRA-Base instead, using

python3 run_finetuning.py --data-dir ~ --model-name electra_base --hparams '{"model_size": "base", "task_names": ["cola"], "num_train_epochs": 3, "do_train": true, "do_eval": true, "write_test_outputs": true}'

the loss decreases normally and the result is fine.

Both pre-trained models were downloaded from the officially released models.
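
For reference, the --hparams JSON accepts the same keys that appear in the config dump below (learning_rate, layerwise_lr_decay, train_batch_size, and so on), so hyperparameters can presumably be overridden without code changes. A hedged, untested variant that lowers the learning rate for the large model, in case the divergence is a hyperparameter rather than a checkpoint issue:

python3 run_finetuning.py --data-dir ~ --model-name electra_large --hparams '{"model_size": "large", "task_names": ["cola"], "num_train_epochs": 10, "do_train": true, "do_eval": true, "write_test_outputs": true, "learning_rate": 2e-05}'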

The full output from the ELECTRA-Large run is attached below for reference.
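
A few values in the config dump below (qa_eval_file, qa_na_file, qa_preds_file, raw_data_dir, test_predictions) print as <built-in method format of str object at 0x...>. That is not corruption of the log: those entries hold bound str.format methods, presumably path templates that are only filled in with the task name later, so printing the config shows the method object's repr. A minimal illustration of the same Python behavior, using a made-up template:

# A bound str.format method prints as its repr until it is actually called.
template = "{}_preds.json".format   # hypothetical template string; not called yet
print(template)                     # <built-in method format of str object at 0x...>
print(template("cola"))             # cola_preds.json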

================================================================================
Config: model=electra_large, trial 1/1
================================================================================
answerable_classifier True
answerable_uses_start_logits True
answerable_weight 0.5
beam_size 20
data_dir /home/spectrometer
debug False
do_eval True
do_lower_case True
do_train True
doc_stride 128
double_unordered True
embedding_size None
eval_batch_size 32
gcp_project None
init_checkpoint /home/spectrometer/models/electra_large
iterations_per_loop 1000
joint_prediction True
keep_all_models True
layerwise_lr_decay 0.9
learning_rate 5e-05
log_examples False
max_answer_length 30
max_query_length 64
max_seq_length 128
model_dir /home/spectrometer/models/electra_large/finetuning_models/cola_model
model_hparam_overrides {}
model_name electra_large
model_size large
n_best_size 20
n_writes_test 5
num_tpu_cores 1
num_train_epochs 10
num_trials 1
predict_batch_size 32
preprocessed_data_dir /home/spectrometer/models/electra_large/finetuning_tfrecords/cola_tfrecords
qa_eval_file <built-in method format of str object at 0x7f48d7b3f4e0>
qa_na_file <built-in method format of str object at 0x7f48d7ba4730>
qa_na_threshold -2.75
qa_preds_file <built-in method format of str object at 0x7f48d7b3f558>
raw_data_dir <built-in method format of str object at 0x7f48d7b30ce8>
results_pkl /home/spectrometer/models/electra_large/results/cola_results.pkl
results_txt /home/spectrometer/models/electra_large/results/cola_results.txt
save_checkpoints_steps 1000000
task_names ['cola']
test_predictions <built-in method format of str object at 0x7f48d7b431c8>
tpu_job_name None
tpu_name None
tpu_zone None
train_batch_size 32
use_tfrecords_if_existing True
use_tpu False
vocab_file /home/spectrometer/models/electra_large/vocab.txt
vocab_size 30522
warmup_proportion 0.1
weight_decay_rate 0.01
write_test_outputs True

Loading dataset cola_train
================================================================================
Start training: model=electra_large, trial 1/1
================================================================================
Training for 2680 steps
Building model...
Building complete
2020-05-22 10:35:20.577095: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-05-22 10:35:20.598852: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2020-05-22 10:35:22.827953: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x2d51cd0 executing computations on platform CUDA. Devices:
2020-05-22 10:35:22.828024: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): TITAN V, Compute Capability 7.0
2020-05-22 10:35:22.828041: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (1): TITAN V, Compute Capability 7.0
2020-05-22 10:35:22.828061: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (2): TITAN V, Compute Capability 7.0
2020-05-22 10:35:22.828074: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (3): TITAN V, Compute Capability 7.0
2020-05-22 10:35:22.840141: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2099980000 Hz
2020-05-22 10:35:22.844202: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x2d52030 executing computations on platform Host. Devices:
2020-05-22 10:35:22.844245: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
2020-05-22 10:35:22.850947: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: TITAN V major: 7 minor: 0 memoryClockRate(GHz): 1.455
pciBusID: 0000:02:00.0
2020-05-22 10:35:22.852163: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 1 with properties:
name: TITAN V major: 7 minor: 0 memoryClockRate(GHz): 1.455
pciBusID: 0000:03:00.0
2020-05-22 10:35:22.858713: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 2 with properties:
name: TITAN V major: 7 minor: 0 memoryClockRate(GHz): 1.455
pciBusID: 0000:82:00.0
2020-05-22 10:35:22.859862: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 3 with properties:
name: TITAN V major: 7 minor: 0 memoryClockRate(GHz): 1.455
pciBusID: 0000:83:00.0
2020-05-22 10:35:22.860124: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda/lib64
2020-05-22 10:35:22.860266: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda/lib64
2020-05-22 10:35:22.860394: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda/lib64
2020-05-22 10:35:22.860517: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda/lib64
2020-05-22 10:35:22.860638: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda/lib64
2020-05-22 10:35:22.860758: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda/lib64
2020-05-22 10:35:22.868326: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2020-05-22 10:35:22.868366: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1663] Cannot dlopen some GPU libraries. Skipping registering GPU devices...
2020-05-22 10:35:22.868605: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-05-22 10:35:22.868628: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 1 2 3
2020-05-22 10:35:22.868642: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N Y N N
2020-05-22 10:35:22.868654: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 1:   Y N N N
2020-05-22 10:35:22.868666: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 2:   N N N Y
2020-05-22 10:35:22.868677: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 3:   N N Y N
2020-05-22 10:35:29.430128: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set.  If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU.  To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
10/2680 = 0.4%, SPS: 0.1, ELAP: 2:28, ETA: 10:58:13 - loss: 22.3920
20/2680 = 0.7%, SPS: 0.1, ELAP: 4:29, ETA: 9:55:29 - loss: 11.8185
30/2680 = 1.1%, SPS: 0.1, ELAP: 6:23, ETA: 9:23:14 - loss: 17.4742
40/2680 = 1.5%, SPS: 0.1, ELAP: 8:20, ETA: 9:10:15 - loss: 18.5379
50/2680 = 1.9%, SPS: 0.1, ELAP: 10:14, ETA: 8:58:32 - loss: 21.8179
60/2680 = 2.2%, SPS: 0.1, ELAP: 12:07, ETA: 8:49:13 - loss: 17.5961
70/2680 = 2.6%, SPS: 0.1, ELAP: 14:02, ETA: 8:43:21 - loss: 28.3880
80/2680 = 3.0%, SPS: 0.1, ELAP: 16:00, ETA: 8:39:55 - loss: 24.6309
90/2680 = 3.4%, SPS: 0.1, ELAP: 17:52, ETA: 8:34:16 - loss: 25.7210
100/2680 = 3.7%, SPS: 0.1, ELAP: 19:46, ETA: 8:30:07 - loss: 16.6593
110/2680 = 4.1%, SPS: 0.1, ELAP: 21:41, ETA: 8:26:32 - loss: 20.7071
120/2680 = 4.5%, SPS: 0.1, ELAP: 23:38, ETA: 8:24:17 - loss: 18.8301
130/2680 = 4.9%, SPS: 0.1, ELAP: 25:33, ETA: 8:21:19 - loss: 18.7853
140/2680 = 5.2%, SPS: 0.1, ELAP: 27:28, ETA: 8:18:13 - loss: 19.0647
150/2680 = 5.6%, SPS: 0.1, ELAP: 29:20, ETA: 8:14:44 - loss: 20.5842
160/2680 = 6.0%, SPS: 0.1, ELAP: 31:14, ETA: 8:12:00 - loss: 19.2305
170/2680 = 6.3%, SPS: 0.1, ELAP: 33:09, ETA: 8:09:31 - loss: 21.2707
180/2680 = 6.7%, SPS: 0.1, ELAP: 35:00, ETA: 8:06:03 - loss: 21.4293
190/2680 = 7.1%, SPS: 0.1, ELAP: 36:54, ETA: 8:03:30 - loss: 22.0171
200/2680 = 7.5%, SPS: 0.1, ELAP: 38:49, ETA: 8:01:23 - loss: 18.9573
210/2680 = 7.8%, SPS: 0.1, ELAP: 40:42, ETA: 7:58:48 - loss: 25.9965
220/2680 = 8.2%, SPS: 0.1, ELAP: 42:38, ETA: 7:56:41 - loss: 19.1371
230/2680 = 8.6%, SPS: 0.1, ELAP: 44:33, ETA: 7:54:29 - loss: 19.1548
240/2680 = 9.0%, SPS: 0.1, ELAP: 46:27, ETA: 7:52:17 - loss: 20.4866
250/2680 = 9.3%, SPS: 0.1, ELAP: 48:17, ETA: 7:49:23 - loss: 15.1779
260/2680 = 9.7%, SPS: 0.1, ELAP: 50:11, ETA: 7:47:10 - loss: 22.7514
270/2680 = 10.1%, SPS: 0.1, ELAP: 52:02, ETA: 7:44:26 - loss: 15.4121
280/2680 = 10.4%, SPS: 0.1, ELAP: 53:54, ETA: 7:41:59 - loss: 22.1511
290/2680 = 10.8%, SPS: 0.1, ELAP: 55:46, ETA: 7:39:38 - loss: 22.4470
300/2680 = 11.2%, SPS: 0.1, ELAP: 57:39, ETA: 7:37:22 - loss: 18.6222
310/2680 = 11.6%, SPS: 0.1, ELAP: 59:30, ETA: 7:34:57 - loss: 21.4283
320/2680 = 11.9%, SPS: 0.1, ELAP: 1:01:25, ETA: 7:32:54 - loss: 20.4210
330/2680 = 12.3%, SPS: 0.1, ELAP: 1:03:18, ETA: 7:30:46 - loss: 20.3211
340/2680 = 12.7%, SPS: 0.1, ELAP: 1:05:09, ETA: 7:28:25 - loss: 18.8367
350/2680 = 13.1%, SPS: 0.1, ELAP: 1:07:01, ETA: 7:26:07 - loss: 18.8168
360/2680 = 13.4%, SPS: 0.1, ELAP: 1:08:51, ETA: 7:23:43 - loss: 18.3760
370/2680 = 13.8%, SPS: 0.1, ELAP: 1:10:43, ETA: 7:21:29 - loss: 14.4103
380/2680 = 14.2%, SPS: 0.1, ELAP: 1:12:36, ETA: 7:19:26 - loss: 16.8776
390/2680 = 14.6%, SPS: 0.1, ELAP: 1:14:27, ETA: 7:17:08 - loss: 19.5334
400/2680 = 14.9%, SPS: 0.1, ELAP: 1:16:19, ETA: 7:14:58 - loss: 16.8220
410/2680 = 15.3%, SPS: 0.1, ELAP: 1:18:09, ETA: 7:12:39 - loss: 20.9394
420/2680 = 15.7%, SPS: 0.1, ELAP: 1:20:00, ETA: 7:10:29 - loss: 20.4125
430/2680 = 16.0%, SPS: 0.1, ELAP: 1:21:52, ETA: 7:08:21 - loss: 24.1731
440/2680 = 16.4%, SPS: 0.1, ELAP: 1:23:42, ETA: 7:06:09 - loss: 16.3555
450/2680 = 16.8%, SPS: 0.1, ELAP: 1:25:34, ETA: 7:04:04 - loss: 17.3892
460/2680 = 17.2%, SPS: 0.1, ELAP: 1:27:27, ETA: 7:02:01 - loss: 23.5229
470/2680 = 17.5%, SPS: 0.1, ELAP: 1:29:19, ETA: 6:59:57 - loss: 13.1339
480/2680 = 17.9%, SPS: 0.1, ELAP: 1:31:11, ETA: 6:57:54 - loss: 20.4540
490/2680 = 18.3%, SPS: 0.1, ELAP: 1:33:04, ETA: 6:55:57 - loss: 20.0320
500/2680 = 18.7%, SPS: 0.1, ELAP: 1:34:56, ETA: 6:53:54 - loss: 19.3428
510/2680 = 19.0%, SPS: 0.1, ELAP: 1:36:47, ETA: 6:51:50 - loss: 19.0736
520/2680 = 19.4%, SPS: 0.1, ELAP: 1:38:38, ETA: 6:49:44 - loss: 24.2149
530/2680 = 19.8%, SPS: 0.1, ELAP: 1:40:30, ETA: 6:47:42 - loss: 20.3069
540/2680 = 20.1%, SPS: 0.1, ELAP: 1:42:24, ETA: 6:45:47 - loss: 16.3973
550/2680 = 20.5%, SPS: 0.1, ELAP: 1:44:15, ETA: 6:43:45 - loss: 22.4947
560/2680 = 20.9%, SPS: 0.1, ELAP: 1:46:06, ETA: 6:41:41 - loss: 21.7756
570/2680 = 21.3%, SPS: 0.1, ELAP: 1:47:58, ETA: 6:39:39 - loss: 18.6298
580/2680 = 21.6%, SPS: 0.1, ELAP: 1:49:49, ETA: 6:37:38 - loss: 20.8251
590/2680 = 22.0%, SPS: 0.1, ELAP: 1:51:41, ETA: 6:35:36 - loss: 18.4585
600/2680 = 22.4%, SPS: 0.1, ELAP: 1:53:33, ETA: 6:33:37 - loss: 20.3935
610/2680 = 22.8%, SPS: 0.1, ELAP: 1:55:24, ETA: 6:31:35 - loss: 23.2414
620/2680 = 23.1%, SPS: 0.1, ELAP: 1:57:15, ETA: 6:29:34 - loss: 20.5670
630/2680 = 23.5%, SPS: 0.1, ELAP: 1:59:06, ETA: 6:27:33 - loss: 21.1991
640/2680 = 23.9%, SPS: 0.1, ELAP: 2:00:57, ETA: 6:25:31 - loss: 19.3440
650/2680 = 24.3%, SPS: 0.1, ELAP: 2:02:48, ETA: 6:23:31 - loss: 19.9161
660/2680 = 24.6%, SPS: 0.1, ELAP: 2:04:38, ETA: 6:21:27 - loss: 17.1423
670/2680 = 25.0%, SPS: 0.1, ELAP: 2:06:28, ETA: 6:19:25 - loss: 20.1901
680/2680 = 25.4%, SPS: 0.1, ELAP: 2:08:19, ETA: 6:17:24 - loss: 19.0718
690/2680 = 25.7%, SPS: 0.1, ELAP: 2:10:10, ETA: 6:15:24 - loss: 22.3521
700/2680 = 26.1%, SPS: 0.1, ELAP: 2:12:01, ETA: 6:13:24 - loss: 17.8936
710/2680 = 26.5%, SPS: 0.1, ELAP: 2:13:52, ETA: 6:11:27 - loss: 19.6297
720/2680 = 26.9%, SPS: 0.1, ELAP: 2:15:42, ETA: 6:09:26 - loss: 19.9130
730/2680 = 27.2%, SPS: 0.1, ELAP: 2:17:33, ETA: 6:07:26 - loss: 20.9930
740/2680 = 27.6%, SPS: 0.1, ELAP: 2:19:23, ETA: 6:05:24 - loss: 20.1684
750/2680 = 28.0%, SPS: 0.1, ELAP: 2:21:15, ETA: 6:03:28 - loss: 19.4356
760/2680 = 28.4%, SPS: 0.1, ELAP: 2:23:05, ETA: 6:01:30 - loss: 20.4887
770/2680 = 28.7%, SPS: 0.1, ELAP: 2:24:57, ETA: 5:59:33 - loss: 24.3556
780/2680 = 29.1%, SPS: 0.1, ELAP: 2:26:47, ETA: 5:57:33 - loss: 19.5041
790/2680 = 29.5%, SPS: 0.1, ELAP: 2:28:38, ETA: 5:55:36 - loss: 23.0490
800/2680 = 29.9%, SPS: 0.1, ELAP: 2:30:28, ETA: 5:53:36 - loss: 18.2448
810/2680 = 30.2%, SPS: 0.1, ELAP: 2:32:20, ETA: 5:51:40 - loss: 17.5824
820/2680 = 30.6%, SPS: 0.1, ELAP: 2:34:10, ETA: 5:49:43 - loss: 20.4027
830/2680 = 31.0%, SPS: 0.1, ELAP: 2:36:01, ETA: 5:47:46 - loss: 20.9864
840/2680 = 31.3%, SPS: 0.1, ELAP: 2:37:52, ETA: 5:45:49 - loss: 18.1183
850/2680 = 31.7%, SPS: 0.1, ELAP: 2:39:43, ETA: 5:43:52 - loss: 19.8767
860/2680 = 32.1%, SPS: 0.1, ELAP: 2:41:35, ETA: 5:41:57 - loss: 19.0511
870/2680 = 32.5%, SPS: 0.1, ELAP: 2:43:25, ETA: 5:40:00 - loss: 21.5915
880/2680 = 32.8%, SPS: 0.1, ELAP: 2:45:18, ETA: 5:38:06 - loss: 24.4359
890/2680 = 33.2%, SPS: 0.1, ELAP: 2:47:09, ETA: 5:36:10 - loss: 19.6625
900/2680 = 33.6%, SPS: 0.1, ELAP: 2:48:59, ETA: 5:34:13 - loss: 19.2670
910/2680 = 34.0%, SPS: 0.1, ELAP: 2:50:50, ETA: 5:32:16 - loss: 20.3501
920/2680 = 34.3%, SPS: 0.1, ELAP: 2:52:41, ETA: 5:30:20 - loss: 15.8821
930/2680 = 34.7%, SPS: 0.1, ELAP: 2:54:32, ETA: 5:28:26 - loss: 22.1362
940/2680 = 35.1%, SPS: 0.1, ELAP: 2:56:24, ETA: 5:26:31 - loss: 20.8700
950/2680 = 35.4%, SPS: 0.1, ELAP: 2:58:15, ETA: 5:24:36 - loss: 18.1807
960/2680 = 35.8%, SPS: 0.1, ELAP: 3:00:07, ETA: 5:22:43 - loss: 18.3234
970/2680 = 36.2%, SPS: 0.1, ELAP: 3:01:58, ETA: 5:20:48 - loss: 19.3116
980/2680 = 36.6%, SPS: 0.1, ELAP: 3:03:49, ETA: 5:18:52 - loss: 19.5240
990/2680 = 36.9%, SPS: 0.1, ELAP: 3:05:40, ETA: 5:16:57 - loss: 18.6035
1000/2680 = 37.3%, SPS: 0.1, ELAP: 3:07:31, ETA: 5:15:01 - loss: 18.5933
1010/2680 = 37.7%, SPS: 0.1, ELAP: 3:09:22, ETA: 5:13:06 - loss: 20.5561
1020/2680 = 38.1%, SPS: 0.1, ELAP: 3:11:13, ETA: 5:11:12 - loss: 18.2277
1030/2680 = 38.4%, SPS: 0.1, ELAP: 3:13:05, ETA: 5:09:19 - loss: 19.4487
1040/2680 = 38.8%, SPS: 0.1, ELAP: 3:14:58, ETA: 5:07:26 - loss: 18.1976
1050/2680 = 39.2%, SPS: 0.1, ELAP: 3:16:50, ETA: 5:05:34 - loss: 22.8400
1060/2680 = 39.6%, SPS: 0.1, ELAP: 3:18:42, ETA: 5:03:40 - loss: 20.7841
1070/2680 = 39.9%, SPS: 0.1, ELAP: 3:20:35, ETA: 5:01:48 - loss: 15.9882
1080/2680 = 40.3%, SPS: 0.1, ELAP: 3:22:28, ETA: 4:59:58 - loss: 21.3980
1090/2680 = 40.7%, SPS: 0.1, ELAP: 3:24:20, ETA: 4:58:04 - loss: 20.7809
1100/2680 = 41.0%, SPS: 0.1, ELAP: 3:26:14, ETA: 4:56:13 - loss: 18.5002
1110/2680 = 41.4%, SPS: 0.1, ELAP: 3:28:06, ETA: 4:54:20 - loss: 21.2391
1120/2680 = 41.8%, SPS: 0.1, ELAP: 3:29:59, ETA: 4:52:29 - loss: 19.3268
1130/2680 = 42.2%, SPS: 0.1, ELAP: 3:31:51, ETA: 4:50:36 - loss: 21.1105
1140/2680 = 42.5%, SPS: 0.1, ELAP: 3:33:44, ETA: 4:48:43 - loss: 16.7126
1150/2680 = 42.9%, SPS: 0.1, ELAP: 3:35:38, ETA: 4:46:53 - loss: 23.4943
1160/2680 = 43.3%, SPS: 0.1, ELAP: 3:37:32, ETA: 4:45:03 - loss: 21.6205
1170/2680 = 43.7%, SPS: 0.1, ELAP: 3:39:24, ETA: 4:43:09 - loss: 18.8343
1180/2680 = 44.0%, SPS: 0.1, ELAP: 3:41:16, ETA: 4:41:16 - loss: 16.1223
1190/2680 = 44.4%, SPS: 0.1, ELAP: 3:43:13, ETA: 4:39:29 - loss: 19.9336
1200/2680 = 44.8%, SPS: 0.1, ELAP: 3:45:05, ETA: 4:37:36 - loss: 21.1273
1210/2680 = 45.1%, SPS: 0.1, ELAP: 3:46:59, ETA: 4:35:45 - loss: 18.0167
1220/2680 = 45.5%, SPS: 0.1, ELAP: 3:48:52, ETA: 4:33:53 - loss: 18.4363
1230/2680 = 45.9%, SPS: 0.1, ELAP: 3:50:45, ETA: 4:32:02 - loss: 17.5395
1240/2680 = 46.3%, SPS: 0.1, ELAP: 3:52:38, ETA: 4:30:10 - loss: 17.4508
1250/2680 = 46.6%, SPS: 0.1, ELAP: 3:54:31, ETA: 4:28:17 - loss: 19.2630
1260/2680 = 47.0%, SPS: 0.1, ELAP: 3:56:24, ETA: 4:26:25 - loss: 20.7043
1270/2680 = 47.4%, SPS: 0.1, ELAP: 3:58:16, ETA: 4:24:32 - loss: 15.8684
1280/2680 = 47.8%, SPS: 0.1, ELAP: 4:00:08, ETA: 4:22:39 - loss: 19.5736
1290/2680 = 48.1%, SPS: 0.1, ELAP: 4:02:03, ETA: 4:20:49 - loss: 20.6141
1300/2680 = 48.5%, SPS: 0.1, ELAP: 4:03:56, ETA: 4:18:57 - loss: 22.3914
1310/2680 = 48.9%, SPS: 0.1, ELAP: 4:05:47, ETA: 4:17:03 - loss: 18.5007
1320/2680 = 49.3%, SPS: 0.1, ELAP: 4:07:40, ETA: 4:15:10 - loss: 15.5616
1330/2680 = 49.6%, SPS: 0.1, ELAP: 4:09:34, ETA: 4:13:19 - loss: 19.3617
1340/2680 = 50.0%, SPS: 0.1, ELAP: 4:11:26, ETA: 4:11:26 - loss: 17.7249
1350/2680 = 50.4%, SPS: 0.1, ELAP: 4:13:20, ETA: 4:09:35 - loss: 22.9627
1360/2680 = 50.7%, SPS: 0.1, ELAP: 4:15:12, ETA: 4:07:42 - loss: 18.1267
1370/2680 = 51.1%, SPS: 0.1, ELAP: 4:17:06, ETA: 4:05:51 - loss: 18.5122
1380/2680 = 51.5%, SPS: 0.1, ELAP: 4:18:59, ETA: 4:03:58 - loss: 20.9085
1390/2680 = 51.9%, SPS: 0.1, ELAP: 4:20:53, ETA: 4:02:07 - loss: 20.7251
1400/2680 = 52.2%, SPS: 0.1, ELAP: 4:22:45, ETA: 4:00:13 - loss: 19.4154
1410/2680 = 52.6%, SPS: 0.1, ELAP: 4:24:38, ETA: 3:58:21 - loss: 22.8371
1420/2680 = 53.0%, SPS: 0.1, ELAP: 4:26:32, ETA: 3:56:30 - loss: 21.9000
1430/2680 = 53.4%, SPS: 0.1, ELAP: 4:28:26, ETA: 3:54:38 - loss: 18.2020
1440/2680 = 53.7%, SPS: 0.1, ELAP: 4:30:19, ETA: 3:52:47 - loss: 18.8780
1450/2680 = 54.1%, SPS: 0.1, ELAP: 4:32:14, ETA: 3:50:56 - loss: 18.1634
1460/2680 = 54.5%, SPS: 0.1, ELAP: 4:34:09, ETA: 3:49:05 - loss: 18.1409
1470/2680 = 54.9%, SPS: 0.1, ELAP: 4:36:03, ETA: 3:47:13 - loss: 16.7001
1480/2680 = 55.2%, SPS: 0.1, ELAP: 4:37:58, ETA: 3:45:23 - loss: 19.2709
1490/2680 = 55.6%, SPS: 0.1, ELAP: 4:39:51, ETA: 3:43:30 - loss: 21.9576
1500/2680 = 56.0%, SPS: 0.1, ELAP: 4:41:46, ETA: 3:41:40 - loss: 19.3734
1510/2680 = 56.3%, SPS: 0.1, ELAP: 4:43:39, ETA: 3:39:47 - loss: 20.6773
1520/2680 = 56.7%, SPS: 0.1, ELAP: 4:45:33, ETA: 3:37:55 - loss: 19.6890
1530/2680 = 57.1%, SPS: 0.1, ELAP: 4:47:26, ETA: 3:36:03 - loss: 19.3396
1540/2680 = 57.5%, SPS: 0.1, ELAP: 4:49:19, ETA: 3:34:10 - loss: 20.3305
1550/2680 = 57.8%, SPS: 0.1, ELAP: 4:51:15, ETA: 3:32:20 - loss: 17.2886
1560/2680 = 58.2%, SPS: 0.1, ELAP: 4:53:07, ETA: 3:30:26 - loss: 19.8133
1570/2680 = 58.6%, SPS: 0.1, ELAP: 4:55:01, ETA: 3:28:35 - loss: 19.9818
1580/2680 = 59.0%, SPS: 0.1, ELAP: 4:56:54, ETA: 3:26:42 - loss: 18.6070
1590/2680 = 59.3%, SPS: 0.1, ELAP: 4:58:47, ETA: 3:24:49 - loss: 17.8926
1600/2680 = 59.7%, SPS: 0.1, ELAP: 5:00:41, ETA: 3:22:57 - loss: 23.1611
1610/2680 = 60.1%, SPS: 0.1, ELAP: 5:02:33, ETA: 3:21:05 - loss: 13.8283
1620/2680 = 60.4%, SPS: 0.1, ELAP: 5:04:26, ETA: 3:19:12 - loss: 18.0660
1630/2680 = 60.8%, SPS: 0.1, ELAP: 5:06:18, ETA: 3:17:19 - loss: 19.8060
1640/2680 = 61.2%, SPS: 0.1, ELAP: 5:08:13, ETA: 3:15:27 - loss: 18.9358
1650/2680 = 61.6%, SPS: 0.1, ELAP: 5:10:05, ETA: 3:13:34 - loss: 18.9945
1660/2680 = 61.9%, SPS: 0.1, ELAP: 5:11:58, ETA: 3:11:41 - loss: 19.4674
1670/2680 = 62.3%, SPS: 0.1, ELAP: 5:13:52, ETA: 3:09:49 - loss: 19.6866
1680/2680 = 62.7%, SPS: 0.1, ELAP: 5:15:46, ETA: 3:07:57 - loss: 22.4680
1690/2680 = 63.1%, SPS: 0.1, ELAP: 5:17:39, ETA: 3:06:05 - loss: 23.8336
1700/2680 = 63.4%, SPS: 0.1, ELAP: 5:19:35, ETA: 3:04:14 - loss: 19.5279
1710/2680 = 63.8%, SPS: 0.1, ELAP: 5:21:28, ETA: 3:02:21 - loss: 23.5912
1720/2680 = 64.2%, SPS: 0.1, ELAP: 5:23:24, ETA: 3:00:30 - loss: 16.3837
1730/2680 = 64.6%, SPS: 0.1, ELAP: 5:25:18, ETA: 2:58:38 - loss: 12.6178
1740/2680 = 64.9%, SPS: 0.1, ELAP: 5:27:11, ETA: 2:56:45 - loss: 17.1669
1750/2680 = 65.3%, SPS: 0.1, ELAP: 5:29:06, ETA: 2:54:54 - loss: 17.1182
1760/2680 = 65.7%, SPS: 0.1, ELAP: 5:31:00, ETA: 2:53:01 - loss: 21.6815
1770/2680 = 66.0%, SPS: 0.1, ELAP: 5:32:55, ETA: 2:51:09 - loss: 19.6418
1780/2680 = 66.4%, SPS: 0.1, ELAP: 5:34:49, ETA: 2:49:18 - loss: 20.9930
1790/2680 = 66.8%, SPS: 0.1, ELAP: 5:36:41, ETA: 2:47:24 - loss: 23.6495
1800/2680 = 67.2%, SPS: 0.1, ELAP: 5:38:35, ETA: 2:45:32 - loss: 21.6909
1810/2680 = 67.5%, SPS: 0.1, ELAP: 5:40:29, ETA: 2:43:39 - loss: 18.8548
1820/2680 = 67.9%, SPS: 0.1, ELAP: 5:42:21, ETA: 2:41:46 - loss: 23.1969
1830/2680 = 68.3%, SPS: 0.1, ELAP: 5:44:13, ETA: 2:39:53 - loss: 21.3797
1840/2680 = 68.7%, SPS: 0.1, ELAP: 5:46:07, ETA: 2:38:01 - loss: 18.6704
1850/2680 = 69.0%, SPS: 0.1, ELAP: 5:47:59, ETA: 2:36:07 - loss: 17.3596
1860/2680 = 69.4%, SPS: 0.1, ELAP: 5:49:52, ETA: 2:34:15 - loss: 20.3633
1870/2680 = 69.8%, SPS: 0.1, ELAP: 5:51:45, ETA: 2:32:22 - loss: 19.4834
1880/2680 = 70.1%, SPS: 0.1, ELAP: 5:53:38, ETA: 2:30:29 - loss: 19.0583
1890/2680 = 70.5%, SPS: 0.1, ELAP: 5:55:33, ETA: 2:28:37 - loss: 23.0242
1900/2680 = 70.9%, SPS: 0.1, ELAP: 5:57:26, ETA: 2:26:44 - loss: 19.3608
1910/2680 = 71.3%, SPS: 0.1, ELAP: 5:59:18, ETA: 2:24:51 - loss: 18.8135
1920/2680 = 71.6%, SPS: 0.1, ELAP: 6:01:11, ETA: 2:22:58 - loss: 20.8506
1930/2680 = 72.0%, SPS: 0.1, ELAP: 6:03:06, ETA: 2:21:06 - loss: 14.4028
1940/2680 = 72.4%, SPS: 0.1, ELAP: 6:04:58, ETA: 2:19:13 - loss: 15.9096
1950/2680 = 72.8%, SPS: 0.1, ELAP: 6:06:53, ETA: 2:17:21 - loss: 19.7736
1960/2680 = 73.1%, SPS: 0.1, ELAP: 6:08:46, ETA: 2:15:28 - loss: 20.1979
1970/2680 = 73.5%, SPS: 0.1, ELAP: 6:10:40, ETA: 2:13:35 - loss: 22.5106
1980/2680 = 73.9%, SPS: 0.1, ELAP: 6:12:34, ETA: 2:11:43 - loss: 20.0141
1990/2680 = 74.3%, SPS: 0.1, ELAP: 6:14:26, ETA: 2:09:50 - loss: 16.8174
2000/2680 = 74.6%, SPS: 0.1, ELAP: 6:16:19, ETA: 2:07:57 - loss: 17.7034
2010/2680 = 75.0%, SPS: 0.1, ELAP: 6:18:12, ETA: 2:06:04 - loss: 21.5445
2020/2680 = 75.4%, SPS: 0.1, ELAP: 6:20:07, ETA: 2:04:12 - loss: 18.2724
2030/2680 = 75.7%, SPS: 0.1, ELAP: 6:22:01, ETA: 2:02:19 - loss: 23.5617
2040/2680 = 76.1%, SPS: 0.1, ELAP: 6:23:56, ETA: 2:00:27 - loss: 20.1296
2050/2680 = 76.5%, SPS: 0.1, ELAP: 6:25:48, ETA: 1:58:34 - loss: 23.6539
2060/2680 = 76.9%, SPS: 0.1, ELAP: 6:27:41, ETA: 1:56:41 - loss: 20.0683
2070/2680 = 77.2%, SPS: 0.1, ELAP: 6:29:33, ETA: 1:54:48 - loss: 16.9090
2080/2680 = 77.6%, SPS: 0.1, ELAP: 6:31:27, ETA: 1:52:55 - loss: 20.2353
2090/2680 = 78.0%, SPS: 0.1, ELAP: 6:33:18, ETA: 1:51:02 - loss: 19.6218
2100/2680 = 78.4%, SPS: 0.1, ELAP: 6:35:13, ETA: 1:49:09 - loss: 19.8757
2110/2680 = 78.7%, SPS: 0.1, ELAP: 6:37:07, ETA: 1:47:17 - loss: 24.7569
2120/2680 = 79.1%, SPS: 0.1, ELAP: 6:38:59, ETA: 1:45:24 - loss: 20.4266
2130/2680 = 79.5%, SPS: 0.1, ELAP: 6:40:52, ETA: 1:43:31 - loss: 21.9894
2140/2680 = 79.9%, SPS: 0.1, ELAP: 6:42:44, ETA: 1:41:38 - loss: 15.5554
2150/2680 = 80.2%, SPS: 0.1, ELAP: 6:44:38, ETA: 1:39:45 - loss: 22.1273
2160/2680 = 80.6%, SPS: 0.1, ELAP: 6:46:32, ETA: 1:37:52 - loss: 21.7389
2170/2680 = 81.0%, SPS: 0.1, ELAP: 6:48:25, ETA: 1:35:59 - loss: 22.1357
2180/2680 = 81.3%, SPS: 0.1, ELAP: 6:50:19, ETA: 1:34:06 - loss: 21.9440
2190/2680 = 81.7%, SPS: 0.1, ELAP: 6:52:10, ETA: 1:32:13 - loss: 16.2746
2200/2680 = 82.1%, SPS: 0.1, ELAP: 6:54:03, ETA: 1:30:20 - loss: 18.4669
2210/2680 = 82.5%, SPS: 0.1, ELAP: 6:55:58, ETA: 1:28:28 - loss: 20.0617
2220/2680 = 82.8%, SPS: 0.1, ELAP: 6:57:51, ETA: 1:26:35 - loss: 20.7693
2230/2680 = 83.2%, SPS: 0.1, ELAP: 6:59:44, ETA: 1:24:42 - loss: 21.4518
2240/2680 = 83.6%, SPS: 0.1, ELAP: 7:01:36, ETA: 1:22:49 - loss: 20.6990
2250/2680 = 84.0%, SPS: 0.1, ELAP: 7:03:31, ETA: 1:20:56 - loss: 20.9194
2260/2680 = 84.3%, SPS: 0.1, ELAP: 7:05:24, ETA: 1:19:03 - loss: 17.8959
2270/2680 = 84.7%, SPS: 0.1, ELAP: 7:07:16, ETA: 1:17:10 - loss: 21.0770
2280/2680 = 85.1%, SPS: 0.1, ELAP: 7:09:09, ETA: 1:15:17 - loss: 19.6453
2290/2680 = 85.4%, SPS: 0.1, ELAP: 7:11:04, ETA: 1:13:25 - loss: 20.9758
2300/2680 = 85.8%, SPS: 0.1, ELAP: 7:12:56, ETA: 1:11:32 - loss: 18.8663
2310/2680 = 86.2%, SPS: 0.1, ELAP: 7:14:48, ETA: 1:09:39 - loss: 21.2130
2320/2680 = 86.6%, SPS: 0.1, ELAP: 7:16:39, ETA: 1:07:45 - loss: 18.8374
2330/2680 = 86.9%, SPS: 0.1, ELAP: 7:18:32, ETA: 1:05:52 - loss: 18.3066
2340/2680 = 87.3%, SPS: 0.1, ELAP: 7:20:26, ETA: 1:04:00 - loss: 21.8576
2350/2680 = 87.7%, SPS: 0.1, ELAP: 7:22:18, ETA: 1:02:07 - loss: 21.9074
2360/2680 = 88.1%, SPS: 0.1, ELAP: 7:24:09, ETA: 1:00:13 - loss: 20.9826
2370/2680 = 88.4%, SPS: 0.1, ELAP: 7:26:03, ETA: 58:21 - loss: 20.2654
2380/2680 = 88.8%, SPS: 0.1, ELAP: 7:27:55, ETA: 56:28 - loss: 22.6121
2390/2680 = 89.2%, SPS: 0.1, ELAP: 7:29:47, ETA: 54:35 - loss: 16.3683
2400/2680 = 89.6%, SPS: 0.1, ELAP: 7:31:41, ETA: 52:42 - loss: 18.6122
2410/2680 = 89.9%, SPS: 0.1, ELAP: 7:33:32, ETA: 50:49 - loss: 15.0298
2420/2680 = 90.3%, SPS: 0.1, ELAP: 7:35:24, ETA: 48:56 - loss: 19.4040
2430/2680 = 90.7%, SPS: 0.1, ELAP: 7:37:16, ETA: 47:03 - loss: 22.4099
2440/2680 = 91.0%, SPS: 0.1, ELAP: 7:39:09, ETA: 45:10 - loss: 21.6186
2450/2680 = 91.4%, SPS: 0.1, ELAP: 7:41:01, ETA: 43:17 - loss: 17.5188
2460/2680 = 91.8%, SPS: 0.1, ELAP: 7:42:56, ETA: 41:24 - loss: 18.0126
2470/2680 = 92.2%, SPS: 0.1, ELAP: 7:44:48, ETA: 39:31 - loss: 16.5385
2480/2680 = 92.5%, SPS: 0.1, ELAP: 7:46:40, ETA: 37:38 - loss: 21.2340
2490/2680 = 92.9%, SPS: 0.1, ELAP: 7:48:32, ETA: 35:45 - loss: 19.5627
2500/2680 = 93.3%, SPS: 0.1, ELAP: 7:50:24, ETA: 33:52 - loss: 20.8666
2510/2680 = 93.7%, SPS: 0.1, ELAP: 7:52:18, ETA: 31:59 - loss: 23.0904
2520/2680 = 94.0%, SPS: 0.1, ELAP: 7:54:10, ETA: 30:06 - loss: 15.0429
2530/2680 = 94.4%, SPS: 0.1, ELAP: 7:56:03, ETA: 28:13 - loss: 16.7089
2540/2680 = 94.8%, SPS: 0.1, ELAP: 7:57:57, ETA: 26:21 - loss: 20.5319
2550/2680 = 95.1%, SPS: 0.1, ELAP: 7:59:51, ETA: 24:28 - loss: 19.7810
2560/2680 = 95.5%, SPS: 0.1, ELAP: 8:01:43, ETA: 22:35 - loss: 17.9934
2570/2680 = 95.9%, SPS: 0.1, ELAP: 8:03:36, ETA: 20:42 - loss: 16.9509
2580/2680 = 96.3%, SPS: 0.1, ELAP: 8:05:29, ETA: 18:49 - loss: 21.8594
2590/2680 = 96.6%, SPS: 0.1, ELAP: 8:07:23, ETA: 16:56 - loss: 16.7791
2600/2680 = 97.0%, SPS: 0.1, ELAP: 8:09:15, ETA: 15:03 - loss: 17.3512
2610/2680 = 97.4%, SPS: 0.1, ELAP: 8:11:07, ETA: 13:10 - loss: 14.2443
2620/2680 = 97.8%, SPS: 0.1, ELAP: 8:12:59, ETA: 11:17 - loss: 18.9862
2630/2680 = 98.1%, SPS: 0.1, ELAP: 8:14:54, ETA: 9:25 - loss: 19.9146
2640/2680 = 98.5%, SPS: 0.1, ELAP: 8:16:46, ETA: 7:32 - loss: 20.1490
2650/2680 = 98.9%, SPS: 0.1, ELAP: 8:18:39, ETA: 5:39 - loss: 22.5202
2660/2680 = 99.3%, SPS: 0.1, ELAP: 8:20:33, ETA: 3:46 - loss: 15.2829
2670/2680 = 99.6%, SPS: 0.1, ELAP: 8:22:26, ETA: 1:53 - loss: 23.1243
2680/2680 = 100.0%, SPS: 0.1, ELAP: 8:24:20, ETA: 0 - loss: 17.6996
2680/2680 = 100.0%, SPS: 0.1, ELAP: 8:24:26, ETA: 0

================================================================================
Run dev set evaluation: model=electra_large, trial 1/1
================================================================================
Evaluating cola
Loading dataset cola_dev
Existing tfrecords not found so creating
Writing example 0 of 1043
Building model...
Building complete
2020-05-22 19:00:48.238901: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: TITAN V major: 7 minor: 0 memoryClockRate(GHz): 1.455
pciBusID: 0000:02:00.0
2020-05-22 19:00:48.239610: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 1 with properties:
name: TITAN V major: 7 minor: 0 memoryClockRate(GHz): 1.455
pciBusID: 0000:03:00.0
2020-05-22 19:00:48.240279: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 2 with properties:
name: TITAN V major: 7 minor: 0 memoryClockRate(GHz): 1.455
pciBusID: 0000:82:00.0
2020-05-22 19:00:48.240932: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 3 with properties:
name: TITAN V major: 7 minor: 0 memoryClockRate(GHz): 1.455
pciBusID: 0000:83:00.0
2020-05-22 19:00:48.241149: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda/lib64
2020-05-22 19:00:48.241227: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda/lib64
2020-05-22 19:00:48.241295: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda/lib64
2020-05-22 19:00:48.241361: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda/lib64
2020-05-22 19:00:48.241426: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda/lib64
2020-05-22 19:00:48.241492: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda/lib64
2020-05-22 19:00:48.241515: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2020-05-22 19:00:48.241526: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1663] Cannot dlopen some GPU libraries. Skipping registering GPU devices...
2020-05-22 19:00:48.241690: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-05-22 19:00:48.241703: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 1 2 3
2020-05-22 19:00:48.241711: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N Y N N
2020-05-22 19:00:48.241718: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 1:   Y N N N
2020-05-22 19:00:48.241725: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 2:   N N N Y
2020-05-22 19:00:48.241733: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 3:   N N Y N
/home/spectrometer/.local/lib/python3.6/site-packages/sklearn/metrics/classification.py:543: RuntimeWarning: invalid value encountered in double_scalars
  mcc = cov_ytyp / np.sqrt(cov_ytyt * cov_ypyp)
cola: mcc: 0.00 - loss: 0.62

Writing results to /home/spectrometer/models/electra_large/results/cola_results.txt
================================================================================
Running on the test set and writing the predictions: model=electra_large, trial 1/1
================================================================================
Writing out predictions for [Task(cola)] test
Loading dataset cola_test
Existing tfrecords not found so creating
Writing example 0 of 1063
Building model...
Building complete
2020-05-22 19:03:03.173720: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: TITAN V major: 7 minor: 0 memoryClockRate(GHz): 1.455
pciBusID: 0000:02:00.0
2020-05-22 19:03:03.174429: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 1 with properties:
name: TITAN V major: 7 minor: 0 memoryClockRate(GHz): 1.455
pciBusID: 0000:03:00.0
2020-05-22 19:03:03.175120: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 2 with properties:
name: TITAN V major: 7 minor: 0 memoryClockRate(GHz): 1.455
pciBusID: 0000:82:00.0
2020-05-22 19:03:03.175819: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 3 with properties:
name: TITAN V major: 7 minor: 0 memoryClockRate(GHz): 1.455
pciBusID: 0000:83:00.0
2020-05-22 19:03:03.176023: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda/lib64
2020-05-22 19:03:03.176115: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda/lib64
2020-05-22 19:03:03.176188: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda/lib64
2020-05-22 19:03:03.176261: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda/lib64
2020-05-22 19:03:03.176334: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda/lib64
2020-05-22 19:03:03.176405: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda/lib64
2020-05-22 19:03:03.176432: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2020-05-22 19:03:03.176443: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1663] Cannot dlopen some GPU libraries. Skipping registering GPU devices...
2020-05-22 19:03:03.176603: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-05-22 19:03:03.176616: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 1 2 3
2020-05-22 19:03:03.176624: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N Y N N
2020-05-22 19:03:03.176631: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 1:   Y N N N
2020-05-22 19:03:03.176638: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 2:   N N N Y
2020-05-22 19:03:03.176645: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 3:   N N Y N
Pickling predictions for 1063 cola examples (test)
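
The scikit-learn RuntimeWarning and the "cola: mcc: 0.00" result are consistent with the diverged training above: when a collapsed classifier predicts a single class for every dev example, the covariances in both the numerator and the denominator of the Matthews correlation collapse to zero, the resulting 0/0 yields NaN, and scikit-learn reports the score as 0 (older versions emit exactly this warning along the way). A minimal sketch with made-up labels:

import numpy as np
from sklearn.metrics import matthews_corrcoef

y_true = np.array([0, 1, 1, 0, 1, 1])  # made-up gold labels
y_pred = np.zeros_like(y_true)         # a collapsed model predicts one class everywhere
# Prints 0.0; depending on the scikit-learn version, a RuntimeWarning about an
# invalid value in the MCC computation may also be emitted.
print(matthews_corrcoef(y_true, y_pred))
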
yelantf commented 4 years ago

@spectrometerH Hi, are you fine-tuning the ELECTRA-Large model on the CoLA dataset? I tried it on a 12 GB GPU, but an OOM error is always raised. Could you tell me how you got this to work? Many thanks!
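
Not verified on a 12 GB card, but the usual knobs for fitting the large model into less memory are the same --hparams keys shown in the config dump above, e.g. a smaller train_batch_size and a shorter max_seq_length (CoLA sentences are short). A hedged example of such overrides:

python3 run_finetuning.py --data-dir ~ --model-name electra_large --hparams '{"model_size": "large", "task_names": ["cola"], "do_train": true, "do_eval": true, "train_batch_size": 8, "max_seq_length": 64}'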