Closed spectrometerH closed 4 years ago
I use
python3 run_finetuning.py --data-dir ~ --model-name electra_large --hparams '{"model_size": "large", "task_names": ["cola"], "num_train_epochs": 10, "do_train": true, "do_eval": true, "write_test_outputs": true}'
to reproduce Electra-Large's result on GLUE (the CoLA task), but the loss stays around 20 throughout fine-tuning.
But when I try Electra-Base using
python3 run_finetuning.py --data-dir ~ --model-name electra_base --hparams '{"model_size": "base", "task_names": ["cola"], "num_train_epochs": 3, "do_train": true, "do_eval": true, "write_test_outputs": true}'
the loss decreases normally and the result is OK.
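To make "the loss is always around 20" versus "the loss decreased normally" easier to compare, the loss values can be pulled out of the progress lines that run_finetuning.py prints. The following is only a sketch: the regular expression is inferred from the progress format visible in the attached output, and the helper names `parse_losses` and `summarize` are made up for illustration.

```python
import re
from statistics import mean

# Pattern inferred from progress lines such as
# "10/2680 = 0.4%, SPS: 0.1, ELAP: 2:28, ETA: 10:58:13 - loss: 22.3920".
# \s+ is used between tokens so entries that wrap across lines still match.
PROGRESS = re.compile(
    r"(\d+)/(\d+)\s+=\s+[\d.]+%,\s+SPS:\s+[\d.]+,\s+ELAP:\s+[\d:]+,"
    r"\s+ETA:\s+[\d:]+\s+-\s+loss:\s+(\d+\.\d+)"
)


def parse_losses(log_text):
    """Return (step, loss) pairs found in the progress output."""
    return [(int(m.group(1)), float(m.group(3))) for m in PROGRESS.finditer(log_text)]


def summarize(pairs):
    """Compare the mean loss of the first and second half of the run."""
    losses = [loss for _, loss in pairs]
    if len(losses) < 2:
        raise ValueError("need at least two loss values to compare halves")
    half = len(losses) // 2
    return {
        "n": len(losses),
        "mean_first_half": mean(losses[:half]),
        "mean_second_half": mean(losses[half:]),
    }


# First four progress entries copied from the attached Electra-Large output.
sample = (
    "10/2680 = 0.4%, SPS: 0.1, ELAP: 2:28, ETA: 10:58:13 - loss: 22.3920 "
    "20/2680 = 0.7%, SPS: 0.1, ELAP: 4:29, ETA: 9:55:29 - loss: 11.8185 "
    "30/2680 = 1.1%, SPS: 0.1, ELAP: 6:23, ETA: 9:23:14 - loss: 17.4742 "
    "40/2680 = 1.5%, SPS: 0.1, ELAP: 8:20, ETA: 9:10:15 - loss: 18.5379"
)
pairs = parse_losses(sample)
stats = summarize(pairs)
print(pairs)
print(stats)
```

If the second-half mean is not clearly below the first-half mean over the whole run, the loss is not really decreasing.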
The pre-trained models were all downloaded from the released models.
The output is attached below for reference.
================================================================================
Config: model=electra_large, trial 1/1
================================================================================
answerable_classifier True
answerable_uses_start_logits True
answerable_weight 0.5
beam_size 20
data_dir /home/spectrometer
debug False
do_eval True
do_lower_case True
do_train True
doc_stride 128
double_unordered True
embedding_size None
eval_batch_size 32
gcp_project None
init_checkpoint /home/spectrometer/models/electra_large
iterations_per_loop 1000
joint_prediction True
keep_all_models True
layerwise_lr_decay 0.9
learning_rate 5e-05
log_examples False
max_answer_length 30
max_query_length 64
max_seq_length 128
model_dir /home/spectrometer/models/electra_large/finetuning_models/cola_model
model_hparam_overrides {}
model_name electra_large
model_size large
n_best_size 20
n_writes_test 5
num_tpu_cores 1
num_train_epochs 10
num_trials 1
predict_batch_size 32
preprocessed_data_dir /home/spectrometer/models/electra_large/finetuning_tfrecords/cola_tfrecords
qa_eval_file <built-in method format of str object at 0x7f48d7b3f4e0>
qa_na_file <built-in method format of str object at 0x7f48d7ba4730>
qa_na_threshold -2.75
qa_preds_file <built-in method format of str object at 0x7f48d7b3f558>
raw_data_dir <built-in method format of str object at 0x7f48d7b30ce8>
results_pkl /home/spectrometer/models/electra_large/results/cola_results.pkl
results_txt /home/spectrometer/models/electra_large/results/cola_results.txt
save_checkpoints_steps 1000000
task_names ['cola']
test_predictions <built-in method format of str object at 0x7f48d7b431c8>
tpu_job_name None
tpu_name None
tpu_zone None
train_batch_size 32
use_tfrecords_if_existing True
use_tpu False
vocab_file /home/spectrometer/models/electra_large/vocab.txt
vocab_size 30522
warmup_proportion 0.1
weight_decay_rate 0.01
write_test_outputs True
Loading dataset cola_train
================================================================================ Start training: model=electra_large, trial 1/1 ================================================================================ Training for 2680 steps Building model... Building complete 2020-05-22 10:35:20.577095: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA 2020-05-22 10:35:20.598852: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1 2020-05-22 10:35:22.827953: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x2d51cd0 executing computations on platform CUDA. Devices: 2020-05-22 10:35:22.828024: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): TITAN V, Compute Capability 7.0 2020-05-22 10:35:22.828041: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (1): TITAN V, Compute Capability 7.0 2020-05-22 10:35:22.828061: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (2): TITAN V, Compute Capability 7.0 2020-05-22 10:35:22.828074: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (3): TITAN V, Compute Capability 7.0 2020-05-22 10:35:22.840141: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2099980000 Hz 2020-05-22 10:35:22.844202: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x2d52030 executing computations on platform Host. 
Devices: 2020-05-22 10:35:22.844245: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined> 2020-05-22 10:35:22.850947: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: name: TITAN V major: 7 minor: 0 memoryClockRate(GHz): 1.455 pciBusID: 0000:02:00.0 2020-05-22 10:35:22.852163: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 1 with properties: name: TITAN V major: 7 minor: 0 memoryClockRate(GHz): 1.455 pciBusID: 0000:03:00.0 2020-05-22 10:35:22.858713: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 2 with properties: name: TITAN V major: 7 minor: 0 memoryClockRate(GHz): 1.455 pciBusID: 0000:82:00.0 2020-05-22 10:35:22.859862: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 3 with properties: name: TITAN V major: 7 minor: 0 memoryClockRate(GHz): 1.455 pciBusID: 0000:83:00.0 2020-05-22 10:35:22.860124: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda/lib64 2020-05-22 10:35:22.860266: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda/lib64 2020-05-22 10:35:22.860394: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda/lib64 2020-05-22 10:35:22.860517: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda/lib64 
2020-05-22 10:35:22.860638: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda/lib64 2020-05-22 10:35:22.860758: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda/lib64 2020-05-22 10:35:22.868326: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7 2020-05-22 10:35:22.868366: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1663] Cannot dlopen some GPU libraries. Skipping registering GPU devices... 2020-05-22 10:35:22.868605: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix: 2020-05-22 10:35:22.868628: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0 1 2 3 2020-05-22 10:35:22.868642: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N Y N N 2020-05-22 10:35:22.868654: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 1: Y N N N 2020-05-22 10:35:22.868666: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 2: N N N Y 2020-05-22 10:35:22.868677: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 3: N N Y N 2020-05-22 10:35:29.430128: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set. If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU. To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile. 
10/2680 = 0.4%, SPS: 0.1, ELAP: 2:28, ETA: 10:58:13 - loss: 22.3920 20/2680 = 0.7%, SPS: 0.1, ELAP: 4:29, ETA: 9:55:29 - loss: 11.8185 30/2680 = 1.1%, SPS: 0.1, ELAP: 6:23, ETA: 9:23:14 - loss: 17.4742 40/2680 = 1.5%, SPS: 0.1, ELAP: 8:20, ETA: 9:10:15 - loss: 18.5379 50/2680 = 1.9%, SPS: 0.1, ELAP: 10:14, ETA: 8:58:32 - loss: 21.8179 60/2680 = 2.2%, SPS: 0.1, ELAP: 12:07, ETA: 8:49:13 - loss: 17.5961 70/2680 = 2.6%, SPS: 0.1, ELAP: 14:02, ETA: 8:43:21 - loss: 28.3880 80/2680 = 3.0%, SPS: 0.1, ELAP: 16:00, ETA: 8:39:55 - loss: 24.6309 90/2680 = 3.4%, SPS: 0.1, ELAP: 17:52, ETA: 8:34:16 - loss: 25.7210 100/2680 = 3.7%, SPS: 0.1, ELAP: 19:46, ETA: 8:30:07 - loss: 16.6593 110/2680 = 4.1%, SPS: 0.1, ELAP: 21:41, ETA: 8:26:32 - loss: 20.7071 120/2680 = 4.5%, SPS: 0.1, ELAP: 23:38, ETA: 8:24:17 - loss: 18.8301 130/2680 = 4.9%, SPS: 0.1, ELAP: 25:33, ETA: 8:21:19 - loss: 18.7853 140/2680 = 5.2%, SPS: 0.1, ELAP: 27:28, ETA: 8:18:13 - loss: 19.0647 150/2680 = 5.6%, SPS: 0.1, ELAP: 29:20, ETA: 8:14:44 - loss: 20.5842 160/2680 = 6.0%, SPS: 0.1, ELAP: 31:14, ETA: 8:12:00 - loss: 19.2305 170/2680 = 6.3%, SPS: 0.1, ELAP: 33:09, ETA: 8:09:31 - loss: 21.2707 180/2680 = 6.7%, SPS: 0.1, ELAP: 35:00, ETA: 8:06:03 - loss: 21.4293 190/2680 = 7.1%, SPS: 0.1, ELAP: 36:54, ETA: 8:03:30 - loss: 22.0171 200/2680 = 7.5%, SPS: 0.1, ELAP: 38:49, ETA: 8:01:23 - loss: 18.9573 210/2680 = 7.8%, SPS: 0.1, ELAP: 40:42, ETA: 7:58:48 - loss: 25.9965 220/2680 = 8.2%, SPS: 0.1, ELAP: 42:38, ETA: 7:56:41 - loss: 19.1371 230/2680 = 8.6%, SPS: 0.1, ELAP: 44:33, ETA: 7:54:29 - loss: 19.1548 240/2680 = 9.0%, SPS: 0.1, ELAP: 46:27, ETA: 7:52:17 - loss: 20.4866 250/2680 = 9.3%, SPS: 0.1, ELAP: 48:17, ETA: 7:49:23 - loss: 15.1779 260/2680 = 9.7%, SPS: 0.1, ELAP: 50:11, ETA: 7:47:10 - loss: 22.7514 270/2680 = 10.1%, SPS: 0.1, ELAP: 52:02, ETA: 7:44:26 - loss: 15.4121 280/2680 = 10.4%, SPS: 0.1, ELAP: 53:54, ETA: 7:41:59 - loss: 22.1511 290/2680 = 10.8%, SPS: 0.1, ELAP: 55:46, ETA: 7:39:38 - loss: 22.4470 
300/2680 = 11.2%, SPS: 0.1, ELAP: 57:39, ETA: 7:37:22 - loss: 18.6222 310/2680 = 11.6%, SPS: 0.1, ELAP: 59:30, ETA: 7:34:57 - loss: 21.4283 320/2680 = 11.9%, SPS: 0.1, ELAP: 1:01:25, ETA: 7:32:54 - loss: 20.4210 330/2680 = 12.3%, SPS: 0.1, ELAP: 1:03:18, ETA: 7:30:46 - loss: 20.3211 340/2680 = 12.7%, SPS: 0.1, ELAP: 1:05:09, ETA: 7:28:25 - loss: 18.8367 350/2680 = 13.1%, SPS: 0.1, ELAP: 1:07:01, ETA: 7:26:07 - loss: 18.8168 360/2680 = 13.4%, SPS: 0.1, ELAP: 1:08:51, ETA: 7:23:43 - loss: 18.3760 370/2680 = 13.8%, SPS: 0.1, ELAP: 1:10:43, ETA: 7:21:29 - loss: 14.4103 380/2680 = 14.2%, SPS: 0.1, ELAP: 1:12:36, ETA: 7:19:26 - loss: 16.8776 390/2680 = 14.6%, SPS: 0.1, ELAP: 1:14:27, ETA: 7:17:08 - loss: 19.5334 400/2680 = 14.9%, SPS: 0.1, ELAP: 1:16:19, ETA: 7:14:58 - loss: 16.8220 410/2680 = 15.3%, SPS: 0.1, ELAP: 1:18:09, ETA: 7:12:39 - loss: 20.9394 420/2680 = 15.7%, SPS: 0.1, ELAP: 1:20:00, ETA: 7:10:29 - loss: 20.4125 430/2680 = 16.0%, SPS: 0.1, ELAP: 1:21:52, ETA: 7:08:21 - loss: 24.1731 440/2680 = 16.4%, SPS: 0.1, ELAP: 1:23:42, ETA: 7:06:09 - loss: 16.3555 450/2680 = 16.8%, SPS: 0.1, ELAP: 1:25:34, ETA: 7:04:04 - loss: 17.3892 460/2680 = 17.2%, SPS: 0.1, ELAP: 1:27:27, ETA: 7:02:01 - loss: 23.5229 470/2680 = 17.5%, SPS: 0.1, ELAP: 1:29:19, ETA: 6:59:57 - loss: 13.1339 480/2680 = 17.9%, SPS: 0.1, ELAP: 1:31:11, ETA: 6:57:54 - loss: 20.4540 490/2680 = 18.3%, SPS: 0.1, ELAP: 1:33:04, ETA: 6:55:57 - loss: 20.0320 500/2680 = 18.7%, SPS: 0.1, ELAP: 1:34:56, ETA: 6:53:54 - loss: 19.3428 510/2680 = 19.0%, SPS: 0.1, ELAP: 1:36:47, ETA: 6:51:50 - loss: 19.0736 520/2680 = 19.4%, SPS: 0.1, ELAP: 1:38:38, ETA: 6:49:44 - loss: 24.2149 530/2680 = 19.8%, SPS: 0.1, ELAP: 1:40:30, ETA: 6:47:42 - loss: 20.3069 540/2680 = 20.1%, SPS: 0.1, ELAP: 1:42:24, ETA: 6:45:47 - loss: 16.3973 550/2680 = 20.5%, SPS: 0.1, ELAP: 1:44:15, ETA: 6:43:45 - loss: 22.4947 560/2680 = 20.9%, SPS: 0.1, ELAP: 1:46:06, ETA: 6:41:41 - loss: 21.7756 570/2680 = 21.3%, SPS: 0.1, ELAP: 1:47:58, ETA: 6:39:39 - 
loss: 18.6298 580/2680 = 21.6%, SPS: 0.1, ELAP: 1:49:49, ETA: 6:37:38 - loss: 20.8251 590/2680 = 22.0%, SPS: 0.1, ELAP: 1:51:41, ETA: 6:35:36 - loss: 18.4585 600/2680 = 22.4%, SPS: 0.1, ELAP: 1:53:33, ETA: 6:33:37 - loss: 20.3935 610/2680 = 22.8%, SPS: 0.1, ELAP: 1:55:24, ETA: 6:31:35 - loss: 23.2414 620/2680 = 23.1%, SPS: 0.1, ELAP: 1:57:15, ETA: 6:29:34 - loss: 20.5670 630/2680 = 23.5%, SPS: 0.1, ELAP: 1:59:06, ETA: 6:27:33 - loss: 21.1991 640/2680 = 23.9%, SPS: 0.1, ELAP: 2:00:57, ETA: 6:25:31 - loss: 19.3440 650/2680 = 24.3%, SPS: 0.1, ELAP: 2:02:48, ETA: 6:23:31 - loss: 19.9161 660/2680 = 24.6%, SPS: 0.1, ELAP: 2:04:38, ETA: 6:21:27 - loss: 17.1423 670/2680 = 25.0%, SPS: 0.1, ELAP: 2:06:28, ETA: 6:19:25 - loss: 20.1901 680/2680 = 25.4%, SPS: 0.1, ELAP: 2:08:19, ETA: 6:17:24 - loss: 19.0718 690/2680 = 25.7%, SPS: 0.1, ELAP: 2:10:10, ETA: 6:15:24 - loss: 22.3521 700/2680 = 26.1%, SPS: 0.1, ELAP: 2:12:01, ETA: 6:13:24 - loss: 17.8936 710/2680 = 26.5%, SPS: 0.1, ELAP: 2:13:52, ETA: 6:11:27 - loss: 19.6297 720/2680 = 26.9%, SPS: 0.1, ELAP: 2:15:42, ETA: 6:09:26 - loss: 19.9130 730/2680 = 27.2%, SPS: 0.1, ELAP: 2:17:33, ETA: 6:07:26 - loss: 20.9930 740/2680 = 27.6%, SPS: 0.1, ELAP: 2:19:23, ETA: 6:05:24 - loss: 20.1684 750/2680 = 28.0%, SPS: 0.1, ELAP: 2:21:15, ETA: 6:03:28 - loss: 19.4356 760/2680 = 28.4%, SPS: 0.1, ELAP: 2:23:05, ETA: 6:01:30 - loss: 20.4887 770/2680 = 28.7%, SPS: 0.1, ELAP: 2:24:57, ETA: 5:59:33 - loss: 24.3556 780/2680 = 29.1%, SPS: 0.1, ELAP: 2:26:47, ETA: 5:57:33 - loss: 19.5041 790/2680 = 29.5%, SPS: 0.1, ELAP: 2:28:38, ETA: 5:55:36 - loss: 23.0490 800/2680 = 29.9%, SPS: 0.1, ELAP: 2:30:28, ETA: 5:53:36 - loss: 18.2448 810/2680 = 30.2%, SPS: 0.1, ELAP: 2:32:20, ETA: 5:51:40 - loss: 17.5824 820/2680 = 30.6%, SPS: 0.1, ELAP: 2:34:10, ETA: 5:49:43 - loss: 20.4027 830/2680 = 31.0%, SPS: 0.1, ELAP: 2:36:01, ETA: 5:47:46 - loss: 20.9864 840/2680 = 31.3%, SPS: 0.1, ELAP: 2:37:52, ETA: 5:45:49 - loss: 18.1183 850/2680 = 31.7%, SPS: 0.1, ELAP: 
2:39:43, ETA: 5:43:52 - loss: 19.8767 860/2680 = 32.1%, SPS: 0.1, ELAP: 2:41:35, ETA: 5:41:57 - loss: 19.0511 870/2680 = 32.5%, SPS: 0.1, ELAP: 2:43:25, ETA: 5:40:00 - loss: 21.5915 880/2680 = 32.8%, SPS: 0.1, ELAP: 2:45:18, ETA: 5:38:06 - loss: 24.4359 890/2680 = 33.2%, SPS: 0.1, ELAP: 2:47:09, ETA: 5:36:10 - loss: 19.6625 900/2680 = 33.6%, SPS: 0.1, ELAP: 2:48:59, ETA: 5:34:13 - loss: 19.2670 910/2680 = 34.0%, SPS: 0.1, ELAP: 2:50:50, ETA: 5:32:16 - loss: 20.3501 920/2680 = 34.3%, SPS: 0.1, ELAP: 2:52:41, ETA: 5:30:20 - loss: 15.8821 930/2680 = 34.7%, SPS: 0.1, ELAP: 2:54:32, ETA: 5:28:26 - loss: 22.1362 940/2680 = 35.1%, SPS: 0.1, ELAP: 2:56:24, ETA: 5:26:31 - loss: 20.8700 950/2680 = 35.4%, SPS: 0.1, ELAP: 2:58:15, ETA: 5:24:36 - loss: 18.1807 960/2680 = 35.8%, SPS: 0.1, ELAP: 3:00:07, ETA: 5:22:43 - loss: 18.3234 970/2680 = 36.2%, SPS: 0.1, ELAP: 3:01:58, ETA: 5:20:48 - loss: 19.3116 980/2680 = 36.6%, SPS: 0.1, ELAP: 3:03:49, ETA: 5:18:52 - loss: 19.5240 990/2680 = 36.9%, SPS: 0.1, ELAP: 3:05:40, ETA: 5:16:57 - loss: 18.6035 1000/2680 = 37.3%, SPS: 0.1, ELAP: 3:07:31, ETA: 5:15:01 - loss: 18.5933 1010/2680 = 37.7%, SPS: 0.1, ELAP: 3:09:22, ETA: 5:13:06 - loss: 20.5561 1020/2680 = 38.1%, SPS: 0.1, ELAP: 3:11:13, ETA: 5:11:12 - loss: 18.2277 1030/2680 = 38.4%, SPS: 0.1, ELAP: 3:13:05, ETA: 5:09:19 - loss: 19.4487 1040/2680 = 38.8%, SPS: 0.1, ELAP: 3:14:58, ETA: 5:07:26 - loss: 18.1976 1050/2680 = 39.2%, SPS: 0.1, ELAP: 3:16:50, ETA: 5:05:34 - loss: 22.8400 1060/2680 = 39.6%, SPS: 0.1, ELAP: 3:18:42, ETA: 5:03:40 - loss: 20.7841 1070/2680 = 39.9%, SPS: 0.1, ELAP: 3:20:35, ETA: 5:01:48 - loss: 15.9882 1080/2680 = 40.3%, SPS: 0.1, ELAP: 3:22:28, ETA: 4:59:58 - loss: 21.3980 1090/2680 = 40.7%, SPS: 0.1, ELAP: 3:24:20, ETA: 4:58:04 - loss: 20.7809 1100/2680 = 41.0%, SPS: 0.1, ELAP: 3:26:14, ETA: 4:56:13 - loss: 18.5002 1110/2680 = 41.4%, SPS: 0.1, ELAP: 3:28:06, ETA: 4:54:20 - loss: 21.2391 1120/2680 = 41.8%, SPS: 0.1, ELAP: 3:29:59, ETA: 4:52:29 - loss: 19.3268 
1130/2680 = 42.2%, SPS: 0.1, ELAP: 3:31:51, ETA: 4:50:36 - loss: 21.1105 1140/2680 = 42.5%, SPS: 0.1, ELAP: 3:33:44, ETA: 4:48:43 - loss: 16.7126 1150/2680 = 42.9%, SPS: 0.1, ELAP: 3:35:38, ETA: 4:46:53 - loss: 23.4943 1160/2680 = 43.3%, SPS: 0.1, ELAP: 3:37:32, ETA: 4:45:03 - loss: 21.6205 1170/2680 = 43.7%, SPS: 0.1, ELAP: 3:39:24, ETA: 4:43:09 - loss: 18.8343 1180/2680 = 44.0%, SPS: 0.1, ELAP: 3:41:16, ETA: 4:41:16 - loss: 16.1223 1190/2680 = 44.4%, SPS: 0.1, ELAP: 3:43:13, ETA: 4:39:29 - loss: 19.9336 1200/2680 = 44.8%, SPS: 0.1, ELAP: 3:45:05, ETA: 4:37:36 - loss: 21.1273 1210/2680 = 45.1%, SPS: 0.1, ELAP: 3:46:59, ETA: 4:35:45 - loss: 18.0167 1220/2680 = 45.5%, SPS: 0.1, ELAP: 3:48:52, ETA: 4:33:53 - loss: 18.4363 1230/2680 = 45.9%, SPS: 0.1, ELAP: 3:50:45, ETA: 4:32:02 - loss: 17.5395 1240/2680 = 46.3%, SPS: 0.1, ELAP: 3:52:38, ETA: 4:30:10 - loss: 17.4508 1250/2680 = 46.6%, SPS: 0.1, ELAP: 3:54:31, ETA: 4:28:17 - loss: 19.2630 1260/2680 = 47.0%, SPS: 0.1, ELAP: 3:56:24, ETA: 4:26:25 - loss: 20.7043 1270/2680 = 47.4%, SPS: 0.1, ELAP: 3:58:16, ETA: 4:24:32 - loss: 15.8684 1280/2680 = 47.8%, SPS: 0.1, ELAP: 4:00:08, ETA: 4:22:39 - loss: 19.5736 1290/2680 = 48.1%, SPS: 0.1, ELAP: 4:02:03, ETA: 4:20:49 - loss: 20.6141 1300/2680 = 48.5%, SPS: 0.1, ELAP: 4:03:56, ETA: 4:18:57 - loss: 22.3914 1310/2680 = 48.9%, SPS: 0.1, ELAP: 4:05:47, ETA: 4:17:03 - loss: 18.5007 1320/2680 = 49.3%, SPS: 0.1, ELAP: 4:07:40, ETA: 4:15:10 - loss: 15.5616 1330/2680 = 49.6%, SPS: 0.1, ELAP: 4:09:34, ETA: 4:13:19 - loss: 19.3617 1340/2680 = 50.0%, SPS: 0.1, ELAP: 4:11:26, ETA: 4:11:26 - loss: 17.7249 1350/2680 = 50.4%, SPS: 0.1, ELAP: 4:13:20, ETA: 4:09:35 - loss: 22.9627 1360/2680 = 50.7%, SPS: 0.1, ELAP: 4:15:12, ETA: 4:07:42 - loss: 18.1267 1370/2680 = 51.1%, SPS: 0.1, ELAP: 4:17:06, ETA: 4:05:51 - loss: 18.5122 1380/2680 = 51.5%, SPS: 0.1, ELAP: 4:18:59, ETA: 4:03:58 - loss: 20.9085 1390/2680 = 51.9%, SPS: 0.1, ELAP: 4:20:53, ETA: 4:02:07 - loss: 20.7251 1400/2680 = 52.2%, SPS: 0.1, 
ELAP: 4:22:45, ETA: 4:00:13 - loss: 19.4154 1410/2680 = 52.6%, SPS: 0.1, ELAP: 4:24:38, ETA: 3:58:21 - loss: 22.8371 1420/2680 = 53.0%, SPS: 0.1, ELAP: 4:26:32, ETA: 3:56:30 - loss: 21.9000 1430/2680 = 53.4%, SPS: 0.1, ELAP: 4:28:26, ETA: 3:54:38 - loss: 18.2020 1440/2680 = 53.7%, SPS: 0.1, ELAP: 4:30:19, ETA: 3:52:47 - loss: 18.8780 1450/2680 = 54.1%, SPS: 0.1, ELAP: 4:32:14, ETA: 3:50:56 - loss: 18.1634 1460/2680 = 54.5%, SPS: 0.1, ELAP: 4:34:09, ETA: 3:49:05 - loss: 18.1409 1470/2680 = 54.9%, SPS: 0.1, ELAP: 4:36:03, ETA: 3:47:13 - loss: 16.7001 1480/2680 = 55.2%, SPS: 0.1, ELAP: 4:37:58, ETA: 3:45:23 - loss: 19.2709 1490/2680 = 55.6%, SPS: 0.1, ELAP: 4:39:51, ETA: 3:43:30 - loss: 21.9576 1500/2680 = 56.0%, SPS: 0.1, ELAP: 4:41:46, ETA: 3:41:40 - loss: 19.3734 1510/2680 = 56.3%, SPS: 0.1, ELAP: 4:43:39, ETA: 3:39:47 - loss: 20.6773 1520/2680 = 56.7%, SPS: 0.1, ELAP: 4:45:33, ETA: 3:37:55 - loss: 19.6890 1530/2680 = 57.1%, SPS: 0.1, ELAP: 4:47:26, ETA: 3:36:03 - loss: 19.3396 1540/2680 = 57.5%, SPS: 0.1, ELAP: 4:49:19, ETA: 3:34:10 - loss: 20.3305 1550/2680 = 57.8%, SPS: 0.1, ELAP: 4:51:15, ETA: 3:32:20 - loss: 17.2886 1560/2680 = 58.2%, SPS: 0.1, ELAP: 4:53:07, ETA: 3:30:26 - loss: 19.8133 1570/2680 = 58.6%, SPS: 0.1, ELAP: 4:55:01, ETA: 3:28:35 - loss: 19.9818 1580/2680 = 59.0%, SPS: 0.1, ELAP: 4:56:54, ETA: 3:26:42 - loss: 18.6070 1590/2680 = 59.3%, SPS: 0.1, ELAP: 4:58:47, ETA: 3:24:49 - loss: 17.8926 1600/2680 = 59.7%, SPS: 0.1, ELAP: 5:00:41, ETA: 3:22:57 - loss: 23.1611 1610/2680 = 60.1%, SPS: 0.1, ELAP: 5:02:33, ETA: 3:21:05 - loss: 13.8283 1620/2680 = 60.4%, SPS: 0.1, ELAP: 5:04:26, ETA: 3:19:12 - loss: 18.0660 1630/2680 = 60.8%, SPS: 0.1, ELAP: 5:06:18, ETA: 3:17:19 - loss: 19.8060 1640/2680 = 61.2%, SPS: 0.1, ELAP: 5:08:13, ETA: 3:15:27 - loss: 18.9358 1650/2680 = 61.6%, SPS: 0.1, ELAP: 5:10:05, ETA: 3:13:34 - loss: 18.9945 1660/2680 = 61.9%, SPS: 0.1, ELAP: 5:11:58, ETA: 3:11:41 - loss: 19.4674 1670/2680 = 62.3%, SPS: 0.1, ELAP: 5:13:52, ETA: 3:09:49 
- loss: 19.6866 1680/2680 = 62.7%, SPS: 0.1, ELAP: 5:15:46, ETA: 3:07:57 - loss: 22.4680 1690/2680 = 63.1%, SPS: 0.1, ELAP: 5:17:39, ETA: 3:06:05 - loss: 23.8336 1700/2680 = 63.4%, SPS: 0.1, ELAP: 5:19:35, ETA: 3:04:14 - loss: 19.5279 1710/2680 = 63.8%, SPS: 0.1, ELAP: 5:21:28, ETA: 3:02:21 - loss: 23.5912 1720/2680 = 64.2%, SPS: 0.1, ELAP: 5:23:24, ETA: 3:00:30 - loss: 16.3837 1730/2680 = 64.6%, SPS: 0.1, ELAP: 5:25:18, ETA: 2:58:38 - loss: 12.6178 1740/2680 = 64.9%, SPS: 0.1, ELAP: 5:27:11, ETA: 2:56:45 - loss: 17.1669 1750/2680 = 65.3%, SPS: 0.1, ELAP: 5:29:06, ETA: 2:54:54 - loss: 17.1182 1760/2680 = 65.7%, SPS: 0.1, ELAP: 5:31:00, ETA: 2:53:01 - loss: 21.6815 1770/2680 = 66.0%, SPS: 0.1, ELAP: 5:32:55, ETA: 2:51:09 - loss: 19.6418 1780/2680 = 66.4%, SPS: 0.1, ELAP: 5:34:49, ETA: 2:49:18 - loss: 20.9930 1790/2680 = 66.8%, SPS: 0.1, ELAP: 5:36:41, ETA: 2:47:24 - loss: 23.6495 1800/2680 = 67.2%, SPS: 0.1, ELAP: 5:38:35, ETA: 2:45:32 - loss: 21.6909 1810/2680 = 67.5%, SPS: 0.1, ELAP: 5:40:29, ETA: 2:43:39 - loss: 18.8548 1820/2680 = 67.9%, SPS: 0.1, ELAP: 5:42:21, ETA: 2:41:46 - loss: 23.1969 1830/2680 = 68.3%, SPS: 0.1, ELAP: 5:44:13, ETA: 2:39:53 - loss: 21.3797 1840/2680 = 68.7%, SPS: 0.1, ELAP: 5:46:07, ETA: 2:38:01 - loss: 18.6704 1850/2680 = 69.0%, SPS: 0.1, ELAP: 5:47:59, ETA: 2:36:07 - loss: 17.3596 1860/2680 = 69.4%, SPS: 0.1, ELAP: 5:49:52, ETA: 2:34:15 - loss: 20.3633 1870/2680 = 69.8%, SPS: 0.1, ELAP: 5:51:45, ETA: 2:32:22 - loss: 19.4834 1880/2680 = 70.1%, SPS: 0.1, ELAP: 5:53:38, ETA: 2:30:29 - loss: 19.0583 1890/2680 = 70.5%, SPS: 0.1, ELAP: 5:55:33, ETA: 2:28:37 - loss: 23.0242 1900/2680 = 70.9%, SPS: 0.1, ELAP: 5:57:26, ETA: 2:26:44 - loss: 19.3608 1910/2680 = 71.3%, SPS: 0.1, ELAP: 5:59:18, ETA: 2:24:51 - loss: 18.8135 1920/2680 = 71.6%, SPS: 0.1, ELAP: 6:01:11, ETA: 2:22:58 - loss: 20.8506 1930/2680 = 72.0%, SPS: 0.1, ELAP: 6:03:06, ETA: 2:21:06 - loss: 14.4028 1940/2680 = 72.4%, SPS: 0.1, ELAP: 6:04:58, ETA: 2:19:13 - loss: 15.9096 1950/2680 = 
72.8%, SPS: 0.1, ELAP: 6:06:53, ETA: 2:17:21 - loss: 19.7736 1960/2680 = 73.1%, SPS: 0.1, ELAP: 6:08:46, ETA: 2:15:28 - loss: 20.1979 1970/2680 = 73.5%, SPS: 0.1, ELAP: 6:10:40, ETA: 2:13:35 - loss: 22.5106 1980/2680 = 73.9%, SPS: 0.1, ELAP: 6:12:34, ETA: 2:11:43 - loss: 20.0141 1990/2680 = 74.3%, SPS: 0.1, ELAP: 6:14:26, ETA: 2:09:50 - loss: 16.8174 2000/2680 = 74.6%, SPS: 0.1, ELAP: 6:16:19, ETA: 2:07:57 - loss: 17.7034 2010/2680 = 75.0%, SPS: 0.1, ELAP: 6:18:12, ETA: 2:06:04 - loss: 21.5445 2020/2680 = 75.4%, SPS: 0.1, ELAP: 6:20:07, ETA: 2:04:12 - loss: 18.2724 2030/2680 = 75.7%, SPS: 0.1, ELAP: 6:22:01, ETA: 2:02:19 - loss: 23.5617 2040/2680 = 76.1%, SPS: 0.1, ELAP: 6:23:56, ETA: 2:00:27 - loss: 20.1296 2050/2680 = 76.5%, SPS: 0.1, ELAP: 6:25:48, ETA: 1:58:34 - loss: 23.6539 2060/2680 = 76.9%, SPS: 0.1, ELAP: 6:27:41, ETA: 1:56:41 - loss: 20.0683 2070/2680 = 77.2%, SPS: 0.1, ELAP: 6:29:33, ETA: 1:54:48 - loss: 16.9090 2080/2680 = 77.6%, SPS: 0.1, ELAP: 6:31:27, ETA: 1:52:55 - loss: 20.2353 2090/2680 = 78.0%, SPS: 0.1, ELAP: 6:33:18, ETA: 1:51:02 - loss: 19.6218 2100/2680 = 78.4%, SPS: 0.1, ELAP: 6:35:13, ETA: 1:49:09 - loss: 19.8757 2110/2680 = 78.7%, SPS: 0.1, ELAP: 6:37:07, ETA: 1:47:17 - loss: 24.7569 2120/2680 = 79.1%, SPS: 0.1, ELAP: 6:38:59, ETA: 1:45:24 - loss: 20.4266 2130/2680 = 79.5%, SPS: 0.1, ELAP: 6:40:52, ETA: 1:43:31 - loss: 21.9894 2140/2680 = 79.9%, SPS: 0.1, ELAP: 6:42:44, ETA: 1:41:38 - loss: 15.5554 2150/2680 = 80.2%, SPS: 0.1, ELAP: 6:44:38, ETA: 1:39:45 - loss: 22.1273 2160/2680 = 80.6%, SPS: 0.1, ELAP: 6:46:32, ETA: 1:37:52 - loss: 21.7389 2170/2680 = 81.0%, SPS: 0.1, ELAP: 6:48:25, ETA: 1:35:59 - loss: 22.1357 2180/2680 = 81.3%, SPS: 0.1, ELAP: 6:50:19, ETA: 1:34:06 - loss: 21.9440 2190/2680 = 81.7%, SPS: 0.1, ELAP: 6:52:10, ETA: 1:32:13 - loss: 16.2746 2200/2680 = 82.1%, SPS: 0.1, ELAP: 6:54:03, ETA: 1:30:20 - loss: 18.4669 2210/2680 = 82.5%, SPS: 0.1, ELAP: 6:55:58, ETA: 1:28:28 - loss: 20.0617 2220/2680 = 82.8%, SPS: 0.1, ELAP: 
6:57:51, ETA: 1:26:35 - loss: 20.7693 2230/2680 = 83.2%, SPS: 0.1, ELAP: 6:59:44, ETA: 1:24:42 - loss: 21.4518 2240/2680 = 83.6%, SPS: 0.1, ELAP: 7:01:36, ETA: 1:22:49 - loss: 20.6990 2250/2680 = 84.0%, SPS: 0.1, ELAP: 7:03:31, ETA: 1:20:56 - loss: 20.9194 2260/2680 = 84.3%, SPS: 0.1, ELAP: 7:05:24, ETA: 1:19:03 - loss: 17.8959 2270/2680 = 84.7%, SPS: 0.1, ELAP: 7:07:16, ETA: 1:17:10 - loss: 21.0770 2280/2680 = 85.1%, SPS: 0.1, ELAP: 7:09:09, ETA: 1:15:17 - loss: 19.6453 2290/2680 = 85.4%, SPS: 0.1, ELAP: 7:11:04, ETA: 1:13:25 - loss: 20.9758 2300/2680 = 85.8%, SPS: 0.1, ELAP: 7:12:56, ETA: 1:11:32 - loss: 18.8663 2310/2680 = 86.2%, SPS: 0.1, ELAP: 7:14:48, ETA: 1:09:39 - loss: 21.2130 2320/2680 = 86.6%, SPS: 0.1, ELAP: 7:16:39, ETA: 1:07:45 - loss: 18.8374 2330/2680 = 86.9%, SPS: 0.1, ELAP: 7:18:32, ETA: 1:05:52 - loss: 18.3066 2340/2680 = 87.3%, SPS: 0.1, ELAP: 7:20:26, ETA: 1:04:00 - loss: 21.8576 2350/2680 = 87.7%, SPS: 0.1, ELAP: 7:22:18, ETA: 1:02:07 - loss: 21.9074 2360/2680 = 88.1%, SPS: 0.1, ELAP: 7:24:09, ETA: 1:00:13 - loss: 20.9826 2370/2680 = 88.4%, SPS: 0.1, ELAP: 7:26:03, ETA: 58:21 - loss: 20.2654 2380/2680 = 88.8%, SPS: 0.1, ELAP: 7:27:55, ETA: 56:28 - loss: 22.6121 2390/2680 = 89.2%, SPS: 0.1, ELAP: 7:29:47, ETA: 54:35 - loss: 16.3683 2400/2680 = 89.6%, SPS: 0.1, ELAP: 7:31:41, ETA: 52:42 - loss: 18.6122 2410/2680 = 89.9%, SPS: 0.1, ELAP: 7:33:32, ETA: 50:49 - loss: 15.0298 2420/2680 = 90.3%, SPS: 0.1, ELAP: 7:35:24, ETA: 48:56 - loss: 19.4040 2430/2680 = 90.7%, SPS: 0.1, ELAP: 7:37:16, ETA: 47:03 - loss: 22.4099 2440/2680 = 91.0%, SPS: 0.1, ELAP: 7:39:09, ETA: 45:10 - loss: 21.6186 2450/2680 = 91.4%, SPS: 0.1, ELAP: 7:41:01, ETA: 43:17 - loss: 17.5188 2460/2680 = 91.8%, SPS: 0.1, ELAP: 7:42:56, ETA: 41:24 - loss: 18.0126 2470/2680 = 92.2%, SPS: 0.1, ELAP: 7:44:48, ETA: 39:31 - loss: 16.5385 2480/2680 = 92.5%, SPS: 0.1, ELAP: 7:46:40, ETA: 37:38 - loss: 21.2340 2490/2680 = 92.9%, SPS: 0.1, ELAP: 7:48:32, ETA: 35:45 - loss: 19.5627 2500/2680 = 
93.3%, SPS: 0.1, ELAP: 7:50:24, ETA: 33:52 - loss: 20.8666 2510/2680 = 93.7%, SPS: 0.1, ELAP: 7:52:18, ETA: 31:59 - loss: 23.0904 2520/2680 = 94.0%, SPS: 0.1, ELAP: 7:54:10, ETA: 30:06 - loss: 15.0429 2530/2680 = 94.4%, SPS: 0.1, ELAP: 7:56:03, ETA: 28:13 - loss: 16.7089 2540/2680 = 94.8%, SPS: 0.1, ELAP: 7:57:57, ETA: 26:21 - loss: 20.5319 2550/2680 = 95.1%, SPS: 0.1, ELAP: 7:59:51, ETA: 24:28 - loss: 19.7810 2560/2680 = 95.5%, SPS: 0.1, ELAP: 8:01:43, ETA: 22:35 - loss: 17.9934 2570/2680 = 95.9%, SPS: 0.1, ELAP: 8:03:36, ETA: 20:42 - loss: 16.9509 2580/2680 = 96.3%, SPS: 0.1, ELAP: 8:05:29, ETA: 18:49 - loss: 21.8594 2590/2680 = 96.6%, SPS: 0.1, ELAP: 8:07:23, ETA: 16:56 - loss: 16.7791 2600/2680 = 97.0%, SPS: 0.1, ELAP: 8:09:15, ETA: 15:03 - loss: 17.3512 2610/2680 = 97.4%, SPS: 0.1, ELAP: 8:11:07, ETA: 13:10 - loss: 14.2443 2620/2680 = 97.8%, SPS: 0.1, ELAP: 8:12:59, ETA: 11:17 - loss: 18.9862 2630/2680 = 98.1%, SPS: 0.1, ELAP: 8:14:54, ETA: 9:25 - loss: 19.9146 2640/2680 = 98.5%, SPS: 0.1, ELAP: 8:16:46, ETA: 7:32 - loss: 20.1490 2650/2680 = 98.9%, SPS: 0.1, ELAP: 8:18:39, ETA: 5:39 - loss: 22.5202 2660/2680 = 99.3%, SPS: 0.1, ELAP: 8:20:33, ETA: 3:46 - loss: 15.2829 2670/2680 = 99.6%, SPS: 0.1, ELAP: 8:22:26, ETA: 1:53 - loss: 23.1243 2680/2680 = 100.0%, SPS: 0.1, ELAP: 8:24:20, ETA: 0 - loss: 17.6996 2680/2680 = 100.0%, SPS: 0.1, ELAP: 8:24:26, ETA: 0 ================================================================================ Run dev set evaluation: model=electra_large, trial 1/1 ================================================================================ Evaluating cola Loading dataset cola_dev Existing tfrecords not found so creating Writing example 0 of 1043 Building model... 
Building complete 2020-05-22 19:00:48.238901: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: name: TITAN V major: 7 minor: 0 memoryClockRate(GHz): 1.455 pciBusID: 0000:02:00.0 2020-05-22 19:00:48.239610: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 1 with properties: name: TITAN V major: 7 minor: 0 memoryClockRate(GHz): 1.455 pciBusID: 0000:03:00.0 2020-05-22 19:00:48.240279: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 2 with properties: name: TITAN V major: 7 minor: 0 memoryClockRate(GHz): 1.455 pciBusID: 0000:82:00.0 2020-05-22 19:00:48.240932: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 3 with properties: name: TITAN V major: 7 minor: 0 memoryClockRate(GHz): 1.455 pciBusID: 0000:83:00.0 2020-05-22 19:00:48.241149: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda/lib64 2020-05-22 19:00:48.241227: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda/lib64 2020-05-22 19:00:48.241295: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda/lib64 2020-05-22 19:00:48.241361: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda/lib64 2020-05-22 19:00:48.241426: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 
'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda/lib64 2020-05-22 19:00:48.241492: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda/lib64 2020-05-22 19:00:48.241515: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7 2020-05-22 19:00:48.241526: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1663] Cannot dlopen some GPU libraries. Skipping registering GPU devices... 2020-05-22 19:00:48.241690: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix: 2020-05-22 19:00:48.241703: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0 1 2 3 2020-05-22 19:00:48.241711: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N Y N N 2020-05-22 19:00:48.241718: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 1: Y N N N 2020-05-22 19:00:48.241725: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 2: N N N Y 2020-05-22 19:00:48.241733: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 3: N N Y N /home/spectrometer/.local/lib/python3.6/site-packages/sklearn/metrics/classification.py:543: RuntimeWarning: invalid value encountered in double_scalars mcc = cov_ytyp / np.sqrt(cov_ytyt * cov_ypyp) cola: mcc: 0.00 - loss: 0.62 Writing results to /home/spectrometer/models/electra_large/results/cola_results.txt ================================================================================ Running on the test set and writing the predictions: model=electra_large, trial 1/1 ================================================================================ Writing out predictions for [Task(cola)] test Loading dataset cola_test Existing tfrecords not 
found so creating Writing example 0 of 1063 Building model... Building complete 2020-05-22 19:03:03.173720: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: name: TITAN V major: 7 minor: 0 memoryClockRate(GHz): 1.455 pciBusID: 0000:02:00.0 2020-05-22 19:03:03.174429: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 1 with properties: name: TITAN V major: 7 minor: 0 memoryClockRate(GHz): 1.455 pciBusID: 0000:03:00.0 2020-05-22 19:03:03.175120: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 2 with properties: name: TITAN V major: 7 minor: 0 memoryClockRate(GHz): 1.455 pciBusID: 0000:82:00.0 2020-05-22 19:03:03.175819: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 3 with properties: name: TITAN V major: 7 minor: 0 memoryClockRate(GHz): 1.455 pciBusID: 0000:83:00.0 2020-05-22 19:03:03.176023: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda/lib64 2020-05-22 19:03:03.176115: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda/lib64 2020-05-22 19:03:03.176188: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda/lib64 2020-05-22 19:03:03.176261: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda/lib64 2020-05-22 19:03:03.176334: I 
tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda/lib64 2020-05-22 19:03:03.176405: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda/lib64 2020-05-22 19:03:03.176432: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7 2020-05-22 19:03:03.176443: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1663] Cannot dlopen some GPU libraries. Skipping registering GPU devices... 2020-05-22 19:03:03.176603: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix: 2020-05-22 19:03:03.176616: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0 1 2 3 2020-05-22 19:03:03.176624: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N Y N N 2020-05-22 19:03:03.176631: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 1: Y N N N 2020-05-22 19:03:03.176638: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 2: N N N Y 2020-05-22 19:03:03.176645: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 3: N N Y N Pickling predictions for 1063 cola examples (test)
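The dev evaluation in the output above reports "mcc: 0.00" together with a RuntimeWarning raised at `mcc = cov_ytyp / np.sqrt(cov_ytyt * cov_ypyp)`. That combination is what one would expect if the model predicted the same class for every dev example, because the prediction variance term is then zero and the ratio becomes 0/0. The sketch below illustrates this with a plain-Python Matthews correlation; it is not the scikit-learn implementation, the labels are made-up toy data, and returning 0.0 for the undefined case is a convention chosen here to mirror the reported value.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation for binary 0/1 labels; 0.0 when it is undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # All predictions (or all labels) fall in one class: 0/0 case.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy labels (hypothetical, not taken from CoLA).
y_true = [1, 1, 0, 1, 0, 1]
constant_pred = [1] * len(y_true)  # a model that always predicts class 1

print(mcc(y_true, constant_pred))  # degenerate case
print(mcc(y_true, y_true))         # perfect predictions
```

Counting the distinct values in the pickled test predictions would be a quick way to check whether the fine-tuned Electra-Large model really collapsed to a single class.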
@spectrometerH, hi, are you fine-tuning the Electra-Large model on the CoLA dataset? I tried it on a 12 GB GPU, but it always raises an OOM error. Could you tell me how you got this to work? Many thanks!
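Two things from the posted output may be relevant to the OOM question. First, the log contains "Cannot dlopen some GPU libraries. Skipping registering GPU devices...", so the original run appears to have fallen back to CPU rather than fitting on a GPU. Second, the Config dump lists `train_batch_size 32`, and the commands above already override other config keys through `--hparams`. Assuming `train_batch_size` can be overridden the same way, a smaller batch could be tried. The sketch below only builds the command string; the helper name `build_finetune_command` and the batch size 8 are illustrative choices, and whether this avoids OOM on a 12 GB GPU is untested.

```python
import json
import shlex


def build_finetune_command(model_name, model_size, task, epochs, extra_hparams=None):
    """Build a run_finetuning.py command line like the ones in this thread."""
    hparams = {
        "model_size": model_size,
        "task_names": [task],
        "num_train_epochs": epochs,
        "do_train": True,
        "do_eval": True,
        "write_test_outputs": True,
    }
    # Assumption: extra keys from the Config dump are accepted in --hparams.
    hparams.update(extra_hparams or {})
    return " ".join([
        "python3", "run_finetuning.py",
        "--data-dir", "~",  # left unquoted so the shell expands it
        "--model-name", model_name,
        "--hparams", shlex.quote(json.dumps(hparams)),
    ])


cmd = build_finetune_command(
    "electra_large", "large", "cola", 10,
    extra_hparams={"train_batch_size": 8},  # arbitrary smaller value
)
print(cmd)
```

Note that a smaller batch changes the number of training steps and may interact with the learning rate, so results are not directly comparable with the batch-size-32 run.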