dmlc / gluon-nlp

NLP made easy
https://nlp.gluon.ai/
Apache License 2.0

BERT fine-tuning on SQuAD 1.1 doesn't converge #1204

Closed. TaoLv closed this issue 4 years ago

TaoLv commented 4 years ago

Description

INFO:gluonnlp:10:28:18 Namespace(accumulate=None, batch_size=12, bert_dataset='book_corpus_wiki_en_uncased', bert_model='bert_12_768_12', calib_mode='customize', comm_backend=None, debug=False, deploy=False, doc_stride=128, dtype='float32', epochs=2, gpu=True, log_interval=50, lr=3e-05, max_answer_length=30, max_query_length=64, max_seq_length=384, model_parameters=None, model_prefix=None, n_best_size=20, null_score_diff_threshold=0.0, num_calib_batches=10, only_calibration=False, only_predict=False, optimizer='adam', output_dir='./output_dir', pretrained_bert_parameters=None, quantized_dtype='auto', round_to=None, sentencepiece=None, test_batch_size=24, training_steps=None, uncased=True, version_2=False, warmup_ratio=0.1)
INFO:gluonnlp:10:28:25 Loading train data...
INFO:gluonnlp:10:28:26 Number of records in Train data:87599
INFO:gluonnlp:10:29:04 The number of examples after preprocessing:88641
Done! Transform dataset costs 37.85 seconds.
INFO:gluonnlp:10:29:04 Start Training
INFO:gluonnlp:10:29:19 Batch: 49/7387, Loss=5.7366, lr=0.0000010 Thoughput=41.72 samples/s
INFO:gluonnlp:10:29:32 Batch: 99/7387, Loss=5.6931, lr=0.0000020 Thoughput=45.47 samples/s
INFO:gluonnlp:10:29:44 Batch: 149/7387, Loss=5.4705, lr=0.0000030 Thoughput=48.23 samples/s
INFO:gluonnlp:10:29:57 Batch: 199/7387, Loss=5.3252, lr=0.0000041 Thoughput=47.67 samples/s
INFO:gluonnlp:10:30:10 Batch: 249/7387, Loss=5.0779, lr=0.0000051 Thoughput=46.84 samples/s
INFO:gluonnlp:10:30:22 Batch: 299/7387, Loss=4.8238, lr=0.0000061 Thoughput=47.40 samples/s
INFO:gluonnlp:10:30:35 Batch: 349/7387, Loss=4.5862, lr=0.0000071 Thoughput=46.46 samples/s
INFO:gluonnlp:10:30:48 Batch: 399/7387, Loss=4.2767, lr=0.0000081 Thoughput=47.08 samples/s
INFO:gluonnlp:10:31:01 Batch: 449/7387, Loss=4.1058, lr=0.0000091 Thoughput=46.85 samples/s
INFO:gluonnlp:10:31:13 Batch: 499/7387, Loss=4.0107, lr=0.0000102 Thoughput=48.32 samples/s
INFO:gluonnlp:10:31:26 Batch: 549/7387, Loss=3.8868, lr=0.0000112 Thoughput=48.08 samples/s
INFO:gluonnlp:10:31:39 Batch: 599/7387, Loss=3.7927, lr=0.0000122 Thoughput=46.52 samples/s
INFO:gluonnlp:10:31:51 Batch: 649/7387, Loss=3.6045, lr=0.0000132 Thoughput=48.48 samples/s
INFO:gluonnlp:10:32:04 Batch: 699/7387, Loss=3.3329, lr=0.0000142 Thoughput=45.18 samples/s
INFO:gluonnlp:10:32:18 Batch: 749/7387, Loss=3.1370, lr=0.0000152 Thoughput=43.95 samples/s
INFO:gluonnlp:10:32:31 Batch: 799/7387, Loss=2.9287, lr=0.0000162 Thoughput=45.27 samples/s
INFO:gluonnlp:10:32:45 Batch: 849/7387, Loss=2.9110, lr=0.0000173 Thoughput=44.93 samples/s
INFO:gluonnlp:10:32:59 Batch: 899/7387, Loss=2.7632, lr=0.0000183 Thoughput=41.90 samples/s
INFO:gluonnlp:10:33:12 Batch: 949/7387, Loss=2.7786, lr=0.0000193 Thoughput=45.55 samples/s
INFO:gluonnlp:10:33:25 Batch: 999/7387, Loss=2.8667, lr=0.0000203 Thoughput=45.36 samples/s
INFO:gluonnlp:10:33:39 Batch: 1049/7387, Loss=2.7649, lr=0.0000213 Thoughput=43.45 samples/s
INFO:gluonnlp:10:33:52 Batch: 1099/7387, Loss=2.7784, lr=0.0000223 Thoughput=46.00 samples/s
INFO:gluonnlp:10:34:06 Batch: 1149/7387, Loss=2.6492, lr=0.0000234 Thoughput=44.58 samples/s
INFO:gluonnlp:10:34:19 Batch: 1199/7387, Loss=2.7466, lr=0.0000244 Thoughput=43.60 samples/s
INFO:gluonnlp:10:34:33 Batch: 1249/7387, Loss=2.7283, lr=0.0000254 Thoughput=44.27 samples/s
INFO:gluonnlp:10:34:47 Batch: 1299/7387, Loss=2.6654, lr=0.0000264 Thoughput=43.96 samples/s
INFO:gluonnlp:10:35:00 Batch: 1349/7387, Loss=2.7920, lr=0.0000274 Thoughput=44.62 samples/s
INFO:gluonnlp:10:35:14 Batch: 1399/7387, Loss=2.8768, lr=0.0000284 Thoughput=43.34 samples/s
INFO:gluonnlp:10:35:28 Batch: 1449/7387, Loss=2.8342, lr=0.0000295 Thoughput=41.64 samples/s
INFO:gluonnlp:10:35:42 Batch: 1499/7387, Loss=2.9186, lr=0.0000299 Thoughput=44.33 samples/s
INFO:gluonnlp:10:35:56 Batch: 1549/7387, Loss=2.8817, lr=0.0000298 Thoughput=41.69 samples/s
INFO:gluonnlp:10:36:10 Batch: 1599/7387, Loss=3.0527, lr=0.0000297 Thoughput=44.38 samples/s
INFO:gluonnlp:10:36:23 Batch: 1649/7387, Loss=3.1255, lr=0.0000296 Thoughput=44.45 samples/s
INFO:gluonnlp:10:36:37 Batch: 1699/7387, Loss=3.1396, lr=0.0000295 Thoughput=43.95 samples/s
INFO:gluonnlp:10:36:51 Batch: 1749/7387, Loss=3.3614, lr=0.0000294 Thoughput=42.61 samples/s
INFO:gluonnlp:10:37:05 Batch: 1799/7387, Loss=3.3383, lr=0.0000293 Thoughput=42.46 samples/s
INFO:gluonnlp:10:37:19 Batch: 1849/7387, Loss=3.4652, lr=0.0000292 Thoughput=44.64 samples/s
INFO:gluonnlp:10:37:32 Batch: 1899/7387, Loss=3.3199, lr=0.0000290 Thoughput=44.89 samples/s
INFO:gluonnlp:10:37:46 Batch: 1949/7387, Loss=3.5690, lr=0.0000289 Thoughput=41.94 samples/s
INFO:gluonnlp:10:38:00 Batch: 1999/7387, Loss=3.5586, lr=0.0000288 Thoughput=44.53 samples/s
INFO:gluonnlp:10:38:14 Batch: 2049/7387, Loss=3.7966, lr=0.0000287 Thoughput=42.84 samples/s
INFO:gluonnlp:10:38:27 Batch: 2099/7387, Loss=3.5441, lr=0.0000286 Thoughput=45.42 samples/s
INFO:gluonnlp:10:38:41 Batch: 2149/7387, Loss=3.7021, lr=0.0000285 Thoughput=43.08 samples/s
INFO:gluonnlp:10:38:54 Batch: 2199/7387, Loss=3.7285, lr=0.0000284 Thoughput=44.25 samples/s
INFO:gluonnlp:10:39:08 Batch: 2249/7387, Loss=3.8293, lr=0.0000283 Thoughput=43.60 samples/s
INFO:gluonnlp:10:39:22 Batch: 2299/7387, Loss=3.8185, lr=0.0000281 Thoughput=44.31 samples/s
INFO:gluonnlp:10:39:35 Batch: 2349/7387, Loss=3.8284, lr=0.0000280 Thoughput=44.55 samples/s
INFO:gluonnlp:10:39:49 Batch: 2399/7387, Loss=4.0185, lr=0.0000279 Thoughput=42.95 samples/s
INFO:gluonnlp:10:40:04 Batch: 2449/7387, Loss=3.9258, lr=0.0000278 Thoughput=41.74 samples/s
INFO:gluonnlp:10:40:17 Batch: 2499/7387, Loss=3.9495, lr=0.0000277 Thoughput=45.10 samples/s
INFO:gluonnlp:10:40:31 Batch: 2549/7387, Loss=3.9302, lr=0.0000276 Thoughput=43.96 samples/s
INFO:gluonnlp:10:40:44 Batch: 2599/7387, Loss=3.9502, lr=0.0000275 Thoughput=45.95 samples/s
INFO:gluonnlp:10:40:57 Batch: 2649/7387, Loss=3.9685, lr=0.0000274 Thoughput=44.44 samples/s
INFO:gluonnlp:10:41:10 Batch: 2699/7387, Loss=4.0349, lr=0.0000272 Thoughput=45.72 samples/s
INFO:gluonnlp:10:41:24 Batch: 2749/7387, Loss=4.0281, lr=0.0000271 Thoughput=44.21 samples/s
INFO:gluonnlp:10:41:37 Batch: 2799/7387, Loss=3.9801, lr=0.0000270 Thoughput=44.92 samples/s
INFO:gluonnlp:10:41:50 Batch: 2849/7387, Loss=3.9873, lr=0.0000269 Thoughput=45.49 samples/s
INFO:gluonnlp:10:42:04 Batch: 2899/7387, Loss=4.1328, lr=0.0000268 Thoughput=43.60 samples/s
INFO:gluonnlp:10:42:17 Batch: 2949/7387, Loss=4.0381, lr=0.0000267 Thoughput=46.05 samples/s
INFO:gluonnlp:10:42:30 Batch: 2999/7387, Loss=3.9962, lr=0.0000266 Thoughput=45.34 samples/s
INFO:gluonnlp:10:42:44 Batch: 3049/7387, Loss=4.0731, lr=0.0000265 Thoughput=44.03 samples/s
INFO:gluonnlp:10:42:57 Batch: 3099/7387, Loss=4.1017, lr=0.0000263 Thoughput=44.74 samples/s
INFO:gluonnlp:10:43:11 Batch: 3149/7387, Loss=3.9907, lr=0.0000262 Thoughput=44.34 samples/s
INFO:gluonnlp:10:43:24 Batch: 3199/7387, Loss=4.0384, lr=0.0000261 Thoughput=45.02 samples/s
INFO:gluonnlp:10:43:38 Batch: 3249/7387, Loss=4.0809, lr=0.0000260 Thoughput=42.78 samples/s
INFO:gluonnlp:10:43:52 Batch: 3299/7387, Loss=3.9960, lr=0.0000259 Thoughput=44.90 samples/s
INFO:gluonnlp:10:44:05 Batch: 3349/7387, Loss=4.0318, lr=0.0000258 Thoughput=44.17 samples/s
INFO:gluonnlp:10:44:19 Batch: 3399/7387, Loss=4.0101, lr=0.0000257 Thoughput=44.25 samples/s
INFO:gluonnlp:10:44:33 Batch: 3449/7387, Loss=4.0862, lr=0.0000255 Thoughput=43.34 samples/s
INFO:gluonnlp:10:44:46 Batch: 3499/7387, Loss=4.0828, lr=0.0000254 Thoughput=43.57 samples/s
INFO:gluonnlp:10:45:00 Batch: 3549/7387, Loss=4.1510, lr=0.0000253 Thoughput=45.25 samples/s
INFO:gluonnlp:10:45:12 Batch: 3599/7387, Loss=4.0614, lr=0.0000252 Thoughput=46.77 samples/s
INFO:gluonnlp:10:45:27 Batch: 3649/7387, Loss=4.0762, lr=0.0000251 Thoughput=41.75 samples/s
INFO:gluonnlp:10:45:40 Batch: 3699/7387, Loss=4.0421, lr=0.0000250 Thoughput=45.71 samples/s
INFO:gluonnlp:10:45:53 Batch: 3749/7387, Loss=4.1429, lr=0.0000249 Thoughput=44.66 samples/s
INFO:gluonnlp:10:46:07 Batch: 3799/7387, Loss=4.0960, lr=0.0000248 Thoughput=45.11 samples/s
INFO:gluonnlp:10:46:21 Batch: 3849/7387, Loss=4.1295, lr=0.0000246 Thoughput=40.82 samples/s
INFO:gluonnlp:10:46:35 Batch: 3899/7387, Loss=4.0449, lr=0.0000245 Thoughput=45.64 samples/s
INFO:gluonnlp:10:46:48 Batch: 3949/7387, Loss=4.0744, lr=0.0000244 Thoughput=43.96 samples/s
INFO:gluonnlp:10:47:01 Batch: 3999/7387, Loss=4.0502, lr=0.0000243 Thoughput=45.63 samples/s
INFO:gluonnlp:10:47:15 Batch: 4049/7387, Loss=4.1235, lr=0.0000242 Thoughput=45.42 samples/s
INFO:gluonnlp:10:47:28 Batch: 4099/7387, Loss=4.1194, lr=0.0000241 Thoughput=44.43 samples/s
INFO:gluonnlp:10:47:42 Batch: 4149/7387, Loss=4.1620, lr=0.0000240 Thoughput=43.85 samples/s
INFO:gluonnlp:10:47:56 Batch: 4199/7387, Loss=4.1474, lr=0.0000239 Thoughput=42.49 samples/s
INFO:gluonnlp:10:48:09 Batch: 4249/7387, Loss=4.1311, lr=0.0000237 Thoughput=44.33 samples/s
INFO:gluonnlp:10:48:23 Batch: 4299/7387, Loss=4.1637, lr=0.0000236 Thoughput=44.21 samples/s
INFO:gluonnlp:10:48:36 Batch: 4349/7387, Loss=4.1676, lr=0.0000235 Thoughput=44.64 samples/s
INFO:gluonnlp:10:48:50 Batch: 4399/7387, Loss=4.0311, lr=0.0000234 Thoughput=45.67 samples/s
INFO:gluonnlp:10:49:03 Batch: 4449/7387, Loss=4.1035, lr=0.0000233 Thoughput=45.05 samples/s
INFO:gluonnlp:10:49:17 Batch: 4499/7387, Loss=4.1394, lr=0.0000232 Thoughput=43.84 samples/s
INFO:gluonnlp:10:49:30 Batch: 4549/7387, Loss=4.1112, lr=0.0000231 Thoughput=45.08 samples/s
INFO:gluonnlp:10:49:43 Batch: 4599/7387, Loss=4.1363, lr=0.0000230 Thoughput=44.49 samples/s
INFO:gluonnlp:10:49:57 Batch: 4649/7387, Loss=4.0765, lr=0.0000228 Thoughput=44.56 samples/s
INFO:gluonnlp:10:50:10 Batch: 4699/7387, Loss=4.1965, lr=0.0000227 Thoughput=45.77 samples/s
INFO:gluonnlp:10:50:24 Batch: 4749/7387, Loss=4.1263, lr=0.0000226 Thoughput=42.68 samples/s
INFO:gluonnlp:10:50:37 Batch: 4799/7387, Loss=4.0919, lr=0.0000225 Thoughput=46.86 samples/s
INFO:gluonnlp:10:50:51 Batch: 4849/7387, Loss=4.2508, lr=0.0000224 Thoughput=43.28 samples/s
INFO:gluonnlp:10:51:05 Batch: 4899/7387, Loss=4.2040, lr=0.0000223 Thoughput=42.47 samples/s
INFO:gluonnlp:10:51:18 Batch: 4949/7387, Loss=4.1226, lr=0.0000222 Thoughput=46.83 samples/s
INFO:gluonnlp:10:51:31 Batch: 4999/7387, Loss=4.1356, lr=0.0000221 Thoughput=44.62 samples/s
INFO:gluonnlp:10:51:45 Batch: 5049/7387, Loss=4.1283, lr=0.0000219 Thoughput=43.60 samples/s
INFO:gluonnlp:10:51:58 Batch: 5099/7387, Loss=4.2041, lr=0.0000218 Thoughput=44.99 samples/s
INFO:gluonnlp:10:52:12 Batch: 5149/7387, Loss=4.1447, lr=0.0000217 Thoughput=44.54 samples/s
INFO:gluonnlp:10:52:25 Batch: 5199/7387, Loss=4.1981, lr=0.0000216 Thoughput=44.35 samples/s
INFO:gluonnlp:10:52:39 Batch: 5249/7387, Loss=4.1360, lr=0.0000215 Thoughput=44.51 samples/s
INFO:gluonnlp:10:52:52 Batch: 5299/7387, Loss=4.2430, lr=0.0000214 Thoughput=43.69 samples/s
INFO:gluonnlp:10:53:06 Batch: 5349/7387, Loss=4.1508, lr=0.0000213 Thoughput=44.36 samples/s
INFO:gluonnlp:10:53:19 Batch: 5399/7387, Loss=4.0919, lr=0.0000211 Thoughput=44.89 samples/s
INFO:gluonnlp:10:53:33 Batch: 5449/7387, Loss=4.2194, lr=0.0000210 Thoughput=44.67 samples/s
INFO:gluonnlp:10:53:46 Batch: 5499/7387, Loss=4.1684, lr=0.0000209 Thoughput=45.08 samples/s
INFO:gluonnlp:10:53:59 Batch: 5549/7387, Loss=4.1586, lr=0.0000208 Thoughput=44.64 samples/s
INFO:gluonnlp:10:54:13 Batch: 5599/7387, Loss=4.1701, lr=0.0000207 Thoughput=44.25 samples/s
INFO:gluonnlp:10:54:26 Batch: 5649/7387, Loss=4.2287, lr=0.0000206 Thoughput=45.33 samples/s
INFO:gluonnlp:10:54:40 Batch: 5699/7387, Loss=4.1584, lr=0.0000205 Thoughput=44.12 samples/s
INFO:gluonnlp:10:54:53 Batch: 5749/7387, Loss=4.1876, lr=0.0000204 Thoughput=45.88 samples/s
INFO:gluonnlp:10:55:06 Batch: 5799/7387, Loss=4.2106, lr=0.0000202 Thoughput=45.62 samples/s
INFO:gluonnlp:10:55:21 Batch: 5849/7387, Loss=4.1353, lr=0.0000201 Thoughput=39.74 samples/s
INFO:gluonnlp:10:55:34 Batch: 5899/7387, Loss=4.1966, lr=0.0000200 Thoughput=45.22 samples/s
INFO:gluonnlp:10:55:48 Batch: 5949/7387, Loss=4.1623, lr=0.0000199 Thoughput=44.90 samples/s
INFO:gluonnlp:10:56:01 Batch: 5999/7387, Loss=4.1149, lr=0.0000198 Thoughput=46.68 samples/s
INFO:gluonnlp:10:56:14 Batch: 6049/7387, Loss=4.1312, lr=0.0000197 Thoughput=44.63 samples/s
INFO:gluonnlp:10:56:27 Batch: 6099/7387, Loss=4.1174, lr=0.0000196 Thoughput=46.13 samples/s
INFO:gluonnlp:10:56:40 Batch: 6149/7387, Loss=4.1856, lr=0.0000195 Thoughput=45.87 samples/s
INFO:gluonnlp:10:56:53 Batch: 6199/7387, Loss=4.0843, lr=0.0000193 Thoughput=47.09 samples/s
INFO:gluonnlp:10:57:07 Batch: 6249/7387, Loss=4.1435, lr=0.0000192 Thoughput=43.75 samples/s
INFO:gluonnlp:10:57:20 Batch: 6299/7387, Loss=4.1400, lr=0.0000191 Thoughput=44.17 samples/s
INFO:gluonnlp:10:57:34 Batch: 6349/7387, Loss=4.1790, lr=0.0000190 Thoughput=42.81 samples/s
INFO:gluonnlp:10:57:48 Batch: 6399/7387, Loss=4.1604, lr=0.0000189 Thoughput=44.76 samples/s
INFO:gluonnlp:10:58:01 Batch: 6449/7387, Loss=4.1715, lr=0.0000188 Thoughput=44.97 samples/s
INFO:gluonnlp:10:58:14 Batch: 6499/7387, Loss=4.0780, lr=0.0000187 Thoughput=44.98 samples/s
INFO:gluonnlp:10:58:28 Batch: 6549/7387, Loss=4.2045, lr=0.0000186 Thoughput=45.08 samples/s
INFO:gluonnlp:10:58:42 Batch: 6599/7387, Loss=4.0806, lr=0.0000184 Thoughput=41.89 samples/s
INFO:gluonnlp:10:58:56 Batch: 6649/7387, Loss=4.1759, lr=0.0000183 Thoughput=41.65 samples/s
INFO:gluonnlp:10:59:10 Batch: 6699/7387, Loss=4.1486, lr=0.0000182 Thoughput=43.76 samples/s
INFO:gluonnlp:10:59:23 Batch: 6749/7387, Loss=4.1671, lr=0.0000181 Thoughput=44.89 samples/s
INFO:gluonnlp:10:59:36 Batch: 6799/7387, Loss=4.1012, lr=0.0000180 Thoughput=46.14 samples/s
INFO:gluonnlp:10:59:50 Batch: 6849/7387, Loss=4.1678, lr=0.0000179 Thoughput=45.07 samples/s
INFO:gluonnlp:11:00:03 Batch: 6899/7387, Loss=4.1457, lr=0.0000178 Thoughput=45.28 samples/s
INFO:gluonnlp:11:00:17 Batch: 6949/7387, Loss=4.2047, lr=0.0000177 Thoughput=42.11 samples/s
INFO:gluonnlp:11:00:30 Batch: 6999/7387, Loss=4.1246, lr=0.0000175 Thoughput=46.11 samples/s
INFO:gluonnlp:11:00:44 Batch: 7049/7387, Loss=4.1637, lr=0.0000174 Thoughput=45.06 samples/s
INFO:gluonnlp:11:00:57 Batch: 7099/7387, Loss=4.2105, lr=0.0000173 Thoughput=43.85 samples/s
INFO:gluonnlp:11:01:10 Batch: 7149/7387, Loss=4.0982, lr=0.0000172 Thoughput=46.40 samples/s
INFO:gluonnlp:11:01:23 Batch: 7199/7387, Loss=4.2186, lr=0.0000171 Thoughput=47.99 samples/s
INFO:gluonnlp:11:01:36 Batch: 7249/7387, Loss=4.1285, lr=0.0000170 Thoughput=45.24 samples/s
INFO:gluonnlp:11:01:50 Batch: 7299/7387, Loss=4.1594, lr=0.0000169 Thoughput=44.19 samples/s
INFO:gluonnlp:11:02:03 Batch: 7349/7387, Loss=4.0565, lr=0.0000167 Thoughput=45.46 samples/s
INFO:gluonnlp:11:02:12 Finish training step: 7387
INFO:gluonnlp:11:02:12 Time cost=1987.69 s, Thoughput=44.60 samples/s
INFO:gluonnlp:11:02:16 Batch: 12/7387, Loss=4.0776, lr=0.0000166 Thoughput=45.75 samples/s
INFO:gluonnlp:11:02:30 Batch: 62/7387, Loss=4.0591, lr=0.0000165 Thoughput=43.75 samples/s
INFO:gluonnlp:11:02:43 Batch: 112/7387, Loss=4.0958, lr=0.0000164 Thoughput=43.51 samples/s
INFO:gluonnlp:11:02:56 Batch: 162/7387, Loss=4.0935, lr=0.0000163 Thoughput=45.73 samples/s
INFO:gluonnlp:11:03:10 Batch: 212/7387, Loss=4.1058, lr=0.0000162 Thoughput=43.64 samples/s
INFO:gluonnlp:11:03:24 Batch: 262/7387, Loss=4.1321, lr=0.0000161 Thoughput=44.69 samples/s
INFO:gluonnlp:11:03:37 Batch: 312/7387, Loss=4.0636, lr=0.0000160 Thoughput=45.16 samples/s
INFO:gluonnlp:11:03:51 Batch: 362/7387, Loss=4.1577, lr=0.0000158 Thoughput=41.34 samples/s
INFO:gluonnlp:11:04:05 Batch: 412/7387, Loss=4.1470, lr=0.0000157 Thoughput=43.45 samples/s
INFO:gluonnlp:11:04:18 Batch: 462/7387, Loss=4.1421, lr=0.0000156 Thoughput=46.59 samples/s
INFO:gluonnlp:11:04:31 Batch: 512/7387, Loss=4.0959, lr=0.0000155 Thoughput=46.75 samples/s
INFO:gluonnlp:11:04:44 Batch: 562/7387, Loss=4.1666, lr=0.0000154 Thoughput=46.41 samples/s
INFO:gluonnlp:11:04:58 Batch: 612/7387, Loss=4.1476, lr=0.0000153 Thoughput=43.52 samples/s
INFO:gluonnlp:11:05:11 Batch: 662/7387, Loss=4.1270, lr=0.0000152 Thoughput=45.24 samples/s
INFO:gluonnlp:11:05:25 Batch: 712/7387, Loss=4.1495, lr=0.0000151 Thoughput=42.43 samples/s
INFO:gluonnlp:11:05:39 Batch: 762/7387, Loss=4.1315, lr=0.0000149 Thoughput=42.86 samples/s
INFO:gluonnlp:11:05:52 Batch: 812/7387, Loss=4.2071, lr=0.0000148 Thoughput=45.86 samples/s
INFO:gluonnlp:11:06:05 Batch: 862/7387, Loss=4.0890, lr=0.0000147 Thoughput=46.23 samples/s
INFO:gluonnlp:11:06:19 Batch: 912/7387, Loss=4.1006, lr=0.0000146 Thoughput=44.46 samples/s
INFO:gluonnlp:11:06:32 Batch: 962/7387, Loss=4.0869, lr=0.0000145 Thoughput=45.78 samples/s
INFO:gluonnlp:11:06:45 Batch: 1012/7387, Loss=4.0929, lr=0.0000144 Thoughput=46.82 samples/s
INFO:gluonnlp:11:06:58 Batch: 1062/7387, Loss=4.0652, lr=0.0000143 Thoughput=45.73 samples/s
INFO:gluonnlp:11:07:11 Batch: 1112/7387, Loss=4.1306, lr=0.0000142 Thoughput=43.72 samples/s
INFO:gluonnlp:11:07:25 Batch: 1162/7387, Loss=4.1885, lr=0.0000140 Thoughput=44.05 samples/s
INFO:gluonnlp:11:07:39 Batch: 1212/7387, Loss=4.1907, lr=0.0000139 Thoughput=42.50 samples/s
INFO:gluonnlp:11:07:53 Batch: 1262/7387, Loss=4.0502, lr=0.0000138 Thoughput=42.38 samples/s
INFO:gluonnlp:11:08:07 Batch: 1312/7387, Loss=4.1771, lr=0.0000137 Thoughput=43.80 samples/s
INFO:gluonnlp:11:08:21 Batch: 1362/7387, Loss=4.1444, lr=0.0000136 Thoughput=43.93 samples/s
INFO:gluonnlp:11:08:34 Batch: 1412/7387, Loss=4.1231, lr=0.0000135 Thoughput=46.40 samples/s
INFO:gluonnlp:11:08:47 Batch: 1462/7387, Loss=4.1259, lr=0.0000134 Thoughput=44.89 samples/s
INFO:gluonnlp:11:09:01 Batch: 1512/7387, Loss=4.0674, lr=0.0000133 Thoughput=43.95 samples/s
INFO:gluonnlp:11:09:14 Batch: 1562/7387, Loss=4.1281, lr=0.0000131 Thoughput=44.46 samples/s
INFO:gluonnlp:11:09:28 Batch: 1612/7387, Loss=4.0987, lr=0.0000130 Thoughput=44.48 samples/s
INFO:gluonnlp:11:09:42 Batch: 1662/7387, Loss=4.2103, lr=0.0000129 Thoughput=41.33 samples/s
INFO:gluonnlp:11:09:55 Batch: 1712/7387, Loss=4.0690, lr=0.0000128 Thoughput=47.13 samples/s
INFO:gluonnlp:11:10:09 Batch: 1762/7387, Loss=4.1266, lr=0.0000127 Thoughput=43.28 samples/s
INFO:gluonnlp:11:10:22 Batch: 1812/7387, Loss=4.1022, lr=0.0000126 Thoughput=46.63 samples/s
INFO:gluonnlp:11:10:35 Batch: 1862/7387, Loss=4.0887, lr=0.0000125 Thoughput=43.36 samples/s
INFO:gluonnlp:11:10:49 Batch: 1912/7387, Loss=4.1930, lr=0.0000123 Thoughput=43.75 samples/s
INFO:gluonnlp:11:11:02 Batch: 1962/7387, Loss=4.0607, lr=0.0000122 Thoughput=45.23 samples/s
INFO:gluonnlp:11:11:16 Batch: 2012/7387, Loss=4.1371, lr=0.0000121 Thoughput=43.70 samples/s
INFO:gluonnlp:11:11:30 Batch: 2062/7387, Loss=4.1286, lr=0.0000120 Thoughput=43.74 samples/s
INFO:gluonnlp:11:11:43 Batch: 2112/7387, Loss=4.0778, lr=0.0000119 Thoughput=44.38 samples/s
INFO:gluonnlp:11:11:57 Batch: 2162/7387, Loss=4.0880, lr=0.0000118 Thoughput=44.37 samples/s
INFO:gluonnlp:11:12:10 Batch: 2212/7387, Loss=4.1034, lr=0.0000117 Thoughput=44.49 samples/s
INFO:gluonnlp:11:12:24 Batch: 2262/7387, Loss=4.1207, lr=0.0000116 Thoughput=44.37 samples/s
INFO:gluonnlp:11:12:37 Batch: 2312/7387, Loss=4.0498, lr=0.0000114 Thoughput=45.33 samples/s
INFO:gluonnlp:11:12:50 Batch: 2362/7387, Loss=4.1283, lr=0.0000113 Thoughput=45.50 samples/s
INFO:gluonnlp:11:13:04 Batch: 2412/7387, Loss=4.0925, lr=0.0000112 Thoughput=43.62 samples/s
INFO:gluonnlp:11:13:18 Batch: 2462/7387, Loss=4.1359, lr=0.0000111 Thoughput=43.71 samples/s
INFO:gluonnlp:11:13:31 Batch: 2512/7387, Loss=4.1376, lr=0.0000110 Thoughput=43.78 samples/s
INFO:gluonnlp:11:13:45 Batch: 2562/7387, Loss=4.1093, lr=0.0000109 Thoughput=44.49 samples/s
INFO:gluonnlp:11:13:58 Batch: 2612/7387, Loss=4.2317, lr=0.0000108 Thoughput=45.39 samples/s
INFO:gluonnlp:11:14:12 Batch: 2662/7387, Loss=4.1057, lr=0.0000107 Thoughput=44.15 samples/s
INFO:gluonnlp:11:14:25 Batch: 2712/7387, Loss=4.1259, lr=0.0000105 Thoughput=44.01 samples/s
INFO:gluonnlp:11:14:40 Batch: 2762/7387, Loss=4.1273, lr=0.0000104 Thoughput=42.33 samples/s
INFO:gluonnlp:11:14:53 Batch: 2812/7387, Loss=4.1457, lr=0.0000103 Thoughput=43.42 samples/s
INFO:gluonnlp:11:15:07 Batch: 2862/7387, Loss=4.0793, lr=0.0000102 Thoughput=45.70 samples/s
INFO:gluonnlp:11:15:19 Batch: 2912/7387, Loss=4.0774, lr=0.0000101 Thoughput=46.38 samples/s
INFO:gluonnlp:11:15:34 Batch: 2962/7387, Loss=4.1106, lr=0.0000100 Thoughput=42.38 samples/s
INFO:gluonnlp:11:15:47 Batch: 3012/7387, Loss=4.1557, lr=0.0000099 Thoughput=45.78 samples/s
INFO:gluonnlp:11:16:00 Batch: 3062/7387, Loss=4.1707, lr=0.0000098 Thoughput=43.96 samples/s
INFO:gluonnlp:11:16:14 Batch: 3112/7387, Loss=4.1613, lr=0.0000096 Thoughput=44.49 samples/s
INFO:gluonnlp:11:16:27 Batch: 3162/7387, Loss=4.1119, lr=0.0000095 Thoughput=46.41 samples/s
INFO:gluonnlp:11:16:40 Batch: 3212/7387, Loss=4.1192, lr=0.0000094 Thoughput=44.79 samples/s
INFO:gluonnlp:11:16:54 Batch: 3262/7387, Loss=4.0926, lr=0.0000093 Thoughput=43.16 samples/s
INFO:gluonnlp:11:17:07 Batch: 3312/7387, Loss=4.0888, lr=0.0000092 Thoughput=45.24 samples/s
INFO:gluonnlp:11:17:21 Batch: 3362/7387, Loss=4.0483, lr=0.0000091 Thoughput=43.18 samples/s
INFO:gluonnlp:11:17:35 Batch: 3412/7387, Loss=4.1036, lr=0.0000090 Thoughput=44.04 samples/s
INFO:gluonnlp:11:17:48 Batch: 3462/7387, Loss=4.1874, lr=0.0000089 Thoughput=44.49 samples/s
INFO:gluonnlp:11:18:01 Batch: 3512/7387, Loss=4.0687, lr=0.0000087 Thoughput=46.69 samples/s
INFO:gluonnlp:11:18:15 Batch: 3562/7387, Loss=4.0894, lr=0.0000086 Thoughput=44.96 samples/s
INFO:gluonnlp:11:18:28 Batch: 3612/7387, Loss=4.1437, lr=0.0000085 Thoughput=43.31 samples/s
INFO:gluonnlp:11:18:43 Batch: 3662/7387, Loss=4.0851, lr=0.0000084 Thoughput=41.41 samples/s
INFO:gluonnlp:11:18:55 Batch: 3712/7387, Loss=4.1065, lr=0.0000083 Thoughput=48.17 samples/s
INFO:gluonnlp:11:19:09 Batch: 3762/7387, Loss=4.1050, lr=0.0000082 Thoughput=45.42 samples/s
INFO:gluonnlp:11:19:21 Batch: 3812/7387, Loss=4.0368, lr=0.0000081 Thoughput=46.93 samples/s
INFO:gluonnlp:11:19:34 Batch: 3862/7387, Loss=4.0824, lr=0.0000079 Thoughput=46.24 samples/s
INFO:gluonnlp:11:19:48 Batch: 3912/7387, Loss=4.1474, lr=0.0000078 Thoughput=42.87 samples/s
INFO:gluonnlp:11:20:02 Batch: 3962/7387, Loss=4.0872, lr=0.0000077 Thoughput=44.41 samples/s
INFO:gluonnlp:11:20:15 Batch: 4012/7387, Loss=4.1012, lr=0.0000076 Thoughput=43.96 samples/s
INFO:gluonnlp:11:20:28 Batch: 4062/7387, Loss=4.1431, lr=0.0000075 Thoughput=46.53 samples/s
INFO:gluonnlp:11:20:42 Batch: 4112/7387, Loss=4.0852, lr=0.0000074 Thoughput=45.31 samples/s
INFO:gluonnlp:11:20:55 Batch: 4162/7387, Loss=4.1129, lr=0.0000073 Thoughput=43.51 samples/s
INFO:gluonnlp:11:21:08 Batch: 4212/7387, Loss=4.0370, lr=0.0000072 Thoughput=46.17 samples/s
INFO:gluonnlp:11:21:21 Batch: 4262/7387, Loss=4.1498, lr=0.0000070 Thoughput=46.75 samples/s
INFO:gluonnlp:11:21:35 Batch: 4312/7387, Loss=4.1244, lr=0.0000069 Thoughput=43.76 samples/s
INFO:gluonnlp:11:21:48 Batch: 4362/7387, Loss=4.0967, lr=0.0000068 Thoughput=45.27 samples/s
INFO:gluonnlp:11:22:01 Batch: 4412/7387, Loss=4.0400, lr=0.0000067 Thoughput=46.72 samples/s
INFO:gluonnlp:11:22:14 Batch: 4462/7387, Loss=4.1163, lr=0.0000066 Thoughput=44.57 samples/s
INFO:gluonnlp:11:22:28 Batch: 4512/7387, Loss=4.1048, lr=0.0000065 Thoughput=45.02 samples/s
INFO:gluonnlp:11:22:41 Batch: 4562/7387, Loss=4.0797, lr=0.0000064 Thoughput=45.47 samples/s
INFO:gluonnlp:11:22:55 Batch: 4612/7387, Loss=4.0614, lr=0.0000063 Thoughput=43.93 samples/s
INFO:gluonnlp:11:23:08 Batch: 4662/7387, Loss=4.1530, lr=0.0000061 Thoughput=46.30 samples/s
INFO:gluonnlp:11:23:21 Batch: 4712/7387, Loss=4.0868, lr=0.0000060 Thoughput=45.81 samples/s
INFO:gluonnlp:11:23:34 Batch: 4762/7387, Loss=4.0923, lr=0.0000059 Thoughput=44.20 samples/s
INFO:gluonnlp:11:23:47 Batch: 4812/7387, Loss=4.0161, lr=0.0000058 Thoughput=46.52 samples/s
INFO:gluonnlp:11:24:00 Batch: 4862/7387, Loss=4.0832, lr=0.0000057 Thoughput=45.71 samples/s
INFO:gluonnlp:11:24:13 Batch: 4912/7387, Loss=4.1316, lr=0.0000056 Thoughput=47.06 samples/s
INFO:gluonnlp:11:24:27 Batch: 4962/7387, Loss=4.1091, lr=0.0000055 Thoughput=44.09 samples/s
INFO:gluonnlp:11:24:40 Batch: 5012/7387, Loss=4.0840, lr=0.0000054 Thoughput=45.39 samples/s
INFO:gluonnlp:11:24:53 Batch: 5062/7387, Loss=4.1174, lr=0.0000052 Thoughput=44.24 samples/s
INFO:gluonnlp:11:25:07 Batch: 5112/7387, Loss=4.0487, lr=0.0000051 Thoughput=44.90 samples/s
INFO:gluonnlp:11:25:20 Batch: 5162/7387, Loss=4.1211, lr=0.0000050 Thoughput=44.33 samples/s
INFO:gluonnlp:11:25:34 Batch: 5212/7387, Loss=4.1463, lr=0.0000049 Thoughput=45.15 samples/s
INFO:gluonnlp:11:25:47 Batch: 5262/7387, Loss=4.0731, lr=0.0000048 Thoughput=43.85 samples/s
INFO:gluonnlp:11:26:00 Batch: 5312/7387, Loss=4.1587, lr=0.0000047 Thoughput=45.70 samples/s
INFO:gluonnlp:11:26:14 Batch: 5362/7387, Loss=4.0584, lr=0.0000046 Thoughput=45.49 samples/s
INFO:gluonnlp:11:26:26 Batch: 5412/7387, Loss=4.0978, lr=0.0000045 Thoughput=47.34 samples/s
INFO:gluonnlp:11:26:40 Batch: 5462/7387, Loss=4.1306, lr=0.0000043 Thoughput=44.27 samples/s
INFO:gluonnlp:11:26:53 Batch: 5512/7387, Loss=4.0910, lr=0.0000042 Thoughput=44.97 samples/s
INFO:gluonnlp:11:27:06 Batch: 5562/7387, Loss=4.0427, lr=0.0000041 Thoughput=46.23 samples/s
INFO:gluonnlp:11:27:19 Batch: 5612/7387, Loss=4.0804, lr=0.0000040 Thoughput=45.94 samples/s
INFO:gluonnlp:11:27:33 Batch: 5662/7387, Loss=4.0750, lr=0.0000039 Thoughput=44.74 samples/s
INFO:gluonnlp:11:27:46 Batch: 5712/7387, Loss=4.1347, lr=0.0000038 Thoughput=45.34 samples/s
INFO:gluonnlp:11:27:59 Batch: 5762/7387, Loss=4.1372, lr=0.0000037 Thoughput=45.17 samples/s
INFO:gluonnlp:11:28:13 Batch: 5812/7387, Loss=4.0193, lr=0.0000035 Thoughput=43.86 samples/s
INFO:gluonnlp:11:28:27 Batch: 5862/7387, Loss=4.2086, lr=0.0000034 Thoughput=43.70 samples/s
INFO:gluonnlp:11:28:40 Batch: 5912/7387, Loss=4.1591, lr=0.0000033 Thoughput=45.01 samples/s
INFO:gluonnlp:11:28:54 Batch: 5962/7387, Loss=4.1108, lr=0.0000032 Thoughput=44.07 samples/s
INFO:gluonnlp:11:29:07 Batch: 6012/7387, Loss=4.1279, lr=0.0000031 Thoughput=46.01 samples/s
INFO:gluonnlp:11:29:20 Batch: 6062/7387, Loss=4.1495, lr=0.0000030 Thoughput=44.55 samples/s
INFO:gluonnlp:11:29:33 Batch: 6112/7387, Loss=4.1213, lr=0.0000029 Thoughput=44.86 samples/s
INFO:gluonnlp:11:29:46 Batch: 6162/7387, Loss=4.0740, lr=0.0000028 Thoughput=46.45 samples/s
INFO:gluonnlp:11:30:00 Batch: 6212/7387, Loss=4.1713, lr=0.0000026 Thoughput=43.52 samples/s
INFO:gluonnlp:11:30:14 Batch: 6262/7387, Loss=4.1564, lr=0.0000025 Thoughput=43.98 samples/s
INFO:gluonnlp:11:30:27 Batch: 6312/7387, Loss=4.0892, lr=0.0000024 Thoughput=45.32 samples/s
INFO:gluonnlp:11:30:40 Batch: 6362/7387, Loss=4.0660, lr=0.0000023 Thoughput=46.31 samples/s
INFO:gluonnlp:11:30:53 Batch: 6412/7387, Loss=4.0736, lr=0.0000022 Thoughput=46.32 samples/s
INFO:gluonnlp:11:31:06 Batch: 6462/7387, Loss=4.0536, lr=0.0000021 Thoughput=44.83 samples/s
INFO:gluonnlp:11:31:20 Batch: 6512/7387, Loss=4.2200, lr=0.0000020 Thoughput=43.86 samples/s
INFO:gluonnlp:11:31:33 Batch: 6562/7387, Loss=4.1070, lr=0.0000019 Thoughput=45.22 samples/s
INFO:gluonnlp:11:31:46 Batch: 6612/7387, Loss=4.0595, lr=0.0000017 Thoughput=46.24 samples/s
INFO:gluonnlp:11:32:00 Batch: 6662/7387, Loss=4.1146, lr=0.0000016 Thoughput=44.01 samples/s
INFO:gluonnlp:11:32:13 Batch: 6712/7387, Loss=4.1076, lr=0.0000015 Thoughput=46.55 samples/s
INFO:gluonnlp:11:32:26 Batch: 6762/7387, Loss=4.1171, lr=0.0000014 Thoughput=46.89 samples/s
INFO:gluonnlp:11:32:39 Batch: 6812/7387, Loss=4.0516, lr=0.0000013 Thoughput=45.68 samples/s
INFO:gluonnlp:11:32:52 Batch: 6862/7387, Loss=4.0758, lr=0.0000012 Thoughput=44.23 samples/s
INFO:gluonnlp:11:33:05 Batch: 6912/7387, Loss=4.0915, lr=0.0000011 Thoughput=45.50 samples/s
INFO:gluonnlp:11:33:19 Batch: 6962/7387, Loss=4.1415, lr=0.0000010 Thoughput=44.48 samples/s
INFO:gluonnlp:11:33:32 Batch: 7012/7387, Loss=4.0714, lr=0.0000008 Thoughput=46.97 samples/s
INFO:gluonnlp:11:33:45 Batch: 7062/7387, Loss=4.0710, lr=0.0000007 Thoughput=45.87 samples/s
INFO:gluonnlp:11:33:58 Batch: 7112/7387, Loss=4.0817, lr=0.0000006 Thoughput=45.07 samples/s
INFO:gluonnlp:11:34:12 Batch: 7162/7387, Loss=4.0979, lr=0.0000005 Thoughput=44.60 samples/s
INFO:gluonnlp:11:34:24 Batch: 7212/7387, Loss=4.1037, lr=0.0000004 Thoughput=46.83 samples/s
INFO:gluonnlp:11:34:39 Batch: 7262/7387, Loss=4.0831, lr=0.0000003 Thoughput=41.88 samples/s
INFO:gluonnlp:11:34:52 Batch: 7312/7387, Loss=4.0656, lr=0.0000002 Thoughput=44.57 samples/s
INFO:gluonnlp:11:35:05 Batch: 7362/7387, Loss=4.0253, lr=0.0000001 Thoughput=45.51 samples/s
INFO:gluonnlp:11:35:12 Finish training step: 14773
INFO:gluonnlp:11:35:12 Time cost=3967.29 s, Thoughput=44.68 samples/s
INFO:gluonnlp:11:35:15 Loading dev data...
INFO:gluonnlp:11:35:15 Number of records in dev data:10570
Done! Transform dataset costs 4.27 seconds.
INFO:gluonnlp:11:35:24 The number of examples after preprocessing:10833
Done! Transform dataset costs 4.35 seconds.
INFO:gluonnlp:11:35:24 start prediction
INFO:gluonnlp:11:36:25 Time cost=60.60 s, Thoughput=178.75 samples/s
INFO:gluonnlp:11:36:25 Get prediction results...
INFO:gluonnlp:11:36:54 {'exact_match': 5.931882686849574, 'f1': 13.615304984378835}
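
In the log above, the loss bottoms out at 2.6492 around batch 1149 of the first epoch (lr about 2.3e-5), climbs back to roughly 4.0 by batch 2399, and hovers around 4.1 through the end of the second epoch. A minimal sketch for pulling that trend out of such a log, assuming the line format shown above; the function names and the three-line sample are illustrative only and are not part of gluonnlp:

```python
import re

# Matches lines such as:
# INFO:gluonnlp:10:29:19 Batch: 49/7387, Loss=5.7366, lr=0.0000010 Thoughput=41.72 samples/s
LINE_RE = re.compile(r"Batch: (\d+)/(\d+), Loss=([\d.]+), lr=([\d.]+)")


def parse_log(text):
    """Return a list of (batch, loss, lr) tuples in log order."""
    records = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            records.append((int(m.group(1)), float(m.group(3)), float(m.group(4))))
    return records


def lowest_loss(records):
    """Return the (batch, loss, lr) record with the smallest loss."""
    return min(records, key=lambda r: r[1])


# Three lines copied from the log above; a real run would read the full log file.
sample = """\
INFO:gluonnlp:10:29:19 Batch: 49/7387, Loss=5.7366, lr=0.0000010 Thoughput=41.72 samples/s
INFO:gluonnlp:10:34:06 Batch: 1149/7387, Loss=2.6492, lr=0.0000234 Thoughput=44.58 samples/s
INFO:gluonnlp:10:39:49 Batch: 2399/7387, Loss=4.0185, lr=0.0000279 Thoughput=42.95 samples/s
"""

records = parse_log(sample)
best_batch, best_loss, best_lr = lowest_loss(records)
print(f"lowest loss {best_loss} at batch {best_batch} (lr={best_lr:.7f})")
```

Note that batch numbers restart in the second epoch, so a full-log version would need to track the epoch boundary (the "Finish training step" lines) to tell the two passes apart.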

To Reproduce

python finetune_squad.py --optimizer adam --batch_size 12 --lr 3e-5 --epochs 2 --gpu
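
The command does not set a warmup ratio, so the run uses the value recorded in the Namespace line of the log (warmup_ratio=0.1). The logged lr values are consistent with a linear warmup to 3e-5 over the first 10% of steps followed by a linear decay to zero. The sketch below reconstructs that schedule from the logged numbers, assuming 7387 batches per epoch as in the log; it is not a copy of finetune_squad.py, and the function name is hypothetical:

```python
def linear_warmup_decay(step, base_lr, total_steps, warmup_ratio):
    """Learning rate at a 1-based step: linear warmup, then linear decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = total_steps - step
    return base_lr * remaining / (total_steps - warmup_steps)


# Assumption: 7387 batches per epoch for 2 epochs, as reported in the log.
total = 7387 * 2

# Steps chosen to line up with logged batches 49, 999, 1499 (epoch 1)
# and batches 12 and 7362 (epoch 2), counting steps from 1.
for step in (50, 1000, 1500, 7400, 14750):
    print(step, f"{linear_warmup_decay(step, 3e-5, total, 0.1):.7f}")
```

Under these assumptions the printed values (0.0000010, 0.0000203, 0.0000299, 0.0000166, 0.0000001) match the lr column of the log at the corresponding batches, which places the peak learning rate near step 1477, shortly after the point where the logged loss stops decreasing.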

What have you tried to solve it?

None

Environment

$ pip list
Package     Version
----------- -------------------
certifi     2019.11.28
chardet     3.0.4
Cython      0.29.16
gluonnlp    0.9.1
graphviz    0.8.4
idna        2.8
mxnet-cu100 1.6.0b20200302
numpy       1.18.1
packaging   20.3
pip         19.3.1
pyparsing   2.4.7
requests    2.22.0
setuptools  44.0.0.post20200106
six         1.14.0
urllib3     1.25.7
wheel       0.33.6