google-research / electra

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
Apache License 2.0

Fine-tuning loss doesn't converge when loading pretrained weights #85

Closed: smeaktrobush closed this issue 3 years ago

smeaktrobush commented 3 years ago

Hi. I am trying to fine-tune my pretrained model on a custom classification dataset. I tried running it on just 30 examples for 500 epochs, but the loss doesn't converge at all. However, when I set init_checkpoint (this line) to None, the loss did converge. I don't know if I missed anything. Thanks in advance.
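The test above relies on a simple idea: a model with enough capacity should be able to memorize a handful of training examples, so its training loss should keep falling toward zero. As a hedged illustration of what such a converging run looks like, the toy sketch below (not ELECTRA and not the repository's code) trains a linear classifier with full-batch gradient descent on 30 random examples. The feature count, seed, learning rate, and helper names are all assumptions chosen for the demo.

```python
import math
import random

# Toy stand-in for the sanity check in the post (NOT ELECTRA): 30 examples with
# 64 random features and random +/-1 labels. With more features than examples
# the points are almost surely linearly separable, so a linear classifier can
# memorize them and its training loss should keep falling toward zero.
# All sizes, the seed, and the learning rate are arbitrary assumptions.
N_EXAMPLES, N_FEATURES, STEPS, LR = 30, 64, 500, 0.5


def make_data(seed=0):
    rng = random.Random(seed)
    xs = [[rng.gauss(0.0, 1.0) for _ in range(N_FEATURES)] for _ in range(N_EXAMPLES)]
    ys = [rng.choice((-1.0, 1.0)) for _ in range(N_EXAMPLES)]
    return xs, ys


def margin(w, x, y):
    """y * <w, x>; positive means the example is classified correctly."""
    return y * sum(wi * xi for wi, xi in zip(w, x))


def softplus_neg(m):
    """log(1 + exp(-m)), the logistic loss of one example, computed stably."""
    return max(-m, 0.0) + math.log1p(math.exp(-abs(m)))


def sigmoid_neg(m):
    """1 / (1 + exp(m)), written to avoid overflow for large |m|."""
    if m >= 0:
        e = math.exp(-m)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(m))


def mean_loss(w, xs, ys):
    return sum(softplus_neg(margin(w, x, y)) for x, y in zip(xs, ys)) / len(xs)


def train(xs, ys, steps=STEPS, lr=LR):
    """Full-batch gradient descent; returns final weights and per-step loss."""
    w = [0.0] * N_FEATURES
    history = []
    for _ in range(steps):
        grad = [0.0] * N_FEATURES
        for x, y in zip(xs, ys):
            # d/dw log(1 + exp(-y<w,x>)) = -y * x * sigmoid(-margin)
            coef = -y * sigmoid_neg(margin(w, x, y)) / len(xs)
            for j in range(N_FEATURES):
                grad[j] += coef * x[j]
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
        history.append(mean_loss(w, xs, ys))
    return w, history


if __name__ == "__main__":
    xs, ys = make_data()
    print(f"loss at start: {mean_loss([0.0] * N_FEATURES, xs, ys):.4f}")
    _, history = train(xs, ys)
    print(f"loss after {STEPS} steps: {history[-1]:.4f}")
```

The starting loss is log 2 (about 0.6931), since all-zero weights predict 50/50 for every example; a run that can memorize its data should end far below that, which is the pattern of the second log rather than the first.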

With init_checkpoint set to my pretrained directory

10/500 = 2.0%, SPS: 0.6, ELAP: 18, ETA: 14:36 - loss: 34.0213
20/500 = 4.0%, SPS: 1.0, ELAP: 21, ETA: 8:25 - loss: 22.6425
30/500 = 6.0%, SPS: 1.2, ELAP: 24, ETA: 6:19 - loss: 25.9802
40/500 = 8.0%, SPS: 1.5, ELAP: 27, ETA: 5:15 - loss: 18.7123
50/500 = 10.0%, SPS: 1.6, ELAP: 31, ETA: 4:35 - loss: 14.2728
60/500 = 12.0%, SPS: 1.8, ELAP: 34, ETA: 4:07 - loss: 22.1738
70/500 = 14.0%, SPS: 1.9, ELAP: 37, ETA: 3:47 - loss: 24.7776
80/500 = 16.0%, SPS: 2.0, ELAP: 40, ETA: 3:30 - loss: 18.2702
90/500 = 18.0%, SPS: 2.1, ELAP: 43, ETA: 3:17 - loss: 15.5855
100/500 = 20.0%, SPS: 2.2, ELAP: 46, ETA: 3:05 - loss: 21.6619
110/500 = 22.0%, SPS: 2.2, ELAP: 50, ETA: 2:56 - loss: 25.0141
120/500 = 24.0%, SPS: 2.3, ELAP: 53, ETA: 2:47 - loss: 24.5834
130/500 = 26.0%, SPS: 2.3, ELAP: 56, ETA: 2:39 - loss: 25.9109
140/500 = 28.0%, SPS: 2.4, ELAP: 59, ETA: 2:32 - loss: 22.3996
150/500 = 30.0%, SPS: 2.4, ELAP: 1:02, ETA: 2:25 - loss: 21.3710
160/500 = 32.0%, SPS: 2.4, ELAP: 1:05, ETA: 2:19 - loss: 19.6154
170/500 = 34.0%, SPS: 2.5, ELAP: 1:09, ETA: 2:13 - loss: 24.4710
180/500 = 36.0%, SPS: 2.5, ELAP: 1:12, ETA: 2:07 - loss: 24.8363
190/500 = 38.0%, SPS: 2.5, ELAP: 1:15, ETA: 2:02 - loss: 22.9035
200/500 = 40.0%, SPS: 2.6, ELAP: 1:18, ETA: 1:57 - loss: 23.0937
210/500 = 42.0%, SPS: 2.6, ELAP: 1:21, ETA: 1:52 - loss: 27.5209
220/500 = 44.0%, SPS: 2.6, ELAP: 1:24, ETA: 1:47 - loss: 24.2148
230/500 = 46.0%, SPS: 2.6, ELAP: 1:28, ETA: 1:43 - loss: 22.3650
240/500 = 48.0%, SPS: 2.6, ELAP: 1:31, ETA: 1:38 - loss: 18.7241
250/500 = 50.0%, SPS: 2.7, ELAP: 1:34, ETA: 1:34 - loss: 13.2163
260/500 = 52.0%, SPS: 2.7, ELAP: 1:37, ETA: 1:30 - loss: 20.2439
270/500 = 54.0%, SPS: 2.7, ELAP: 1:40, ETA: 1:25 - loss: 16.1609
280/500 = 56.0%, SPS: 2.7, ELAP: 1:43, ETA: 1:21 - loss: 22.1817
290/500 = 58.0%, SPS: 2.7, ELAP: 1:47, ETA: 1:17 - loss: 22.2718
300/500 = 60.0%, SPS: 2.7, ELAP: 1:50, ETA: 1:13 - loss: 18.9889
310/500 = 62.0%, SPS: 2.7, ELAP: 1:53, ETA: 1:09 - loss: 24.8572
320/500 = 64.0%, SPS: 2.8, ELAP: 1:56, ETA: 1:05 - loss: 24.9129
330/500 = 66.0%, SPS: 2.8, ELAP: 1:59, ETA: 1:01 - loss: 20.2005
340/500 = 68.0%, SPS: 2.8, ELAP: 2:02, ETA: 58 - loss: 20.3365
350/500 = 70.0%, SPS: 2.8, ELAP: 2:06, ETA: 54 - loss: 22.9624
360/500 = 72.0%, SPS: 2.8, ELAP: 2:09, ETA: 50 - loss: 22.4254
370/500 = 74.0%, SPS: 2.8, ELAP: 2:12, ETA: 46 - loss: 24.3129
380/500 = 76.0%, SPS: 2.8, ELAP: 2:15, ETA: 43 - loss: 24.6028
390/500 = 78.0%, SPS: 2.8, ELAP: 2:18, ETA: 39 - loss: 19.6961
400/500 = 80.0%, SPS: 2.8, ELAP: 2:21, ETA: 35 - loss: 26.1590
410/500 = 82.0%, SPS: 2.8, ELAP: 2:25, ETA: 32 - loss: 18.7580
420/500 = 84.0%, SPS: 2.8, ELAP: 2:28, ETA: 28 - loss: 22.4846
430/500 = 86.0%, SPS: 2.8, ELAP: 2:31, ETA: 25 - loss: 13.3537
440/500 = 88.0%, SPS: 2.9, ELAP: 2:34, ETA: 21 - loss: 21.4833
450/500 = 90.0%, SPS: 2.9, ELAP: 2:37, ETA: 17 - loss: 23.5883
460/500 = 92.0%, SPS: 2.9, ELAP: 2:40, ETA: 14 - loss: 22.4382
470/500 = 94.0%, SPS: 2.9, ELAP: 2:44, ETA: 10 - loss: 21.6351
480/500 = 96.0%, SPS: 2.9, ELAP: 2:47, ETA: 7 - loss: 23.9042
490/500 = 98.0%, SPS: 2.9, ELAP: 2:50, ETA: 3 - loss: 22.2937
500/500 = 100.0%, SPS: 2.9, ELAP: 2:53, ETA: 0 - loss: 23.9965

With init_checkpoint set to None

10/500 = 2.0%, SPS: 0.6, ELAP: 18, ETA: 14:24 - loss: 44.5637
20/500 = 4.0%, SPS: 1.0, ELAP: 21, ETA: 8:19 - loss: 24.1437
30/500 = 6.0%, SPS: 1.3, ELAP: 24, ETA: 6:15 - loss: 21.3811
40/500 = 8.0%, SPS: 1.5, ELAP: 27, ETA: 5:12 - loss: 32.0097
50/500 = 10.0%, SPS: 1.7, ELAP: 30, ETA: 4:32 - loss: 40.4746
60/500 = 12.0%, SPS: 1.8, ELAP: 33, ETA: 4:05 - loss: 22.8530
70/500 = 14.0%, SPS: 1.9, ELAP: 37, ETA: 3:45 - loss: 14.1487
80/500 = 16.0%, SPS: 2.0, ELAP: 40, ETA: 3:29 - loss: 22.7280
90/500 = 18.0%, SPS: 2.1, ELAP: 43, ETA: 3:15 - loss: 16.3314
100/500 = 20.0%, SPS: 2.2, ELAP: 46, ETA: 3:04 - loss: 8.5457
110/500 = 22.0%, SPS: 2.2, ELAP: 49, ETA: 2:54 - loss: 2.4439
120/500 = 24.0%, SPS: 2.3, ELAP: 52, ETA: 2:46 - loss: 3.1958
130/500 = 26.0%, SPS: 2.3, ELAP: 56, ETA: 2:38 - loss: 0.4987
140/500 = 28.0%, SPS: 2.4, ELAP: 59, ETA: 2:31 - loss: 0.3138
150/500 = 30.0%, SPS: 2.4, ELAP: 1:02, ETA: 2:24 - loss: 0.3861
160/500 = 32.0%, SPS: 2.5, ELAP: 1:05, ETA: 2:18 - loss: 1.5898
170/500 = 34.0%, SPS: 2.5, ELAP: 1:08, ETA: 2:12 - loss: 42.5476
180/500 = 36.0%, SPS: 2.5, ELAP: 1:11, ETA: 2:07 - loss: 0.0254
190/500 = 38.0%, SPS: 2.5, ELAP: 1:15, ETA: 2:02 - loss: 0.0023
200/500 = 40.0%, SPS: 2.6, ELAP: 1:18, ETA: 1:57 - loss: 1.3263
210/500 = 42.0%, SPS: 2.6, ELAP: 1:21, ETA: 1:52 - loss: 0.0001
220/500 = 44.0%, SPS: 2.6, ELAP: 1:24, ETA: 1:47 - loss: 0.0001
230/500 = 46.0%, SPS: 2.6, ELAP: 1:27, ETA: 1:42 - loss: 0.0000
240/500 = 48.0%, SPS: 2.7, ELAP: 1:30, ETA: 1:38 - loss: 0.0001
250/500 = 50.0%, SPS: 2.7, ELAP: 1:34, ETA: 1:34 - loss: 0.0000
260/500 = 52.0%, SPS: 2.7, ELAP: 1:37, ETA: 1:29 - loss: 0.0000
270/500 = 54.0%, SPS: 2.7, ELAP: 1:40, ETA: 1:25 - loss: 0.0000
280/500 = 56.0%, SPS: 2.7, ELAP: 1:43, ETA: 1:21 - loss: 0.0000
290/500 = 58.0%, SPS: 2.7, ELAP: 1:46, ETA: 1:17 - loss: 0.0000
300/500 = 60.0%, SPS: 2.7, ELAP: 1:49, ETA: 1:13 - loss: 0.0000
310/500 = 62.0%, SPS: 2.8, ELAP: 1:53, ETA: 1:09 - loss: 0.0000
320/500 = 64.0%, SPS: 2.8, ELAP: 1:56, ETA: 1:05 - loss: 0.0000
330/500 = 66.0%, SPS: 2.8, ELAP: 1:59, ETA: 1:01 - loss: 0.0001
340/500 = 68.0%, SPS: 2.8, ELAP: 2:02, ETA: 57 - loss: 0.0000
350/500 = 70.0%, SPS: 2.8, ELAP: 2:05, ETA: 54 - loss: 0.0000
360/500 = 72.0%, SPS: 2.8, ELAP: 2:08, ETA: 50 - loss: 0.0000
370/500 = 74.0%, SPS: 2.8, ELAP: 2:11, ETA: 46 - loss: 0.0000
380/500 = 76.0%, SPS: 2.8, ELAP: 2:15, ETA: 43 - loss: 0.0000
390/500 = 78.0%, SPS: 2.8, ELAP: 2:18, ETA: 39 - loss: 0.0000
400/500 = 80.0%, SPS: 2.8, ELAP: 2:21, ETA: 35 - loss: 0.0000
410/500 = 82.0%, SPS: 2.8, ELAP: 2:24, ETA: 32 - loss: 0.0000
420/500 = 84.0%, SPS: 2.9, ELAP: 2:27, ETA: 28 - loss: 0.0000
430/500 = 86.0%, SPS: 2.9, ELAP: 2:30, ETA: 24 - loss: 0.0000
440/500 = 88.0%, SPS: 2.9, ELAP: 2:34, ETA: 21 - loss: 0.0000
450/500 = 90.0%, SPS: 2.9, ELAP: 2:37, ETA: 17 - loss: 0.0000
460/500 = 92.0%, SPS: 2.9, ELAP: 2:40, ETA: 14 - loss: 0.0000
470/500 = 94.0%, SPS: 2.9, ELAP: 2:43, ETA: 10 - loss: 0.0000
480/500 = 96.0%, SPS: 2.9, ELAP: 2:46, ETA: 7 - loss: 0.0000
490/500 = 98.0%, SPS: 2.9, ELAP: 2:49, ETA: 3 - loss: 0.0000
500/500 = 100.0%, SPS: 2.9, ELAP: 2:53, ETA: 0 - loss: 0.0000
500/500 = 100.0%, SPS: 2.8, ELAP: 3:02, ETA: 0
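To compare the two runs above more objectively than by eye, a small sketch can pull the (step, loss) pairs out of the progress lines and average the first and last few entries. It assumes the exact progress format shown in these logs and also works when several entries were flattened onto one line; parse_losses and compare_windows are hypothetical helper names, not part of the ELECTRA code.

```python
import re
from statistics import mean

# Matches one progress entry in the format printed in the logs, e.g.
# "10/500 = 2.0%, SPS: 0.6, ELAP: 18, ETA: 14:36 - loss: 34.0213".
# Entries without a "- loss:" field (like the final summary line) are skipped.
ENTRY = re.compile(
    r"(\d+)/\d+ = [\d.]+%, SPS: [\d.]+, ELAP: [\d:]+, ETA: [\d:]+ - loss: ([\d.]+)"
)


def parse_losses(log_text):
    """Return a list of (step, loss) pairs found anywhere in log_text."""
    return [(int(m.group(1)), float(m.group(2))) for m in ENTRY.finditer(log_text)]


def compare_windows(pairs, k=5):
    """Mean loss over the first k and the last k logged entries."""
    losses = [loss for _, loss in pairs]
    return mean(losses[:k]), mean(losses[-k:])


if __name__ == "__main__":
    sample = (
        "10/500 = 2.0%, SPS: 0.6, ELAP: 18, ETA: 14:24 - loss: 44.5637\n"
        "20/500 = 4.0%, SPS: 1.0, ELAP: 21, ETA: 8:19 - loss: 24.1437\n"
        "490/500 = 98.0%, SPS: 2.9, ELAP: 2:49, ETA: 3 - loss: 0.0000 "
        "500/500 = 100.0%, SPS: 2.9, ELAP: 2:53, ETA: 0 - loss: 0.0000\n"
        "500/500 = 100.0%, SPS: 2.8, ELAP: 3:02, ETA: 0\n"
    )
    pairs = parse_losses(sample)
    print(pairs)
    print(compare_windows(pairs, k=2))
```

With the default k=5, the run with init_checkpoint set to the pretrained directory averages roughly 23 in both its first and last windows, while the run with init_checkpoint set to None drops from about 32.5 to 0.0.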