2024-11-15 10:56:12,203 DEBUG TRAIN Batch 0/16500 loss 1.591806 acc 0.290056 lr 0.00082510 grad_norm 0.199882 rank 2
2024-11-15 10:56:12,203 DEBUG TRAIN Batch 0/16500 loss 1.591806 acc 0.290056 lr 0.00082510 grad_norm 0.199882 rank 7
2024-11-15 10:56:12,203 DEBUG TRAIN Batch 0/16500 loss 1.591806 acc 0.290056 lr 0.00082510 grad_norm 0.199882 rank 5
2024-11-15 10:56:12,203 DEBUG TRAIN Batch 0/16500 loss 1.591806 acc 0.290056 lr 0.00082510 grad_norm 0.199882 rank 1
2024-11-15 10:56:12,203 DEBUG TRAIN Batch 0/16500 loss 1.591806 acc 0.290056 lr 0.00082510 grad_norm 0.199882 rank 3
2024-11-15 10:56:12,203 DEBUG TRAIN Batch 0/16500 loss 1.591806 acc 0.290056 lr 0.00082510 grad_norm 0.199882 rank 4
2024-11-15 10:56:12,204 DEBUG TRAIN Batch 0/16500 loss 1.591806 acc 0.290056 lr 0.00082510 grad_norm 0.199882 rank 6
2024-11-15 10:56:12,204 DEBUG TRAIN Batch 0/16500 loss 1.591806 acc 0.290056 lr 0.00082510 grad_norm 0.199882 rank 0
2024-11-15 10:56:35,661 DEBUG TRAIN Batch 0/16600 loss nan acc 0.259458 lr 0.00083010 grad_norm nan rank 4
2024-11-15 10:56:35,661 DEBUG TRAIN Batch 0/16600 loss nan acc 0.259458 lr 0.00083010 grad_norm nan rank 2
2024-11-15 10:56:35,661 DEBUG TRAIN Batch 0/16600 loss nan acc 0.259458 lr 0.00083010 grad_norm nan rank 5
2024-11-15 10:56:35,661 DEBUG TRAIN Batch 0/16600 loss nan acc 0.259458 lr 0.00083010 grad_norm nan rank 3
2024-11-15 10:56:35,661 DEBUG TRAIN Batch 0/16600 loss nan acc 0.259458 lr 0.00083010 grad_norm nan rank 6
2024-11-15 10:56:35,662 DEBUG TRAIN Batch 0/16600 loss nan acc 0.259458 lr 0.00083010 grad_norm nan rank 1
2024-11-15 10:56:35,662 DEBUG TRAIN Batch 0/16600 loss nan acc 0.259458 lr 0.00083010 grad_norm nan rank 7
2024-11-15 10:56:35,663 DEBUG TRAIN Batch 0/16600 loss nan acc 0.259458 lr 0.00083010 grad_norm nan rank 0
2024-11-15 10:56:59,788 DEBUG TRAIN Batch 0/16700 loss 1.692054 acc 0.256676 lr 0.00083510 grad_norm nan rank 1
2024-11-15 10:56:59,788 DEBUG TRAIN Batch 0/16700 loss 1.692054 acc 0.256676 lr 0.00083510 grad_norm nan rank 5
2024-11-15 10:56:59,788 DEBUG TRAIN Batch 0/16700 loss 1.692054 acc 0.256676 lr 0.00083510 grad_norm nan rank 2
2024-11-15 10:56:59,788 DEBUG TRAIN Batch 0/16700 loss 1.692054 acc 0.256676 lr 0.00083510 grad_norm nan rank 4
2024-11-15 10:56:59,788 DEBUG TRAIN Batch 0/16700 loss 1.692054 acc 0.256676 lr 0.00083510 grad_norm nan rank 3
2024-11-15 10:56:59,788 DEBUG TRAIN Batch 0/16700 loss 1.692054 acc 0.256676 lr 0.00083510 grad_norm nan rank 6
2024-11-15 10:56:59,788 DEBUG TRAIN Batch 0/16700 loss 1.692054 acc 0.256676 lr 0.00083510 grad_norm nan rank 7
2024-11-15 10:56:59,790 DEBUG TRAIN Batch 0/16700 loss 1.692054 acc 0.256676 lr 0.00083510 grad_norm nan rank 0
2024-11-15 10:57:23,045 DEBUG TRAIN Batch 0/16800 loss 1.640502 acc 0.279475 lr 0.00084010 grad_norm nan rank 2
2024-11-15 10:57:23,045 DEBUG TRAIN Batch 0/16800 loss 1.640502 acc 0.279475 lr 0.00084010 grad_norm nan rank 4
2024-11-15 10:57:23,045 DEBUG TRAIN Batch 0/16800 loss 1.640502 acc 0.279475 lr 0.00084010 grad_norm nan rank 7
2024-11-15 10:57:23,045 DEBUG TRAIN Batch 0/16800 loss 1.640502 acc 0.279475 lr 0.00084010 grad_norm nan rank 1
2024-11-15 10:57:23,045 DEBUG TRAIN Batch 0/16800 loss 1.640502 acc 0.279475 lr 0.00084010 grad_norm nan rank 5
2024-11-15 10:57:23,045 DEBUG TRAIN Batch 0/16800 loss 1.640502 acc 0.279475 lr 0.00084010 grad_norm nan rank 3
2024-11-15 10:57:23,046 DEBUG TRAIN Batch 0/16800 loss 1.640502 acc 0.279475 lr 0.00084010 grad_norm nan rank 6
2024-11-15 10:57:23,048 DEBUG TRAIN Batch 0/16800 loss 1.640502 acc 0.279475 lr 0.00084010 grad_norm nan rank 0
2024-11-15 10:57:48,112 DEBUG TRAIN Batch 0/16900 loss nan acc 0.251761 lr 0.00084510 grad_norm nan rank 7
2024-11-15 10:57:48,112 DEBUG TRAIN Batch 0/16900 loss nan acc 0.251761 lr 0.00084510 grad_norm nan rank 5
2024-11-15 10:57:48,112 DEBUG TRAIN Batch 0/16900 loss nan acc 0.251761 lr 0.00084510 grad_norm nan rank 4
2024-11-15 10:57:48,112 DEBUG TRAIN Batch 0/16900 loss nan acc 0.251761 lr 0.00084510 grad_norm nan rank 3
2024-11-15 10:57:48,112 DEBUG TRAIN Batch 0/16900 loss nan acc 0.251761 lr 0.00084510 grad_norm nan rank 1
2024-11-15 10:57:48,112 DEBUG TRAIN Batch 0/16900 loss nan acc 0.251761 lr 0.00084510 grad_norm nan rank 6
2024-11-15 10:57:48,113 DEBUG TRAIN Batch 0/16900 loss nan acc 0.251761 lr 0.00084510 grad_norm nan rank 2
2024-11-15 10:57:48,114 DEBUG TRAIN Batch 0/16900 loss nan acc 0.251761 lr 0.00084510 grad_norm nan rank 0
2024-11-15 10:58:11,407 DEBUG TRAIN Batch 0/17000 loss nan acc 0.000000 lr 0.00085010 grad_norm nan rank 7
2024-11-15 10:58:11,407 DEBUG TRAIN Batch 0/17000 loss nan acc 0.000000 lr 0.00085010 grad_norm nan rank 5
2024-11-15 10:58:11,407 DEBUG TRAIN Batch 0/17000 loss nan acc 0.000000 lr 0.00085010 grad_norm nan rank 4
2024-11-15 10:58:11,407 DEBUG TRAIN Batch 0/17000 loss nan acc 0.000000 lr 0.00085010 grad_norm nan rank 6
2024-11-15 10:58:11,407 DEBUG TRAIN Batch 0/17000 loss nan acc 0.000000 lr 0.00085010 grad_norm nan rank
When training the model from scratch, the loss always suddenly becomes nan
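To narrow down where the run first goes bad, a log like the one above can be scanned automatically. The following is a minimal sketch, assuming the `Batch <epoch>/<step>` reading of the log lines is correct; the pattern `LOG_PATTERN` and the helper `first_non_finite` are hypothetical names introduced here for illustration and are not part of any training toolkit.

```python
import math
import re

# Assumed layout, inferred from the pasted log:
# "... Batch <epoch>/<step> loss <x> acc <x> lr <x> grad_norm <x> rank <n>"
LOG_PATTERN = re.compile(
    r"Batch (?P<epoch>\d+)/(?P<step>\d+) "
    r"loss (?P<loss>\S+) acc (?P<acc>\S+) lr (?P<lr>\S+) "
    r"grad_norm (?P<grad_norm>\S+)"
)


def first_non_finite(lines):
    """Return (epoch, step, bad_fields) for the first record whose loss or
    grad_norm is nan/inf, or None if every parsed record is finite."""
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match is None:
            continue  # not a TRAIN record in the assumed layout
        bad_fields = [
            name
            for name in ("loss", "grad_norm")
            if not math.isfinite(float(match.group(name)))
        ]
        if bad_fields:
            return int(match.group("epoch")), int(match.group("step")), bad_fields
    return None


sample = [
    "2024-11-15 10:56:12,204 DEBUG TRAIN Batch 0/16500 loss 1.591806 acc 0.290056 lr 0.00082510 grad_norm 0.199882 rank 0",
    "2024-11-15 10:56:35,663 DEBUG TRAIN Batch 0/16600 loss nan acc 0.259458 lr 0.00083010 grad_norm nan rank 0",
    "2024-11-15 10:56:59,790 DEBUG TRAIN Batch 0/16700 loss 1.692054 acc 0.256676 lr 0.00083510 grad_norm nan rank 0",
]
print(first_non_finite(sample))  # (0, 16600, ['loss', 'grad_norm'])
```

On the log above this points at step 16600 as the first logged record with a non-finite value. Note that grad_norm stays nan at steps 16700 and 16800 even though the logged loss is finite again there.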