OFA-Sys / Chinese-CLIP

Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
MIT License

Training with flash-attention: loss does not decrease and acc does not increase #272

Open gobigrassland opened 5 months ago

gobigrassland commented 5 months ago

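I downloaded the WuKong dataset, which contains about 7 million valid images, and trained the vit-base-16 model on it. With the regular transformer, following the stage1-stage2 procedure from the paper, training behaves normally. With flash-attention, however, the loss does not decrease and the acc does not increase.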

………………

2024-03-08,10:53:46 | INFO | Rank 0 | Global Steps: 3320/166810 | Train Epoch: 1 [13598720/68325376 (20%)] | Loss: 3.384517 | Image2Text Acc: 33.45 | Text2Image Acc: 32.59 | Data Time: 0.095s | Batch Time: 1.865s | LR: 0.000050 | logit_scale: 4.579 | Global Batch Size: 4096
2024-03-08,10:54:00 | INFO | Rank 0 | Global Steps: 3330/166810 | Train Epoch: 1 [13639680/68325376 (20%)] | Loss: 3.429487 | Image2Text Acc: 34.08 | Text2Image Acc: 32.52 | Data Time: 0.138s | Batch Time: 0.638s | LR: 0.000050 | logit_scale: 4.579 | Global Batch Size: 4096
2024-03-08,10:54:13 | INFO | Rank 0 | Global Steps: 3340/166810 | Train Epoch: 1 [13680640/68325376 (20%)] | Loss: 3.391076 | Image2Text Acc: 33.30 | Text2Image Acc: 33.64 | Data Time: 0.111s | Batch Time: 0.629s | LR: 0.000050 | logit_scale: 4.579 | Global Batch Size: 4096
2024-03-08,10:54:25 | INFO | Rank 0 | Global Steps: 3350/166810 | Train Epoch: 1 [13721600/68325376 (20%)] | Loss: 3.324344 | Image2Text Acc: 34.67 | Text2Image Acc: 33.86 | Data Time: 0.102s | Batch Time: 1.083s | LR: 0.000050 | logit_scale: 4.579 | Global Batch Size: 4096
2024-03-08,10:54:40 | INFO | Rank 0 | Global Steps: 3360/166810 | Train Epoch: 1 [13762560/68325376 (20%)] | Loss: 3.322722 | Image2Text Acc: 35.72 | Text2Image Acc: 34.84 | Data Time: 0.101s | Batch Time: 0.623s | LR: 0.000050 | logit_scale: 4.579 | Global Batch Size: 4096
2024-03-08,10:54:51 | INFO | Rank 0 | Global Steps: 3370/166810 | Train Epoch: 1 [13803520/68325376 (20%)] | Loss: 3.371596 | Image2Text Acc: 34.77 | Text2Image Acc: 32.64 | Data Time: 0.110s | Batch Time: 0.627s | LR: 0.000050 | logit_scale: 4.579 | Global Batch Size: 4096
2024-03-08,10:55:07 | INFO | Rank 0 | Global Steps: 3380/166810 | Train Epoch: 1 [13844480/68325376 (20%)] | Loss: 3.418872 | Image2Text Acc: 33.30 | Text2Image Acc: 33.15 | Data Time: 0.107s | Batch Time: 0.624s | LR: 0.000050 | logit_scale: 4.578 | Global Batch Size: 4096
2024-03-08,10:55:18 | INFO | Rank 0 | Global Steps: 3390/166810 | Train Epoch: 1 [13885440/68325376 (20%)] | Loss: 3.434360 | Image2Text Acc: 34.57 | Text2Image Acc: 33.45 | Data Time: 0.113s | Batch Time: 0.633s | LR: 0.000050 | logit_scale: 4.578 | Global Batch Size: 4096
2024-03-08,10:55:30 | INFO | Rank 0 | Global Steps: 3400/166810 | Train Epoch: 1 [13926400/68325376 (20%)] | Loss: 3.394838 | Image2Text Acc: 34.57 | Text2Image Acc: 34.16 | Data Time: 0.120s | Batch Time: 0.639s | LR: 0.000050 | logit_scale: 4.578 | Global Batch Size: 4096
2024-03-08,10:55:42 | INFO | Rank 0 | Global Steps: 3410/166810 | Train Epoch: 1 [13967360/68325376 (20%)] | Loss: 3.375341 | Image2Text Acc: 34.18 | Text2Image Acc: 33.81 | Data Time: 0.124s | Batch Time: 0.639s | LR: 0.000050 | logit_scale: 4.578 | Global Batch Size: 4096
2024-03-08,10:55:55 | INFO | Rank 0 | Global Steps: 3420/166810 | Train Epoch: 1 [14008320/68325376 (21%)] | Loss: 3.352150 | Image2Text Acc: 33.59 | Text2Image Acc: 33.47 | Data Time: 0.112s | Batch Time: 1.002s | LR: 0.000050 | logit_scale: 4.578 | Global Batch Size: 4096
2024-03-08,10:56:07 | INFO | Rank 0 | Global Steps: 3430/166810 | Train Epoch: 1 [14049280/68325376 (21%)] | Loss: 3.330369 | Image2Text Acc: 34.74 | Text2Image Acc: 33.76 | Data Time: 0.112s | Batch Time: 1.356s | LR: 0.000050 | logit_scale: 4.578 | Global Batch Size: 4096
2024-03-08,10:56:20 | INFO | Rank 0 | Global Steps: 3440/166810 | Train Epoch: 1 [14090240/68325376 (21%)] | Loss: 3.338193 | Image2Text Acc: 34.42 | Text2Image Acc: 33.67 | Data Time: 0.112s | Batch Time: 1.505s | LR: 0.000050 | logit_scale: 4.578 | Global Batch Size: 4096
2024-03-08,10:56:32 | INFO | Rank 0 | Global Steps: 3450/166810 | Train Epoch: 1 [14131200/68325376 (21%)] | Loss: 3.298738 | Image2Text Acc: 35.38 | Text2Image Acc: 34.59 | Data Time: 0.103s | Batch Time: 1.825s | LR: 0.000050 | logit_scale: 4.578 | Global Batch Size: 4096


- With flash-attention: compared with the run above, the only change to the training script is the added "use-flash-attention" option

2024-03-11,03:54:18 | INFO | Rank 0 | Global Steps: 10/166810 | Train Epoch: 1 [40960/68325376 (0%)] | Loss: 8.335938 | Image2Text Acc: 0.00 | Text2Image Acc: 0.02 | Data Time: 0.148s | Batch Time: 0.563s | LR: 0.000005 | logit_scale: 4.605 | Global Batch Size: 4096
2024-03-11,03:54:31 | INFO | Rank 0 | Global Steps: 20/166810 | Train Epoch: 1 [81920/68325376 (0%)] | Loss: 8.328125 | Image2Text Acc: 0.02 | Text2Image Acc: 0.05 | Data Time: 0.148s | Batch Time: 0.559s | LR: 0.000010 | logit_scale: 4.605 | Global Batch Size: 4096
2024-03-11,03:54:45 | INFO | Rank 0 | Global Steps: 30/166810 | Train Epoch: 1 [122880/68325376 (0%)] | Loss: 8.320312 | Image2Text Acc: 0.02 | Text2Image Acc: 0.02 | Data Time: 0.147s | Batch Time: 0.561s | LR: 0.000015 | logit_scale: 4.605 | Global Batch Size: 4096
2024-03-11,03:54:59 | INFO | Rank 0 | Global Steps: 40/166810 | Train Epoch: 1 [163840/68325376 (0%)] | Loss: 8.320312 | Image2Text Acc: 0.02 | Text2Image Acc: 0.05 | Data Time: 0.149s | Batch Time: 0.561s | LR: 0.000020 | logit_scale: 4.605 | Global Batch Size: 4096
2024-03-11,03:55:12 | INFO | Rank 0 | Global Steps: 50/166810 | Train Epoch: 1 [204800/68325376 (0%)] | Loss: 8.320312 | Image2Text Acc: 0.00 | Text2Image Acc: 0.02 | Data Time: 0.182s | Batch Time: 0.602s | LR: 0.000025 | logit_scale: 4.605 | Global Batch Size: 4096
2024-03-11,03:55:26 | INFO | Rank 0 | Global Steps: 60/166810 | Train Epoch: 1 [245760/68325376 (0%)] | Loss: 8.320312 | Image2Text Acc: 0.02 | Text2Image Acc: 0.02 | Data Time: 0.182s | Batch Time: 0.601s | LR: 0.000030 | logit_scale: 4.605 | Global Batch Size: 4096
2024-03-11,03:55:40 | INFO | Rank 0 | Global Steps: 70/166810 | Train Epoch: 1 [286720/68325376 (0%)] | Loss: 8.320312 | Image2Text Acc: 0.02 | Text2Image Acc: 0.02 | Data Time: 0.187s | Batch Time: 0.605s | LR: 0.000035 | logit_scale: 4.605 | Global Batch Size: 4096
2024-03-11,03:55:53 | INFO | Rank 0 | Global Steps: 80/166810 | Train Epoch: 1 [327680/68325376 (0%)] | Loss: 8.320312 | Image2Text Acc: 0.02 | Text2Image Acc: 0.02 | Data Time: 0.188s | Batch Time: 0.615s | LR: 0.000040 | logit_scale: 4.605 | Global Batch Size: 4096
2024-03-11,03:56:07 | INFO | Rank 0 | Global Steps: 90/166810 | Train Epoch: 1 [368640/68325376 (1%)] | Loss: 8.320312 | Image2Text Acc: 0.02 | Text2Image Acc: 0.07 | Data Time: 0.191s | Batch Time: 0.615s | LR: 0.000045 | logit_scale: 4.605 | Global Batch Size: 4096
2024-03-11,03:56:29 | INFO | Rank 0 | Global Steps: 100/166810 | Train Epoch: 1 [409600/68325376 (1%)] | Loss: 8.320312 | Image2Text Acc: 0.02 | Text2Image Acc: 0.02 | Data Time: 0.183s | Batch Time: 0.611s | LR: 0.000050 | logit_scale: 4.605 | Global Batch Size: 4096
2024-03-11,03:56:43 | INFO | Rank 0 | Global Steps: 110/166810 | Train Epoch: 1 [450560/68325376 (1%)] | Loss: 8.320312 | Image2Text Acc: 0.02 | Text2Image Acc: 0.02 | Data Time: 0.167s | Batch Time: 0.911s | LR: 0.000050 | logit_scale: 4.605 | Global Batch Size: 4096
2024-03-11,03:56:57 | INFO | Rank 0 | Global Steps: 120/166810 | Train Epoch: 1 [491520/68325376 (1%)] | Loss: 8.312500 | Image2Text Acc: 0.02 | Text2Image Acc: 0.05 | Data Time: 0.174s | Batch Time: 0.599s | LR: 0.000050 | logit_scale: 4.605 | Global Batch Size: 4096
2024-03-11,03:57:11 | INFO | Rank 0 | Global Steps: 130/166810 | Train Epoch: 1 [532480/68325376 (1%)] | Loss: 8.320312 | Image2Text Acc: 0.02 | Text2Image Acc: 0.02 | Data Time: 0.170s | Batch Time: 0.599s | LR: 0.000050 | logit_scale: 4.605 | Global Batch Size: 4096
2024-03-11,03:57:25 | INFO | Rank 0 | Global Steps: 140/166810 | Train Epoch: 1 [573440/68325376 (1%)] | Loss: 8.320312 | Image2Text Acc: 0.05 | Text2Image Acc: 0.02 | Data Time: 0.179s | Batch Time: 0.600s | LR: 0.000050 | logit_scale: 4.605 | Global Batch Size: 4096
2024-03-11,03:57:38 | INFO | Rank 0 | Global Steps: 150/166810 | Train Epoch: 1 [614400/68325376 (1%)] | Loss: 8.320312 | Image2Text Acc: 0.00 | Text2Image Acc: 0.02 | Data Time: 0.170s | Batch Time: 0.601s | LR: 0.000050 | logit_scale: 4.605 | Global Batch Size: 4096
2024-03-11,03:57:52 | INFO | Rank 0 | Global Steps: 160/166810 | Train Epoch: 1 [655360/68325376 (1%)] | Loss: 8.320312 | Image2Text Acc: 0.02 | Text2Image Acc: 0.05 | Data Time: 0.171s | Batch Time: 0.600s | LR: 0.000050 | logit_scale: 4.605 | Global Batch Size: 4096
2024-03-11,03:58:06 | INFO | Rank 0 | Global Steps: 170/166810 | Train Epoch: 1 [696320/68325376 (1%)] | Loss: 8.320312 | Image2Text Acc: 0.00 | Text2Image Acc: 0.05 | Data Time: 0.171s | Batch Time: 0.600s | LR: 0.000050 | logit_scale: 4.605 | Global Batch Size: 4096
2024-03-11,03:58:20 | INFO | Rank 0 | Global Steps: 180/166810 | Train Epoch: 1 [737280/68325376 (1%)] | Loss: 8.320312 | Image2Text Acc: 0.05 | Text2Image Acc: 0.05 | Data Time: 0.168s | Batch Time: 0.599s | LR: 0.000050 | logit_scale: 4.605 | Global Batch Size: 4096
2024-03-11,03:58:34 | INFO | Rank 0 | Global Steps: 190/166810 | Train Epoch: 1 [778240/68325376 (1%)] | Loss: 8.320312 | Image2Text Acc: 0.00 | Text2Image Acc: 0.02 | Data Time: 0.186s | Batch Time: 0.614s | LR: 0.000050 | logit_scale: 4.605 | Global Batch Size: 4096
2024-03-11,03:58:48 | INFO | Rank 0 | Global Steps: 200/166810 | Train Epoch: 1 [819200/68325376 (1%)] | Loss: 8.320312 | Image2Text Acc: 0.05 | Text2Image Acc: 0.02 | Data Time: 0.167s | Batch Time: 0.601s | LR: 0.000050 | logit_scale: 4.605 | Global Batch Size: 4096
2024-03-11,03:59:02 | INFO | Rank 0 | Global Steps: 210/166810 | Train Epoch: 1 [860160/68325376 (1%)] | Loss: 8.320312 | Image2Text Acc: 0.02 | Text2Image Acc: 0.00 | Data Time: 0.166s | Batch Time: 0.600s | LR: 0.000050 | logit_scale: 4.605 | Global Batch Size: 4096
2024-03-11,03:59:16 | INFO | Rank 0 | Global Steps: 220/166810 | Train Epoch: 1 [901120/68325376 (1%)] | Loss: 8.320312 | Image2Text Acc: 0.07 | Text2Image Acc: 0.00 | Data Time: 0.168s | Batch Time: 0.600s | LR: 0.000050 | logit_scale: 4.605 | Global Batch Size: 4096
2024-03-11,03:59:29 | INFO | Rank 0 | Global Steps: 230/166810 | Train Epoch: 1 [942080/68325376 (1%)] | Loss: 8.320312 | Image2Text Acc: 0.02 | Text2Image Acc: 0.00 | Data Time: 0.170s | Batch Time: 0.601s | LR: 0.000050 | logit_scale: 4.605 | Global Batch Size: 4096
2024-03-11,03:59:43 | INFO | Rank 0 | Global Steps: 240/166810 | Train Epoch: 1 [983040/68325376 (1%)] | Loss: 8.320312 | Image2Text Acc: 0.02 | Text2Image Acc: 0.02 | Data Time: 0.170s | Batch Time: 0.599s | LR: 0.000050 | logit_scale: 4.605 | Global Batch Size: 4096
2024-03-11,03:59:57 | INFO | Rank 0 | Global Steps: 250/166810 | Train Epoch: 1 [1024000/68325376 (1%)] | Loss: 8.320312 | Image2Text Acc: 0.07 | Text2Image Acc: 0.05 | Data Time: 0.189s | Batch Time: 0.617s | LR: 0.000050 | logit_scale: 4.605 | Global Batch Size: 4096
2024-03-11,04:00:11 | INFO | Rank 0 | Global Steps: 260/166810 | Train Epoch: 1 [1064960/68325376 (2%)] | Loss: 8.320312 | Image2Text Acc: 0.02 | Text2Image Acc: 0.00 | Data Time: 0.188s | Batch Time: 0.617s | LR: 0.000050 | logit_scale: 4.605 | Global Batch Size: 4096
2024-03-11,04:00:25 | INFO | Rank 0 | Global Steps: 270/166810 | Train Epoch: 1 [1105920/68325376 (2%)] | Loss: 8.320312 | Image2Text Acc: 0.02 | Text2Image Acc: 0.00 | Data Time: 0.171s | Batch Time: 0.600s | LR: 0.000050 | logit_scale: 4.605 | Global Batch Size: 4096
2024-03-11,04:00:39 | INFO | Rank 0 | Global Steps: 280/166810 | Train Epoch: 1 [1146880/68325376 (2%)] | Loss: 8.320312 | Image2Text Acc: 0.00 | Text2Image Acc: 0.02 | Data Time: 0.187s | Batch Time: 0.618s | LR: 0.000050 | logit_scale: 4.605 | Global Batch Size: 4096
2024-03-11,04:00:53 | INFO | Rank 0 | Global Steps: 290/166810 | Train Epoch: 1 [1187840/68325376 (2%)] | Loss: 8.320312 | Image2Text Acc: 0.02 | Text2Image Acc: 0.02 | Data Time: 0.186s | Batch Time: 0.617s | LR: 0.000050 | logit_scale: 4.605 | Global Batch Size: 4096
2024-03-11,04:01:07 | INFO | Rank 0 | Global Steps: 300/166810 | Train Epoch: 1 [1228800/68325376 (2%)] | Loss: 8.320312 | Image2Text Acc: 0.02 | Text2Image Acc: 0.00 | Data Time: 0.188s | Batch Time: 0.618s | LR: 0.000050 | logit_scale: 4.605 | Global Batch Size: 4096
2024-03-11,04:01:21 | INFO | Rank 0 | Global Steps: 310/166810 | Train Epoch: 1 [1269760/68325376 (2%)] | Loss: 8.320312 | Image2Text Acc: 0.02 | Text2Image Acc: 0.02 | Data Time: 0.187s | Batch Time: 0.619s | LR: 0.000050 | logit_scale: 4.605 | Global Batch Size: 4096
2024-03-11,04:01:35 | INFO | Rank 0 | Global Steps: 320/166810 | Train Epoch: 1 [1310720/68325376 (2%)] | Loss: 8.320312 | Image2Text Acc: 0.02 | Text2Image Acc: 0.02 | Data Time: 0.191s | Batch Time: 0.621s | LR: 0.000050 | logit_scale: 4.605 | Global Batch Size: 4096
2024-03-11,04:01:49 | INFO | Rank 0 | Global Steps: 330/166810 | Train Epoch: 1 [1351680/68325376 (2%)] | Loss: 8.320312 | Image2Text Acc: 0.05 | Text2Image Acc: 0.02 | Data Time: 0.188s | Batch Time: 0.619s | LR: 0.000050 | logit_scale: 4.605 | Global Batch Size: 4096
2024-03-11,04:02:03 | INFO | Rank 0 | Global Steps: 340/166810 | Train Epoch: 1 [1392640/68325376 (2%)] | Loss: 8.320312 | Image2Text Acc: 0.00 | Text2Image Acc: 0.00 | Data Time: 0.186s | Batch Time: 0.618s | LR: 0.000050 | logit_scale: 4.605 | Global Batch Size: 4096
2024-03-11,04:02:17 | INFO | Rank 0 | Global Steps: 350/166810 | Train Epoch: 1 [1433600/68325376 (2%)] | Loss: 8.320312 | Image2Text Acc: 0.05 | Text2Image Acc: 0.02 | Data Time: 0.186s | Batch Time: 0.617s | LR: 0.000050 | logit_scale: 4.605 | Global Batch Size: 4096
2024-03-11,04:02:31 | INFO | Rank 0 | Global Steps: 360/166810 | Train Epoch: 1 [1474560/68325376 (2%)] | Loss: 8.320312 | Image2Text Acc: 0.02 | Text2Image Acc: 0.02 | Data Time: 0.171s | Batch Time: 0.601s | LR: 0.000050 | logit_scale: 4.605 | Global Batch Size: 4096
2024-03-11,04:02:45 | INFO | Rank 0 | Global Steps: 370/166810 | Train Epoch: 1 [1515520/68325376 (2%)] | Loss: 8.320312 | Image2Text Acc: 0.02 | Text2Image Acc: 0.05 | Data Time: 0.169s | Batch Time: 0.599s | LR: 0.000050 | logit_scale: 4.605 | Global Batch Size: 4096
2024-03-11,04:02:59 | INFO | Rank 0 | Global Steps: 380/166810 | Train Epoch: 1 [1556480/68325376 (2%)] | Loss: 8.320312 | Image2Text Acc: 0.00 | Text2Image Acc: 0.00 | Data Time: 0.171s | Batch Time: 0.600s | LR: 0.000050 | logit_scale: 4.605 | Global Batch Size: 4096
2024-03-11,04:03:17 | INFO | Rank 0 | Global Steps: 390/166810 | Train Epoch: 1 [1597440/68325376 (2%)] | Loss: 8.320312 | Image2Text Acc: 0.02 | Text2Image Acc: 0.02 | Data Time: 0.157s | Batch Time: 2.055s | LR: 0.000050 | logit_scale: 4.605 | Global Batch Size: 4096
2024-03-11,04:03:29 | INFO | Rank 0 | Global Steps: 400/166810 | Train Epoch: 1 [1638400/68325376 (2%)] | Loss: 8.320312 | Image2Text Acc: 0.05 | Text2Image Acc: 0.00 | Data Time: 0.166s | Batch Time: 2.007s | LR: 0.000050 | logit_scale: 4.605 | Global Batch Size: 4096

……

2024-03-11,05:55:26 | INFO | Rank 0 | Global Steps: 5000/166810 | Train Epoch: 1 [20480000/68325376 (30%)] | Loss: 8.546875 | Image2Text Acc: 0.02 | Text2Image Acc: 0.00 | Data Time: 0.171s | Batch Time: 0.594s | LR: 0.000050 | logit_scale: 4.605 | Global Batch Size: 4096
2024-03-11,05:55:40 | INFO | Rank 0 | Global Steps: 5010/166810 | Train Epoch: 1 [20520960/68325376 (30%)] | Loss: 8.476562 | Image2Text Acc: 0.02 | Text2Image Acc: 0.02 | Data Time: 0.163s | Batch Time: 2.278s | LR: 0.000050 | logit_scale: 4.605 | Global Batch Size: 4096
2024-03-11,05:55:54 | INFO | Rank 0 | Global Steps: 5020/166810 | Train Epoch: 1 [20561920/68325376 (30%)] | Loss: 8.468750 | Image2Text Acc: 0.00 | Text2Image Acc: 0.02 | Data Time: 0.160s | Batch Time: 2.334s | LR: 0.000050 | logit_scale: 4.605 | Global Batch Size: 4096
2024-03-11,05:56:11 | INFO | Rank 0 | Global Steps: 5030/166810 | Train Epoch: 1 [20602880/68325376 (30%)] | Loss: 8.390625 | Image2Text Acc: 0.05 | Text2Image Acc: 0.07 | Data Time: 0.147s | Batch Time: 2.128s | LR: 0.000050 | logit_scale: 4.605 | Global Batch Size: 4096
2024-03-11,05:56:27 | INFO | Rank 0 | Global Steps: 5040/166810 | Train Epoch: 1 [20643840/68325376 (30%)] | Loss: 8.523438 | Image2Text Acc: 0.02 | Text2Image Acc: 0.02 | Data Time: 0.176s | Batch Time: 0.589s | LR: 0.000050 | logit_scale: 4.604 | Global Batch Size: 4096
2024-03-11,05:56:40 | INFO | Rank 0 | Global Steps: 5050/166810 | Train Epoch: 1 [20684800/68325376 (30%)] | Loss: 8.343750 | Image2Text Acc: 0.00 | Text2Image Acc: 0.00 | Data Time: 0.158s | Batch Time: 0.587s | LR: 0.000050 | logit_scale: 4.604 | Global Batch Size: 4096
2024-03-11,05:56:55 | INFO | Rank 0 | Global Steps: 5060/166810 | Train Epoch: 1 [20725760/68325376 (30%)] | Loss: 8.578125 | Image2Text Acc: 0.02 | Text2Image Acc: 0.00 | Data Time: 0.177s | Batch Time: 0.585s | LR: 0.000050 | logit_scale: 4.604 | Global Batch Size: 4096
2024-03-11,05:57:10 | INFO | Rank 0 | Global Steps: 5070/166810 | Train Epoch: 1 [20766720/68325376 (30%)] | Loss: 8.546875 | Image2Text Acc: 0.02 | Text2Image Acc: 0.02 | Data Time: 0.190s | Batch Time: 0.617s | LR: 0.000050 | logit_scale: 4.604 | Global Batch Size: 4096
2024-03-11,05:57:24 | INFO | Rank 0 | Global Steps: 5080/166810 | Train Epoch: 1 [20807680/68325376 (30%)] | Loss: 8.375000 | Image2Text Acc: 0.02 | Text2Image Acc: 0.00 | Data Time: 0.177s | Batch Time: 0.603s | LR: 0.000050 | logit_scale: 4.604 | Global Batch Size: 4096
2024-03-11,05:57:38 | INFO | Rank 0 | Global Steps: 5090/166810 | Train Epoch: 1 [20848640/68325376 (31%)] | Loss: 8.343750 | Image2Text Acc: 0.00 | Text2Image Acc: 0.00 | Data Time: 0.177s | Batch Time: 0.602s | LR: 0.000050 | logit_scale: 4.604 | Global Batch Size: 4096
2024-03-11,05:57:52 | INFO | Rank 0 | Global Steps: 5100/166810 | Train Epoch: 1 [20889600/68325376 (31%)] | Loss: 8.343750 | Image2Text Acc: 0.02 | Text2Image Acc: 0.07 | Data Time: 0.180s | Batch Time: 0.604s | LR: 0.000050 | logit_scale: 4.604 | Global Batch Size: 4096
2024-03-11,05:58:11 | INFO | Rank 0 | Global Steps: 5110/166810 | Train Epoch: 1 [20930560/68325376 (31%)] | Loss: 8.484375 | Image2Text Acc: 0.02 | Text2Image Acc: 0.00 | Data Time: 0.186s | Batch Time: 0.607s | LR: 0.000050 | logit_scale: 4.604 | Global Batch Size: 4096
2024-03-11,05:58:25 | INFO | Rank 0 | Global Steps: 5120/166810 | Train Epoch: 1 [20971520/68325376 (31%)] | Loss: 8.468750 | Image2Text Acc: 0.00 | Text2Image Acc: 0.00 | Data Time: 0.182s | Batch Time: 0.606s | LR: 0.000050 | logit_scale: 4.604 | Global Batch Size: 4096
2024-03-11,05:58:39 | INFO | Rank 0 | Global Steps: 5130/166810 | Train Epoch: 1 [21012480/68325376 (31%)] | Loss: 8.632812 | Image2Text Acc: 0.02 | Text2Image Acc: 0.05 | Data Time: 0.178s | Batch Time: 0.608s | LR: 0.000050 | logit_scale: 4.604 | Global Batch Size: 4096
2024-03-11,05:58:53 | INFO | Rank 0 | Global Steps: 5140/166810 | Train Epoch: 1 [21053440/68325376 (31%)] | Loss: 8.453125 | Image2Text Acc: 0.00 | Text2Image Acc: 0.00 | Data Time: 0.181s | Batch Time: 0.608s | LR: 0.000050 | logit_scale: 4.604 | Global Batch Size: 4096
2024-03-11,05:59:07 | INFO | Rank 0 | Global Steps: 5150/166810 | Train Epoch: 1 [21094400/68325376 (31%)] | Loss: 8.515625 | Image2Text Acc: 0.00 | Text2Image Acc: 0.02 | Data Time: 0.198s | Batch Time: 0.625s | LR: 0.000050 | logit_scale: 4.604 | Global Batch Size: 4096
2024-03-11,05:59:22 | INFO | Rank 0 | Global Steps: 5160/166810 | Train Epoch: 1 [21135360/68325376 (31%)] | Loss: 8.367188 | Image2Text Acc: 0.05 | Text2Image Acc: 0.02 | Data Time: 0.170s | Batch Time: 0.583s | LR: 0.000050 | logit_scale: 4.604 | Global Batch Size: 4096
2024-03-11,05:59:40 | INFO | Rank 0 | Global Steps: 5170/166810 | Train Epoch: 1 [21176320/68325376 (31%)] | Loss: 8.320312 | Image2Text Acc: 0.05 | Text2Image Acc: 0.05 | Data Time: 0.156s | Batch Time: 2.139s | LR: 0.000050 | logit_scale: 4.604 | Global Batch Size: 4096
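
A quick sanity check on the flash-attention log, as a rough sketch only. It assumes the contrastive loss is a cross-entropy over all 4096 samples of the global batch, which is an assumption about the training code and is not stated in this thread. Under that assumption, a model that cannot distinguish any image-text pair would score ln(4096) in loss and 1/4096 in accuracy. The helper `to_fp16` is a hypothetical name introduced for this example.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value."""
    # The struct format "e" packs a float as a 16-bit half-precision number.
    return struct.unpack("e", struct.pack("e", x))[0]


# Values taken from the log above.
global_batch_size = 4096
logged_logit_scale = 4.605

# Cross-entropy of a uniform guess over N candidates is ln(N).
# Assumption: N equals the global batch size.
chance_loss = math.log(global_batch_size)

# Accuracy (in percent) when exactly one sample out of N is matched correctly.
chance_acc_percent = 100.0 / global_batch_size

# The log prints the log-space value; exp() gives the effective multiplier.
effective_scale = math.exp(logged_logit_scale)

print(f"chance-level loss ln(N):      {chance_loss:.6f}")
print(f"same value rounded to fp16:   {to_fp16(chance_loss):.6f}")
print(f"one-hit accuracy (%):         {chance_acc_percent:.4f}")
print(f"exp(logit_scale):             {effective_scale:.2f}")
```

The logged loss of 8.320312 matches ln(4096) after rounding to half precision, and the logged accuracies (0.00, 0.02, 0.05, 0.07) correspond to zero to three correct matches out of 4096. The flash-attention run therefore appears to sit at chance level from the first steps, and the loss values appear to be half-precision numbers. This narrows down the symptom but does not by itself identify the cause.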

keminze commented 4 months ago

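The model is failing to fit your data. 1. Check whether there is a problem with the dataset. 2. The dataset is large and may need a long training time. 3. Try adjusting the learning rate, optimizer, and other hyperparameters.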

DtYXs commented 3 months ago

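Hello, could you share your pytorch and flash_attention versions? The official flash_attention version at the time support was added was 0.2.8, which is now fairly old, so the code may need to be adapted to newer versions.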
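
One way to collect the requested version information is sketched below. The distribution names `torch` and `flash-attn` are assumed to be the pip package names in the environment, and `installed_versions` is a hypothetical helper written for this example. It reads package metadata only, so it does not import the packages and does not need a GPU.

```python
from importlib import metadata


def installed_versions(packages=("torch", "flash-attn")):
    """Return a dict mapping each distribution name to its installed version.

    A package that is not installed maps to None instead of raising.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Posting this output together with the CUDA version and GPU model makes it easier to tell whether the installed flash_attention release differs from 0.2.8.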