THUDM / GLM-4

GLM-4 series: Open Multilingual Multimodal Chat LMs
Apache License 2.0

Fine-tuning loss is 0 #261

Closed: Richard12868 closed this issue 2 days ago

Richard12868 commented 2 days ago

System Info

The configuration is left entirely at its defaults and the data is the original dataset, yet the loss is always 0. A sample record from the data:

{"messages": [{"role": "user", "content": "类型#裤材质#牛仔布风格#性感"}, {"role": "assistant", "content": "3x1的这款牛仔裤采用浅白的牛仔面料为裤身材质,其柔然的手感和细腻的质地,在穿着舒适的同时,透露着清纯甜美的个性气质。除此之外,流畅的裤身剪裁将性感的腿部曲线彰显的淋漓尽致,不失为一款随性出街的必备单品。"}]}

Who can help?

@zRzRzRzRzRzRzR

Information

Reproduction

Here is the training log:

Running training
Num examples = 1,010
Num Epochs = 24
Instantaneous batch size per device = 1
Total train batch size (w. parallel, distributed & accumulation) = 8
Gradient Accumulation steps = 1
Total optimization steps = 3,000
Number of trainable parameters = 2,785,280
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.0004996666666666667, 'epoch': 0.02}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.0004993333333333334, 'epoch': 0.03}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.000499, 'epoch': 0.05}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.0004986666666666667, 'epoch': 0.06}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.0004983333333333334, 'epoch': 0.08}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.000498, 'epoch': 0.09}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.0004976666666666667, 'epoch': 0.11}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.0004973333333333334, 'epoch': 0.13}
{'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.000497, 'epoch': 0.14}

I added loss logging to the training code and found that the logged value is nan:

import logging

class Seq2SeqTrainer:
    def training_step(self, model, inputs):
        loss = self.compute_loss(model, inputs)
        # Write the raw per-step loss to a separate log file
        logging.basicConfig(filename='training_loss.log', level=logging.INFO)
        logging.info(f'Training loss: {loss.item()}')
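One plausible way a logged loss becomes nan while grad_norm stays at 0.0 is a mean taken over zero supervised tokens. The sketch below is a plain-Python illustration, not the repository's loss code. It assumes masked positions carry the label -100, which is the default `ignore_index` of PyTorch's cross-entropy loss; the helper name `masked_mean_loss` and the per-token loss values are made up for the example.

```python
import math

IGNORE_INDEX = -100  # assumption: label value that marks a position as "do not train on this"

def masked_mean_loss(token_losses, labels):
    """Average per-token losses over positions whose label is not IGNORE_INDEX."""
    kept = [loss for loss, label in zip(token_losses, labels) if label != IGNORE_INDEX]
    if not kept:
        # Nothing left to average: this mirrors a 0/0 mean reduction
        return float("nan")
    return sum(kept) / len(kept)

# Two supervised positions: the mean is taken over 1.0 and 3.0
print(masked_mean_loss([2.0, 1.0, 3.0], [-100, 5, 7]))        # 2.0

# Every position masked: no supervised tokens, so the result is nan
print(masked_mean_loss([2.0, 1.0, 3.0], [-100, -100, -100]))  # nan
```

If every example in a batch looks like the second call, there is no training signal at all, which would be consistent with the zero gradient norm in the log above.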

Expected behavior

The loss should have a nonzero, finite value.

BigCakeLove commented 2 days ago

How was this resolved?

liuhe6 commented 21 hours ago

Same problem here.

zhouzhoumd commented 9 hours ago

Same problem here. Has it been resolved?

liuhe6 commented 4 hours ago

@zRzRzRzRzRzRzR could you take a look?

zRzRzRzRzRzRzR commented 3 hours ago

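The labels are being masked out. Set the maximum length to a larger value. The default configuration is already long enough to read each example in full instead of truncating it.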
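To illustrate how a short maximum length can leave every label masked, here is a minimal sketch. It is an assumption about how such preprocessing typically works, not the repository's actual code: prompt tokens get the label -100, response tokens keep their ids, and the sequence is then cut to `max_length`. The helper names `build_example` and `supervised_count` and all token ids are hypothetical.

```python
IGNORE_INDEX = -100  # assumption: label value for positions excluded from the loss

def build_example(prompt_ids, response_ids, max_length):
    """Concatenate prompt and response, mask the prompt in labels, then truncate both."""
    input_ids = prompt_ids + response_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids
    return input_ids[:max_length], labels[:max_length]

def supervised_count(labels):
    """Number of positions that still contribute to the loss."""
    return sum(1 for label in labels if label != IGNORE_INDEX)

prompt = list(range(100, 110))    # 10 made-up prompt token ids
response = list(range(200, 206))  # 6 made-up response token ids

# max_length shorter than the prompt: the whole response is cut off
_, short_labels = build_example(prompt, response, max_length=8)
print(supervised_count(short_labels))  # 0

# max_length large enough for prompt plus response
_, long_labels = build_example(prompt, response, max_length=32)
print(supervised_count(long_labels))   # 6
```

Counting supervised tokens per example like this, before training starts, is a quick way to check whether the configured lengths are cutting off the assistant replies.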