GanjinZero / RRHF

[NIPS2023] RRHF & Wombat

The size of tensor a (8) must match the size of tensor b (2) at non-singleton dimension 1 #36

Closed ZJXNEFU closed 11 months ago

ZJXNEFU commented 11 months ago

With per_device_batch_size > 1, the line `rw_diff = rw_scores.unsqueeze(0) - rw_scores.unsqueeze(-1)` raises an error:

The size of tensor a (8) must match the size of tensor b (2) at non-singleton dimension 1

GanjinZero commented 11 months ago

per_device_batch_size > 1 is not implemented; you will need to modify the loss fn yourself.
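For reference, a minimal sketch (toy shapes assumed) of why that line only broadcasts cleanly when the scores for one example form a 1-D tensor:

```python
import torch

# With per_device_batch_size == 1, rw_scores is a 1-D tensor of
# per-response reward scores, e.g. 2 candidate responses:
rw_scores = torch.tensor([0.9, 0.1])
rw_diff = rw_scores.unsqueeze(0) - rw_scores.unsqueeze(-1)
print(rw_diff.shape)  # torch.Size([2, 2]) -- all pairwise differences

# With per_device_batch_size > 1 the tensor gains a batch dimension,
# e.g. 8 examples x 2 responses, and the same unsqueeze trick tries
# to broadcast shape (1, 8, 2) against (8, 2, 1), which fails at
# dimension 1 (8 vs 2) -- exactly the error in this issue:
rw_scores = torch.randn(8, 2)
try:
    rw_scores.unsqueeze(0) - rw_scores.unsqueeze(-1)
except RuntimeError as e:
    print(e)
```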

ZJXNEFU commented 11 months ago

If we assume by default that response 0 is better than response 1, and make response 1 learn from response 0, is the following code correct for the batch_size > 1 case?


    def rrhf_loss(self, scores, idxs, rw_scores):
        diff = scores[:-1:2] - scores[1::2]
        return -diff.sum()

    def sft_loss(self, logit_label, idxs, rw_scores):
        goal_data = logit_label[:-1:2]
        return -torch.mean(goal_data, dim=1).sum()
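Run on toy values (made up for illustration), the proposed ranking term is just the negated sum of the per-pair score gaps:

```python
import torch

def rrhf_loss(scores, idxs=None, rw_scores=None):
    # scores interleaves [resp0, resp1, resp0, resp1, ...]
    diff = scores[:-1:2] - scores[1::2]  # resp0 - resp1 per example
    return -diff.sum()

# two examples; scores are sequence log-probs from the model
scores = torch.tensor([-1.0, -2.0, -0.5, -3.0])
print(rrhf_loss(scores))  # tensor(-3.5000)
```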
ZJXNEFU commented 11 months ago

With this change the loss sometimes goes negative, but I can't tell where I went wrong. Below is a training run with deepspeed enabled:

[screenshot: loss curve]

GanjinZero commented 11 months ago

Optimize a pair only when the first response scores lower than the zeroth; otherwise skip it. You need to take a max between diff and 0.

ZJXNEFU commented 11 months ago

Yes, I did guarantee that in each pair the 0th score is always higher than the 1st, but the loss still goes negative. Should it be -diff.mean() across the batch instead of -diff.sum()?

GanjinZero commented 11 months ago

What you guaranteed is that the 0th entry of rw_scores is higher than the 1st. scores is computed by the model; how could you guarantee its ordering?

ZJXNEFU commented 11 months ago

The reward scores are offline; the pairs were already sorted during data preparation.

GanjinZero commented 11 months ago

You need an `aval` mask:

    aval = diff < 0
    return -diff[aval].sum()
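Written out as a sketch (the `aval` mask keeps only the pairs the model ranks in the wrong order):

```python
import torch

def rrhf_loss(scores, idxs=None, rw_scores=None):
    # resp0 is assumed preferred (pairs sorted offline by reward score)
    diff = scores[:-1:2] - scores[1::2]  # preferred - rejected
    aval = diff < 0                      # pairs ranked wrongly by the model
    return -diff[aval].sum()             # >= 0 by construction
```

Correctly ordered pairs contribute nothing, so the loss can no longer go negative.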

ZJXNEFU commented 11 months ago

Within one batch, should this be summed or averaged? I'm currently averaging over the batch:


    def rrhf_loss(self, scores, idxs, rw_scores):
        diff = scores[:-1:2] - scores[1::2]
        diff[diff > 0] = 0
        return -diff.mean()

    def sft_loss(self, logit_label, idxs, rw_scores):
        goal_data = logit_label[:-1:2]
        return -torch.mean(goal_data, dim=1).mean()
GanjinZero commented 11 months ago

Take the mean across the batch.
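Combining the thread's conclusions, one possible batch-size-agnostic version (a sketch, not the repo's official code; `clamp` replaces the in-place masking and is equivalent to zeroing the correctly ordered pairs):

```python
import torch

def rrhf_loss(scores, idxs=None, rw_scores=None):
    # scores: 1-D tensor interleaving [resp0, resp1, ...] per example,
    # with resp0 pre-sorted (offline) to be the preferred response.
    diff = scores[:-1:2] - scores[1::2]      # preferred - rejected
    return torch.clamp(-diff, min=0).mean()  # hinge term, mean over batch

def sft_loss(logit_label, idxs=None, rw_scores=None):
    # logit_label: (num_responses, seq_len) token log-probs; keep only
    # the preferred response of each pair and average.
    goal_data = logit_label[:-1:2]
    return -goal_data.mean(dim=1).mean()
```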

ZJXNEFU commented 11 months ago

OK, that looks right now. Thanks a lot!