GanjinZero / RRHF

[NIPS2023] RRHF & Wombat

The size of tensor a (8) must match the size of tensor b (2) at non-singleton dimension 1 #36

Closed ZJXNEFU closed 11 months ago

ZJXNEFU commented 11 months ago

With per_device_batch_size > 1, the line `rw_diff = rw_scores.unsqueeze(0) - rw_scores.unsqueeze(-1)` raises an error:

The size of tensor a (8) must match the size of tensor b (2) at non-singleton dimension 1

GanjinZero commented 11 months ago

per_device_batch_size > 1 is not implemented; you will need to modify the loss fn yourself.
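For reference, a minimal sketch (toy shapes assumed) of why that line only broadcasts cleanly when the scores for one example form a 1-D tensor:

```python
import torch

# With per_device_batch_size == 1, rw_scores is a 1-D tensor of
# per-response reward scores, e.g. 2 candidate responses:
rw_scores = torch.tensor([0.9, 0.1])
rw_diff = rw_scores.unsqueeze(0) - rw_scores.unsqueeze(-1)
print(rw_diff.shape)  # torch.Size([2, 2]) -- all pairwise differences

# With per_device_batch_size > 1 the tensor gains a batch dimension,
# e.g. 8 examples x 2 responses, and the same unsqueeze trick tries
# to broadcast shape (1, 8, 2) against (8, 2, 1), which fails at
# dimension 1 (8 vs 2) -- exactly the error in this issue:
rw_scores = torch.randn(8, 2)
try:
    rw_scores.unsqueeze(0) - rw_scores.unsqueeze(-1)
except RuntimeError as e:
    print(e)
```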

ZJXNEFU commented 11 months ago

If we assume by default that response 0 is better than response 1, and make response 1 learn from response 0, is the following code correct for the batch_size > 1 case?


    def rrhf_loss(self, scores, idxs, rw_scores):
        diff = scores[:-1:2] - scores[1::2]
        return -diff.sum()

    def sft_loss(self, logit_label, idxs, rw_scores):
        goal_data = logit_label[:-1:2]
        return -torch.mean(goal_data, dim=1).sum()
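Run on toy values (made up for illustration), the proposed ranking term is just the negated sum of the per-pair score gaps:

```python
import torch

def rrhf_loss(scores, idxs=None, rw_scores=None):
    # scores interleaves [resp0, resp1, resp0, resp1, ...]
    diff = scores[:-1:2] - scores[1::2]  # resp0 - resp1 per example
    return -diff.sum()

# two examples; scores are sequence log-probs from the model
scores = torch.tensor([-1.0, -2.0, -0.5, -3.0])
print(rrhf_loss(scores))  # tensor(-3.5000)
```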
ZJXNEFU commented 11 months ago

With this change the loss sometimes goes negative, but I can't tell where I went wrong. Below is a training run with deepspeed enabled:

[screenshot: loss curve]

GanjinZero commented 11 months ago

Optimize a pair only when the first response scores lower than the zeroth; otherwise skip it. You need to take a max between diff and 0.

ZJXNEFU commented 11 months ago

Yes, I did guarantee that in each pair the 0th score is always higher than the 1st, but the loss still goes negative. Should it be -diff.mean() across the batch instead of -diff.sum()?

GanjinZero commented 11 months ago

What you guaranteed is that the 0th entry of rw_scores is higher than the 1st. scores is computed by the model; how could you guarantee its ordering?

ZJXNEFU commented 11 months ago

The reward scores are offline; the pairs were already sorted during data preparation.

GanjinZero commented 11 months ago

You need an `aval` mask:

    aval = diff < 0
    return -diff[aval].sum()
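Written out as a sketch (the `aval` mask keeps only the pairs the model ranks in the wrong order):

```python
import torch

def rrhf_loss(scores, idxs=None, rw_scores=None):
    # resp0 is assumed preferred (pairs sorted offline by reward score)
    diff = scores[:-1:2] - scores[1::2]  # preferred - rejected
    aval = diff < 0                      # pairs ranked wrongly by the model
    return -diff[aval].sum()             # >= 0 by construction
```

Correctly ordered pairs contribute nothing, so the loss can no longer go negative.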

ZJXNEFU commented 11 months ago

Within one batch, should this be summed or averaged? I'm currently averaging over the batch:


    def rrhf_loss(self, scores, idxs, rw_scores):
        diff = scores[:-1:2] - scores[1::2]
        diff[diff > 0] = 0
        return -diff.mean()

    def sft_loss(self, logit_label, idxs, rw_scores):
        goal_data = logit_label[:-1:2]
        return -torch.mean(goal_data, dim=1).mean()
GanjinZero commented 11 months ago

Take the mean across the batch.
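Combining the thread's conclusions, one possible batch-size-agnostic version (a sketch, not the repo's official code; `clamp` replaces the in-place masking and is equivalent to zeroing the correctly ordered pairs):

```python
import torch

def rrhf_loss(scores, idxs=None, rw_scores=None):
    # scores: 1-D tensor interleaving [resp0, resp1, ...] per example,
    # with resp0 pre-sorted (offline) to be the preferred response.
    diff = scores[:-1:2] - scores[1::2]      # preferred - rejected
    return torch.clamp(-diff, min=0).mean()  # hinge term, mean over batch

def sft_loss(logit_label, idxs=None, rw_scores=None):
    # logit_label: (num_responses, seq_len) token log-probs; keep only
    # the preferred response of each pair and average.
    goal_data = logit_label[:-1:2]
    return -goal_data.mean(dim=1).mean()
```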

ZJXNEFU commented 11 months ago

OK, that looks right now. Thanks a lot!