Can VisualGLM-6b be used directly to train a reward model?

iamsile commented 1 year ago

Hello, I'd like to train a reward model with VisualGLM-6b. My input data is currently plain text only. I adapted the code by following deepspeed_chat, but the forward computation keeps failing. The log is as follows:

File "/opt/conda/envs/rlhf_tw_test/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
File "/opt/conda/envs/rlhf_tw_test/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (1024x130344 and 4096x1)

Sleepychord commented 1 year ago

This error looks like a plain code problem: there is simply a bug in what was written.

iamsile commented 1 year ago

Sorry, the input sizes didn't match. It's solved now.
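The thread does not say which tensor had the wrong size, so the following is only a sketch under assumptions. F.linear requires the last dimension of its input to equal the layer's in_features. Reading the log that way, the 4096x1 operand looks like a scalar reward head equivalent to nn.Linear(4096, 1) (4096 being the GLM-6B hidden size), while the 130344-wide input plausibly comes from vocabulary-sized logits rather than hidden states. The 1024 rows are some batch × sequence-length product; batch 2 × length 512 below is an assumed example. The hypothetical helper linear_output_shape reproduces the shape rule and the error text in pure Python, without requiring torch:

```python
def linear_output_shape(input_shape, in_features, out_features):
    """Hypothetical helper: output shape of y = x @ W.T + b, with W of shape
    (out_features, in_features), mirroring the check F.linear performs.

    Leading dimensions are kept; the last one must equal in_features and
    becomes out_features.
    """
    *leading, last = input_shape
    if last != in_features:
        # torch flattens the leading dims into rows when reporting mat1
        rows = 1
        for d in leading:
            rows *= d
        raise ValueError(
            f"mat1 and mat2 shapes cannot be multiplied "
            f"({rows}x{last} and {in_features}x{out_features})"
        )
    return (*leading, out_features)


# Assumed reward head: hidden size 4096 -> one scalar score per token.
HIDDEN, SCORE = 4096, 1

# Feeding hidden states of shape (batch, seq, hidden) works:
print(linear_output_shape((2, 512, HIDDEN), HIDDEN, SCORE))  # (2, 512, 1)

# Feeding 130344-wide tensors (assumed to be logits) reproduces the log's error:
try:
    linear_output_shape((2, 512, 130344), HIDDEN, SCORE)
except ValueError as err:
    print(err)  # mat1 and mat2 shapes cannot be multiplied (1024x130344 and 4096x1)
```

In a deepspeed_chat-style reward model, the usual design is to apply the value head to the backbone's last hidden states instead of its LM-head output. Under the assumptions above, that is the kind of size mismatch the resolution describes.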

THUDM / VisualGLM-6B

Can VisualGLM-6b be used directly to train a reward model? #26