InternLM / xtuner

An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)
https://xtuner.readthedocs.io/zh-cn/latest/
Apache License 2.0

A question: the loss is not computed from the reward_token_id position during training, so why does inference read the score at reward_token_id? #921

Open woshixiaobai2019 opened 2 weeks ago

woshixiaobai2019 commented 2 weeks ago

https://github.com/InternLM/xtuner/blob/081c8ca874bdbf7a7f8cd0a9e4cba503eaaa7bba/xtuner/model/reward.py#L311

tcxia commented 2 weeks ago

Hi, may I ask what you are using for inference?

woshixiaobai2019 commented 2 weeks ago

Sorry, it's resolved. I read through the source carefully; there is no problem.
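For readers landing here with the same question: reward models of this kind are commonly trained with a pairwise Bradley-Terry ranking loss over the scalar scores read at the reward-token position, so the head trained by the loss is the same one queried at inference. A minimal sketch of that loss (illustrative only; it is not copied from xtuner's `reward.py`):

```python
import math

def ranking_loss(chosen_score, rejected_score):
    """Bradley-Terry pairwise loss: -log sigmoid(chosen - rejected).
    Both scores are scalars read at the reward-token position, so
    minimizing this loss trains exactly the score that inference reads."""
    return -math.log(1.0 / (1.0 + math.exp(-(chosen_score - rejected_score))))

# ranking the chosen response above the rejected one gives a small loss
low = ranking_loss(2.0, -1.0)
high = ranking_loss(-1.0, 2.0)
```

The loss only ever compares scores at the reward token, which is why inference can rely on that same position.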

tcxia commented 2 weeks ago

@woshixiaobai2019 What I actually wanted to ask is how you run inference on your side; inference keeps throwing errors for me.

woshixiaobai2019 commented 2 weeks ago

> @woshixiaobai2019 What I actually wanted to ask is how you run inference on your side; inference keeps throwing errors for me.

Follow the reward model inference in modeling_internlm2.

tcxia commented 2 weeks ago

@woshixiaobai2019 Could you give a full path for reference? Many thanks!

woshixiaobai2019 commented 1 week ago

> @woshixiaobai2019 Could you give a full path for reference? Many thanks!

https://huggingface.co/internlm/internlm2-1_8b-reward/blob/main/modeling_internlm2.py

The reward model's forward function is here.
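The gist of that forward pass is: append the special reward token to the conversation, run the model, and read the value head's scalar output at that last position. A dependency-free sketch of the idea (the token id, names, and toy dimensions are illustrative, not taken from `modeling_internlm2.py`):

```python
REWARD_TOKEN_ID = 92527  # hypothetical id; the real one comes from the tokenizer config

def get_reward_score(hidden_states, input_ids, v_head):
    """Project every hidden state through the value head, but keep only
    the score at the appended reward token (the final position)."""
    if input_ids[-1] != REWARD_TOKEN_ID:
        raise ValueError("append reward_token_id to the input before scoring")
    # per-token scalar scores: dot product of each hidden state with v_head
    scores = [sum(h * w for h, w in zip(state, v_head)) for state in hidden_states]
    return scores[-1]

# toy usage: three "hidden states" of dim 2, the last token is the reward token
hidden = [[0.1, 0.2], [0.3, -0.1], [0.5, 0.4]]
ids = [1, 5, REWARD_TOKEN_ID]
head = [1.0, -1.0]
score = get_reward_score(hidden, ids, head)
```

This is why inference errors usually trace back to forgetting to append the reward token before the forward pass.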