beyondguo / LLM-Tuning

Tuning LLMs with no tears💦; Sample Design Engineering (SDE) for more efficient downstream-tuning.

Why does the PPO model need to be wrapped in AutoModelForCausalLMWithValueHead? #51

Open jiahuanluo opened 1 year ago

jiahuanluo commented 1 year ago

Thanks for your work! Why does the PPO model here need a value head attached? https://github.com/beyondguo/LLM-Tuning/blob/ed68123815bc0add9ad2d7ddc2a48dc584db2c94/RLHF/rl_training.py#L185C1-L185C11 This head appears to be randomly initialized?

nghuyong commented 11 months ago

Because PPO also needs a critic model, and the value head serves as that critic.
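
To illustrate why the critic matters, here is a minimal sketch of how PPO typically turns the critic's per-token value estimates into advantages using generalized advantage estimation (GAE). This is not code from this repository or from trl; the function name `gae_advantages` and the default `gamma` and `lam` values are assumptions chosen for illustration. The head starts out randomly initialized and is trained during PPO so that its value estimates become useful.

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Compute GAE advantages for one generated sequence.

    rewards: per-step rewards (hypothetical example data).
    values:  critic estimates V(s_t) from the value head, same length.
    The value after the final step is assumed to be 0 (episode ends).
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the advantage of the next step.
    for t in reversed(range(len(rewards))):
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        # TD error: how much better the outcome was than the critic predicted.
        delta = rewards[t] + gamma * next_value - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


# Reward arrives only at the last token, as is common in RLHF setups.
adv = gae_advantages([0.0, 0.0, 1.0], [0.5, 0.5, 0.5], gamma=1.0, lam=1.0)
```

With `gamma=1` and `lam=1`, each advantage reduces to the return-to-go minus the critic's estimate, so without a value estimate the policy update would have no baseline to compare rewards against.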