RLHFlow / Online-RLHF

A recipe for online RLHF.
https://rlhflow.github.io/

Fail to load weight from pair-preference-model-LLaMA3-8B #4

Open matouk98 opened 2 months ago

matouk98 commented 2 months ago

Hi, congratulations on the great work and thanks for open-sourcing it!

I am running step 3.2 with pair-preference-model-LLaMA3-8B. However, I encountered the warning "Some weights of LlamaForSequenceClassification were not initialized from the model checkpoint at RLHFlow/pair-preference-model-LLaMA3-8B and are newly initialized: ['score.weight']". Could you please help me with the issue? Thanks a lot!
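For reference, this is roughly what I think triggers the warning (my own minimal sketch, not the exact step-3.2 script):

```python
# Minimal reproduction sketch (my understanding of the failing load; the
# actual step-3.2 code may differ): loading the pair-preference checkpoint
# as a sequence classifier creates a fresh `score` head, hence the
# "newly initialized: ['score.weight']" warning.
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "RLHFlow/pair-preference-model-LLaMA3-8B", num_labels=1
)
```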

WeiXiongUST commented 2 months ago

The current code is for the Bradley-Terry reward model, which is an `AutoModelForSequenceClassification`.

In contrast, the pair-preference model is an `AutoModelForCausalLM`, and the way these two models are used is also different. I will write another script for the pair-RM in the next few days.

Thanks for bringing this issue to our attention.
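In the meantime, here is a rough sketch of how the pair-preference model can be queried as a causal LM. The prompt template and the "A"/"B" token choice below are placeholders, not the exact format the model was trained with; please check the model card for the real template.

```python
# Sketch (not the official script): query the pair-preference model as a
# causal LM and read the probability that response A is preferred over B.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "RLHFlow/pair-preference-model-LLaMA3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)
model.eval()

def preference_prob(context: str, response_a: str, response_b: str) -> float:
    # Placeholder template: show the context and both candidates, then ask
    # the model to emit "A" or "B" as the next token.
    prompt = (
        f"[CONTEXT]\n{context}\n\n"
        f"[RESPONSE A]\n{response_a}\n\n"
        f"[RESPONSE B]\n{response_b}\n\n"
        "Which response is better? Answer:"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # next-token logits
    token_a = tokenizer.convert_tokens_to_ids("A")
    token_b = tokenizer.convert_tokens_to_ids("B")
    # Normalize over the two candidate tokens only.
    probs = torch.softmax(logits[[token_a, token_b]], dim=-1)
    return probs[0].item()  # P(A preferred over B)
```

Note that the output is the probability that one response is preferred over the other, not an absolute reward score.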

hmzo commented 2 months ago

@WeiXiongUST Hello, is there any recent progress on this? I'm curious whether the pair-RM needs $C_k^2$ inferences for $k$ candidates (i.e., a round-robin over all pairs, as in the sketch below). How can we get an absolute reward score for each candidate?
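For concreteness, this is the kind of round-robin aggregation I have in mind (just my own sketch, assuming a pairwise scorer like the `preference_prob` above):

```python
# Round-robin over all candidate pairs: C(k, 2) pairwise inferences,
# aggregated into an average win probability per candidate.
from itertools import combinations

def rank_candidates(context, candidates, preference_prob):
    scores = [0.0] * len(candidates)
    for i, j in combinations(range(len(candidates)), 2):  # C(k, 2) pairs
        p_ij = preference_prob(context, candidates[i], candidates[j])
        scores[i] += p_ij
        scores[j] += 1.0 - p_ij
    # Average win probability against the other k - 1 candidates.
    return [s / (len(candidates) - 1) for s in scores]
```

This gives each candidate a relative win rate against the others, but I don't see how to turn that into an absolute reward.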