Yu-Fangxu / FoR

Flow of Reasoning: Efficient Training of LLM Policy with Diverse Thinking
MIT License

An inquiry about the hyper parameter 'll_weight' #1

Closed. Amayama closed this issue 2 months ago

Amayama commented 2 months ago

I noticed that you have a hyperparameter called ll_weight in each run.sh, but I cannot find the corresponding parameter in the repo or the paper. Could you share how to select this parameter?

Yu-Fangxu commented 2 months ago

Hi Amayama, ll_weight corresponds to the coefficient \lambda in the paper; you can find it in Section 4.2, Reward Design. In the code, ll_weight is used in lightning_module_selection.py. I selected its value by choosing the one with the best score on the training data, as you can see in Figure 3(a) in the paper.
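For readers who want to reproduce that kind of selection, here is a minimal sketch of sweeping candidate ll_weight values and keeping the one with the best training score. It is not the repo's code: `select_ll_weight`, `toy_train_score`, and the candidate values are hypothetical, and the toy scoring function only stands in for a real run that trains with a given ll_weight and reports a score on the training data.

```python
from typing import Callable, Iterable, Tuple


def select_ll_weight(
    candidates: Iterable[float],
    train_score: Callable[[float], float],
) -> Tuple[float, float]:
    """Return (best_weight, best_score) over the candidate weights.

    `train_score` is a hypothetical callable: given an ll_weight value,
    it should run training and return the score on the training data.
    """
    best_weight = None
    best_score = float("-inf")
    for weight in candidates:
        score = train_score(weight)
        # Keep the first weight that achieves the highest score so far.
        if score > best_score:
            best_weight, best_score = weight, score
    if best_weight is None:
        raise ValueError("candidates must not be empty")
    return best_weight, best_score


def toy_train_score(weight: float) -> float:
    # Placeholder only: a made-up curve that peaks at weight = 1.5.
    # Replace with a real training-and-evaluation run.
    return -((weight - 1.5) ** 2)


# Candidate values are illustrative, not the ones used in the paper.
best_w, best_s = select_ll_weight([0.5, 1.0, 1.5, 2.0], toy_train_score)
print(f"selected ll_weight={best_w} (training score {best_s})")
```

In practice `train_score` would wrap a launch of the training script with the given ll_weight, so each candidate costs a full run; a small candidate list keeps the sweep affordable.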

Amayama commented 2 months ago

Many thanks for the kind reply!