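Question about the γ setting: the original recommended value is 5, but a later commit changed it to 1.1. What is the reason for the difference?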

[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback

https://rlhf-v.github.io

Closed: Stubborn-one closed this issue 3 months ago

Stubborn-one commented 3 months ago

yiranyyu commented 3 months ago

Hello, and thanks for your interest!

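We found that a weight of 1.1 trains more stably and therefore suits a wider range of training environments, so we updated the README accordingly. The value can be adjusted to fit your training data and how training behaves (for example, whether it remains stable).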
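
To illustrate why the size of γ can affect stability, here is a minimal sketch. It assumes γ is the weight applied to tokens in human-corrected segments when computing a length-normalized sequence log-probability, as in the DDPO objective described in the RLHF-V paper. The function name `weighted_sequence_logp` and its arguments are hypothetical and are not taken from the repository's code.

```python
from typing import Sequence


def weighted_sequence_logp(
    token_logps: Sequence[float],
    is_corrected: Sequence[bool],
    gamma: float,
) -> float:
    """Weighted average of per-token log-probabilities.

    Tokens flagged as corrected get weight `gamma`; all others get weight 1.
    The sum is divided by the total weight, so gamma = 1 gives the plain mean.
    (Hypothetical helper, assumed form of the weighting.)
    """
    if len(token_logps) != len(is_corrected):
        raise ValueError("token_logps and is_corrected must have equal length")
    weights = [gamma if flag else 1.0 for flag in is_corrected]
    total_weight = sum(weights)
    if total_weight == 0:
        return 0.0  # empty sequence: nothing to score
    weighted_sum = sum(w * lp for w, lp in zip(weights, token_logps))
    return weighted_sum / total_weight


# Toy example: one unchanged token and one corrected token.
logps = [-1.0, -3.0]
flags = [False, True]
score_gamma_5 = weighted_sequence_logp(logps, flags, gamma=5.0)
score_gamma_1_1 = weighted_sequence_logp(logps, flags, gamma=1.1)
```

Under this assumed form, γ = 5 lets the corrected token dominate the score, while γ = 1.1 stays close to the unweighted mean. That would be consistent with the smaller value being less sensitive to the data, though the exact behavior depends on the actual training code.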