-
I saw the loss-type option, which indicates that several other loss functions can be used, such as hinge, IPO, RAFT, ...
I am wondering whether we only need to change the loss choice and do not need to…
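For context, a minimal sketch of how such loss variants typically differ, assuming DPO-style logits (the policy-vs-reference log-ratio margin between chosen and rejected responses); the names here are illustrative, not this project's exact flags. Note that RAFT is usually described as reward-ranked (rejection-sampling) fine-tuning rather than a drop-in loss, so for some variants more than the loss choice may indeed need to change.
```python
import torch
import torch.nn.functional as F

def preference_loss(policy_logratios, ref_logratios, beta=0.1, loss_type="sigmoid"):
    # logits = policy-vs-reference log-ratio margin between the chosen
    # and rejected responses, as in DPO.
    logits = policy_logratios - ref_logratios
    if loss_type == "sigmoid":  # standard DPO loss
        return -F.logsigmoid(beta * logits).mean()
    if loss_type == "hinge":    # SLiC-style hinge on the same logits
        return torch.relu(1.0 - beta * logits).mean()
    if loss_type == "ipo":      # IPO: squared distance from 1/(2*beta)
        return ((logits - 1.0 / (2.0 * beta)) ** 2).mean()
    raise ValueError(f"unknown loss_type: {loss_type}")
```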
-
Failed to run the evaluation script.
-
I am trying to apply RLHF to a text classification task. You can think of the text classification model, i.e. the policy model here, as `emotion classification`. The pretrained model can output `class number…
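A minimal sketch of one way this could work, assuming a hypothetical classifier `policy` and reward function `reward_fn` (neither name comes from a specific library): treat the predicted class as the action and update with REINFORCE.
```python
import torch

def reinforce_step(policy, optimizer, texts, reward_fn):
    # `policy` is any classifier mapping a batch of texts to logits over
    # the emotion classes; `reward_fn` maps (text, class_id) to a scalar.
    logits = policy(texts)                   # [batch, num_classes]
    dist = torch.distributions.Categorical(logits=logits)
    actions = dist.sample()                  # predicted classes as "actions"
    rewards = torch.tensor(
        [reward_fn(t, a.item()) for t, a in zip(texts, actions)],
        dtype=torch.float32,
    )
    # REINFORCE with a mean-reward baseline to reduce variance.
    loss = -(dist.log_prob(actions) * (rewards - rewards.mean())).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```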
-
Hi, XTuner Team,
Could you please add a citation for the source of the Ray+vLLM-based RLHF architecture, OpenRLHF, for example in the README.md file: https://github.com/InternLM/xtuner?tab=readme-ov-fi…
-
I looked at the code and found that for the HH-RLHF dataset you use the red-team data for testing. I want to know how the test scores are calculated; I didn't find ground truth in the red-team dataset. How are th…
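One common scheme for scoring red-team prompts without ground truth, shown purely as an assumption about what such an evaluation might do (not confirmed from this repo's code): generate a response per prompt and average a reward model's scalar scores. `policy_generate` and `reward_model_score` are hypothetical names.
```python
def evaluate_red_team(policy_generate, reward_model_score, prompts):
    # Hypothetical evaluation loop: no ground truth is needed because the
    # "score" is a reward model's judgment of the policy's own responses.
    scores = []
    for prompt in prompts:
        response = policy_generate(prompt)            # assumed generation fn
        scores.append(reward_model_score(prompt, response))
    return sum(scores) / len(scores)                  # mean reward as test score
```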
-
Hi, thanks for uploading the code for pair_pm! In the blog it seems that you are using SLiC for the pair_pm models, but in the pair_pm directory I can't find the code for the SLiC method.
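For reference, a minimal sketch of the SLiC-HF rank-calibration loss: a hinge on the sequence log-probability margin between the chosen and rejected responses. `chosen_logps`/`rejected_logps` are assumed per-sequence log-probs, and the paper's additional cross-entropy regularizer is omitted here.
```python
import torch

def slic_loss(chosen_logps, rejected_logps, delta=1.0):
    # SLiC-HF rank-calibration loss: a hinge on the sequence log-prob
    # margin between the preferred and dispreferred responses.
    return torch.relu(delta - (chosen_logps - rejected_logps)).mean()
```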
-
Customers would like to fine-tune LLMs using RLHF, with methods such as PPO and DPO. I suppose this will require integration with the [TRL](https://huggingface.co/docs/trl/in…
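A minimal sketch of DPO fine-tuning with TRL, assuming a recent TRL release (argument names such as `processing_class` have shifted across versions, and the model and dataset choices here are placeholders):
```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "Qwen/Qwen2-0.5B-Instruct"  # placeholder model choice
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# DPO expects preference pairs with "prompt", "chosen", "rejected" columns.
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="dpo-out", beta=0.1),
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```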
-
I have an issue regarding the `Anthropic/hh-rlhf` dataset in `reward_dataset.py`:
```python
# Anthropic/hh-rlhf
# tasksource/oasst1_pairwise_rlhf_reward
if exist_and_not_none(data, "chosen") and exist…
```
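For readers hitting the same branch, a hypothetical reconstruction of the helper, not copied from the repo:
```python
# Hypothetical reconstruction (assumption, not the repo's actual code):
# true only when `key` is present in the sample and maps to a non-None value.
def exist_and_not_none(data: dict, key: str) -> bool:
    return key in data and data[key] is not None
```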
-
Hi,
I recently came across this really interesting blog on [Putting RL back in RLHF](https://huggingface.co/blog/putting_rl_back_in_rlhf_with_rloo).
It looks like unsloth [supports](https://hug…
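For context, a minimal sketch of the leave-one-out baseline that gives RLOO its name: each completion's advantage is its reward minus the mean reward of the other k - 1 completions sampled for the same prompt.
```python
import torch

def rloo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    # rewards: [num_prompts, k] scores for k sampled completions per prompt.
    # Each completion's baseline is the mean reward of the other k - 1
    # completions for the same prompt, hence "leave-one-out".
    k = rewards.shape[1]
    baseline = (rewards.sum(dim=1, keepdim=True) - rewards) / (k - 1)
    return rewards - baseline
```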