-
I saw the loss-type option, which indicates that several other loss functions can be used, such as hinge, IPO, RAFT, ...
I am wondering whether we only need to change the loss choice and do not need to…
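For context, a minimal sketch of how such loss variants typically differ, assuming DPO-style logits (the policy-vs-reference log-ratio margin between chosen and rejected responses); the names here are illustrative, not this project's exact flags. Note that RAFT is usually described as reward-ranked (rejection-sampling) fine-tuning rather than a drop-in loss, so for some variants more than the loss choice may indeed need to change.
```python
import torch
import torch.nn.functional as F

def preference_loss(policy_logratios, ref_logratios, beta=0.1, loss_type="sigmoid"):
    # logits = policy-vs-reference log-ratio margin between the chosen
    # and rejected responses, as in DPO.
    logits = policy_logratios - ref_logratios
    if loss_type == "sigmoid":  # standard DPO loss
        return -F.logsigmoid(beta * logits).mean()
    if loss_type == "hinge":    # SLiC-style hinge on the same logits
        return torch.relu(1.0 - beta * logits).mean()
    if loss_type == "ipo":      # IPO: squared distance from 1/(2*beta)
        return ((logits - 1.0 / (2.0 * beta)) ** 2).mean()
    raise ValueError(f"unknown loss_type: {loss_type}")
```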
-
Failed to run the evaluation script.
-
I am trying to apply RLHF to a text classification task. You can think of the text classification model, i.e. the policy model here, as `emotion classification`. The pretrained model can output `class number…
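A minimal sketch of one way this could work, assuming a hypothetical classifier `policy` and reward function `reward_fn` (neither name comes from a specific library): treat the predicted class as the action and update with REINFORCE.
```python
import torch

def reinforce_step(policy, optimizer, texts, reward_fn):
    # `policy` is any classifier mapping a batch of texts to logits over
    # the emotion classes; `reward_fn` maps (text, class_id) to a scalar.
    logits = policy(texts)                   # [batch, num_classes]
    dist = torch.distributions.Categorical(logits=logits)
    actions = dist.sample()                  # predicted classes as "actions"
    rewards = torch.tensor(
        [reward_fn(t, a.item()) for t, a in zip(texts, actions)],
        dtype=torch.float32,
    )
    # REINFORCE with a mean-reward baseline to reduce variance.
    loss = -(dist.log_prob(actions) * (rewards - rewards.mean())).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```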
-
Hi, XTuner Team,
Could you please add a citation for the source of the Ray+vLLM-based RLHF architecture, OpenRLHF, for example in the README.md file: https://github.com/InternLM/xtuner?tab=readme-ov-fi…
-
I looked at the code and found that for the HH-RLHF dataset you use the red-team data for testing. I want to know how the test scores are calculated; I didn't find ground truth in the red-team dataset. How are th…
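One common scheme for scoring red-team prompts without ground truth, shown purely as an assumption about what such an evaluation might do (not confirmed from this repo's code): generate a response per prompt and average a reward model's scalar scores. `policy_generate` and `reward_model_score` are hypothetical names.
```python
def evaluate_red_team(policy_generate, reward_model_score, prompts):
    # Hypothetical evaluation loop: no ground truth is needed because the
    # "score" is a reward model's judgment of the policy's own responses.
    scores = []
    for prompt in prompts:
        response = policy_generate(prompt)            # assumed generation fn
        scores.append(reward_model_score(prompt, response))
    return sum(scores) / len(scores)                  # mean reward as test score
```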
-
Hi, thanks for uploading the code for pair_pm! In the blog it seems that you are using SLiC for the pair_pm models, but in the pair_pm directory I can't find the code for the SLiC method.
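For reference, a minimal sketch of the SLiC-HF rank-calibration loss: a hinge on the sequence log-probability margin between the chosen and rejected responses. `chosen_logps`/`rejected_logps` are assumed per-sequence log-probs, and the paper's additional cross-entropy regularizer is omitted here.
```python
import torch

def slic_loss(chosen_logps, rejected_logps, delta=1.0):
    # SLiC-HF rank-calibration loss: a hinge on the sequence log-prob
    # margin between the preferred and dispreferred responses.
    return torch.relu(delta - (chosen_logps - rejected_logps)).mean()
```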
-
Customers would like to fine-tune LLMs using RLHF, with methods such as PPO and DPO. I suppose this will require integration with the [TRL](https://huggingface.co/docs/trl/in…
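A minimal sketch of DPO fine-tuning with TRL, assuming a recent TRL release (argument names such as `processing_class` have shifted across versions, and the model and dataset choices here are placeholders):
```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "Qwen/Qwen2-0.5B-Instruct"  # placeholder model choice
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# DPO expects preference pairs with "prompt", "chosen", "rejected" columns.
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="dpo-out", beta=0.1),
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```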
-
I have an issue regarding the `Anthropic/hh-rlhf` dataset in `reward_dataset.py`:
```python
# Anthropic/hh-rlhf
# tasksource/oasst1_pairwise_rlhf_reward
if exist_and_not_none(data, "chosen") and exist…
```
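For readers hitting the same branch, a hypothetical reconstruction of the helper, not copied from the repo:
```python
# Hypothetical reconstruction (assumption, not the repo's actual code):
# true only when `key` is present in the sample and maps to a non-None value.
def exist_and_not_none(data: dict, key: str) -> bool:
    return key in data and data[key] is not None
```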
-
Hi,
I recently came across this really interesting blog on [Putting RL back in RLHF](https://huggingface.co/blog/putting_rl_back_in_rlhf_with_rloo).
It looks like unsloth [supports](https://hug…
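For context, a minimal sketch of the leave-one-out baseline that gives RLOO its name: each completion's advantage is its reward minus the mean reward of the other k - 1 completions sampled for the same prompt.
```python
import torch

def rloo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    # rewards: [num_prompts, k] scores for k sampled completions per prompt.
    # Each completion's baseline is the mean reward of the other k - 1
    # completions for the same prompt, hence "leave-one-out".
    k = rewards.shape[1]
    baseline = (rewards.sum(dim=1, keepdim=True) - rewards) / (k - 1)
    return rewards - baseline
```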