-
I trained a Llama2-3B model using OpenRLHF and it trained fine. But when I moved to the 7B version of the model, I had to switch to multiple nodes and encountered this error. After contacting the sup…
-
What would be the most straightforward way to do RLHF using LoRA after fine-tuning? Is LoRA fine-tuning compatible with this approach? https://huggingface.co/blog/trl-peft
Would like to submit a content req…
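For reference, a minimal sketch in the spirit of the linked blog post, using trl with a peft LoraConfig; the model name and hyperparameters here are illustrative placeholders, not a confirmed recipe:
```python
from peft import LoraConfig
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

# Placeholder base checkpoint; swap in your own fine-tuned model.
model_name = "edbeeching/gpt-neo-125M-imdb-lora"

# LoRA adapter configuration (values are illustrative, not tuned).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

# trl wraps the causal LM with a value head and applies the LoRA adapters,
# so PPO only updates the adapter weights plus the value head.
model = AutoModelForCausalLMWithValueHead.from_pretrained(
    model_name,
    peft_config=lora_config,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
ppo_config = PPOConfig(batch_size=16, learning_rate=1.4e-5)

# With a peft model, ref_model can be None: trl disables the adapters
# to recover reference logits for the KL penalty.
trainer = PPOTrainer(config=ppo_config, model=model, ref_model=None, tokenizer=tokenizer)
```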
-
I am getting the following error when doing RLHF training:
```
Traceback (most recent call last):
  File "/code/main.py", in
    rlhf_trainer.train()
  File "/code/trainer.py", in train
    self.lea…
```
-
When I was training the actor with reinforcement learning, I encountered the following bug:
```
Current device used :cuda
Start RL Training
Episode: 1 of 100, Timestep: 1 of 8
../aten/src/ATen/native/…
```
-
**Describe the bug**
The process stops after loading the model into memory and processing the dataset. I also tried an…
-
In terms of the reward function, would we be interested in using RLHF to train a dedicated reward model? (A sketch of the standard pairwise loss follows below.)
From my research we can do this by either:
Having a human rank the small clips of gameplay and…
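For context, a minimal sketch of the standard pairwise (Bradley-Terry) reward-model loss used when learning from human rankings; the function name and shapes are illustrative assumptions, not code from this thread:
```python
import torch
import torch.nn.functional as F

# Illustrative sketch (hypothetical helper): given the reward model's scalar
# scores for a human-preferred clip and a dispreferred clip, maximize the
# log-probability that the preferred clip scores higher.
def pairwise_reward_loss(r_preferred: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    return -F.logsigmoid(r_preferred - r_rejected).mean()

# Example: a batch of 4 score pairs.
loss = pairwise_reward_loss(torch.randn(4), torch.randn(4))
```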
-
I'm trying to use the DeepSpeed-Chat stage2 scripts to do RLHF with the Qwen1.8b-chat model. I changed some parts in dschat and main.py to load my model; the most different part is:
```
if 'Qwen' in model_nam…
```
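For anyone hitting the same spot: Qwen checkpoints ship custom modeling code, so loading them with transformers typically needs `trust_remote_code=True`. A minimal illustrative sketch (assumed checkpoint name, not the poster's actual branch):
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative sketch: Qwen models ship custom modeling code, so
# transformers must be told to trust and execute it.
model_name = "Qwen/Qwen-1_8B-Chat"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)
```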
-
How are you, @WzWang-Robot?
I read your paper and code.
I have two questions about the paper and code.
1. Preference-based RL (PbRL) generally assumes that the MDP has a "fixed horizon". But in your paper…
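For context, the standard PbRL preference model (Christiano et al., 2017) compares two fixed-length trajectory segments, which is where the fixed-horizon assumption usually enters:
```latex
% Bradley-Terry preference model over trajectory segments
% \sigma^1, \sigma^2 of equal length, with learned reward \hat{r}:
P(\sigma^1 \succ \sigma^2)
  = \frac{\exp \sum_t \hat{r}(s_t^1, a_t^1)}
         {\exp \sum_t \hat{r}(s_t^1, a_t^1) + \exp \sum_t \hat{r}(s_t^2, a_t^2)}
```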
-
I am having an issue regarding the `Anthropic/hh-rlhf` dataset, in `reward_dataset.py`:
```python
# Anthropic/hh-rlhf
# tasksource/oasst1_pairwise_rlhf_reward
if exist_and_not_none(data, "chosen") and exist…
```
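For reference, a minimal sketch of what that dataset contains: `Anthropic/hh-rlhf` exposes `chosen` and `rejected` columns, each holding a full Human/Assistant dialogue string that shares the same prompt:
```python
from datasets import load_dataset

# Each record holds two complete Human/Assistant transcripts sharing a
# prompt: the "chosen" (preferred) and "rejected" (dispreferred) one.
ds = load_dataset("Anthropic/hh-rlhf", split="train")
sample = ds[0]
print(sample["chosen"][:200])    # preferred conversation
print(sample["rejected"][:200])  # dispreferred conversation
```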
-
@mcmonkey4eva @twmmason @masaishi @lxe
Do you plan to provide some assistance on how to use RLHF to fine-tune Vicuna models?
It's a fairly new topic in the public domain, and it would be great if y…