-
I trained a Llama2-3B model using OpenRLHF and it trained fine. But when I moved to the 7B version of the model, I had to switch to multiple nodes and encountered this error. After contacting the sup…
-
What would be the most straightforward way to do RLHF using LoRA after fine-tuning? Is LoRA fine-tuning compatible with this approach? https://huggingface.co/blog/trl-peft
Would like to submit a content req…
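For reference, a minimal sketch in the spirit of the linked blog post, using trl with a peft LoraConfig; the model name and hyperparameters here are illustrative placeholders, not a confirmed recipe:
```python
from peft import LoraConfig
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

# Placeholder base checkpoint; swap in your own fine-tuned model.
model_name = "edbeeching/gpt-neo-125M-imdb-lora"

# LoRA adapter configuration (values are illustrative, not tuned).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

# trl wraps the causal LM with a value head and applies the LoRA adapters,
# so PPO only updates the adapter weights plus the value head.
model = AutoModelForCausalLMWithValueHead.from_pretrained(
    model_name,
    peft_config=lora_config,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
ppo_config = PPOConfig(batch_size=16, learning_rate=1.4e-5)

# With a peft model, ref_model can be None: trl disables the adapters
# to recover reference logits for the KL penalty.
trainer = PPOTrainer(config=ppo_config, model=model, ref_model=None, tokenizer=tokenizer)
```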
-
I am getting the following error when doing RLHF training:
```
Traceback (most recent call last):
  File "/code/main.py", in
    rlhf_trainer.train()
  File "/code/trainer.py", in train
    self.lea…
```
-
When I was training the actor with reinforcement learning, I encountered the following bug:
```
Current device used :cuda
Start RL Training
Episode: 1 of 100, Timestep: 1 of 8
../aten/src/ATen/native/…
```
-
**Describe the bug**
The process stops after loading the model into memory and processing the dataset. I also tried an…
-
In terms of the reward function, would we be interested in using RLHF to train a dedicated reward model? (A sketch of the standard pairwise loss follows below.)
From my research we can do this by either:
Having a human rank the small clips of gameplay and…
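For context, a minimal sketch of the standard pairwise (Bradley-Terry) reward-model loss used when learning from human rankings; the function name and shapes are illustrative assumptions, not code from this thread:
```python
import torch
import torch.nn.functional as F

# Illustrative sketch (hypothetical helper): given the reward model's scalar
# scores for a human-preferred clip and a dispreferred clip, maximize the
# log-probability that the preferred clip scores higher.
def pairwise_reward_loss(r_preferred: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    return -F.logsigmoid(r_preferred - r_rejected).mean()

# Example: a batch of 4 score pairs.
loss = pairwise_reward_loss(torch.randn(4), torch.randn(4))
```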
-
I'm trying to use the DeepSpeed-Chat stage2 scripts to do RLHF with the Qwen1.8b-chat model. I changed some parts in dschat and main.py to load my model; the most different part is:
```
if 'Qwen' in model_nam…
```
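For anyone hitting the same spot: Qwen checkpoints ship custom modeling code, so loading them with transformers typically needs `trust_remote_code=True`. A minimal illustrative sketch (assumed checkpoint name, not the poster's actual branch):
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative sketch: Qwen models ship custom modeling code, so
# transformers must be told to trust and execute it.
model_name = "Qwen/Qwen-1_8B-Chat"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)
```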
-
How are you, @WzWang-Robot?
I read your paper and code.
I have two questions about the paper and code.
1. Preference-based RL (PbRL) generally assumes that the MDP has a "fixed horizon". But in your paper…
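For context, the standard PbRL preference model (Christiano et al., 2017) compares two fixed-length trajectory segments, which is where the fixed-horizon assumption usually enters:
```latex
% Bradley-Terry preference model over trajectory segments
% \sigma^1, \sigma^2 of equal length, with learned reward \hat{r}:
P(\sigma^1 \succ \sigma^2)
  = \frac{\exp \sum_t \hat{r}(s_t^1, a_t^1)}
         {\exp \sum_t \hat{r}(s_t^1, a_t^1) + \exp \sum_t \hat{r}(s_t^2, a_t^2)}
```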
-
I am having an issue regarding the `Anthropic/hh-rlhf` dataset, in `reward_dataset.py`:
```python
# Anthropic/hh-rlhf
# tasksource/oasst1_pairwise_rlhf_reward
if exist_and_not_none(data, "chosen") and exist…
```
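For reference, a minimal sketch of what that dataset contains: `Anthropic/hh-rlhf` exposes `chosen` and `rejected` columns, each holding a full Human/Assistant dialogue string that shares the same prompt:
```python
from datasets import load_dataset

# Each record holds two complete Human/Assistant transcripts sharing a
# prompt: the "chosen" (preferred) and "rejected" (dispreferred) one.
ds = load_dataset("Anthropic/hh-rlhf", split="train")
sample = ds[0]
print(sample["chosen"][:200])    # preferred conversation
print(sample["rejected"][:200])  # dispreferred conversation
```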
-
@mcmonkey4eva @twmmason @masaishi @lxe
Do you plan to provide some assistance on how to use RLHF to fine-tune Vicuna models?
It's a fairly new topic in the public domain, and it would be great if y…