-
**Describe the bug**
When I used the fine-tuned Llama 3 model to run the `examples/raft_align.py` script, I encountered the following error:
```
Traceback (most recent call last):
File "/home/work…
-
Hi @Aligner2024,
May I know how the harmlessness and helpfulness scores in Figure 2 are calculated?
I also noticed that you changed equation (2); may I know the reason?
And the code here only cover t…
-
Paper here: https://arxiv.org/pdf/1705.05363.pdf
Discussion:
This is the first paper I have seen that presents a viable way to use curiosity in a game as large as Minecraft. The modules are alread…
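The paper's core idea is to use the forward model's prediction error in a learned feature space as an intrinsic "curiosity" reward. A minimal NumPy sketch of that reward computation, with the feature encoder and forward model stubbed out as fixed random linear maps (hypothetical stand-ins for the learned networks):

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, FEAT_DIM, N_ACTIONS = 16, 8, 4
ETA = 0.5  # scaling factor for the intrinsic reward, as in the paper

# Stand-ins for the learned networks: phi encodes a raw state into
# features; the forward model predicts phi(s_{t+1}) from phi(s_t) and a_t.
W_phi = rng.normal(size=(FEAT_DIM, STATE_DIM))
W_fwd = rng.normal(size=(FEAT_DIM, FEAT_DIM + N_ACTIONS))

def phi(state):
    return W_phi @ state

def forward_model(feat, action):
    one_hot = np.eye(N_ACTIONS)[action]
    return W_fwd @ np.concatenate([feat, one_hot])

def intrinsic_reward(s_t, a_t, s_next):
    """Curiosity bonus: squared error between predicted and actual next features."""
    pred = forward_model(phi(s_t), a_t)
    return ETA / 2.0 * np.sum((pred - phi(s_next)) ** 2)

s_t, s_next = rng.normal(size=STATE_DIM), rng.normal(size=STATE_DIM)
bonus = intrinsic_reward(s_t, a_t=1, s_next=s_next)
print(bonus >= 0.0)  # the bonus is a squared norm, so never negative
```

In the real ICM both networks are trained jointly (the encoder via an inverse-dynamics loss), which is what keeps the reward focused on controllable parts of the environment.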
-
### System Info
```Shell
PyTorch 2.2.1
DeepSpeed 0.13.4
```
### Information
- [ ] The official example scripts
- [ ] My own modified scripts
### Tasks
- [ ] One of the scripts in the examples/ …
-
# `TransformerDecoder` Refactor
**Authors:**
* @SalmanMohammadi
with input from:
* @kartikayk
* @ebsmothers
* @pbontrager
## **Summary**
Refactoring `TransformerDecoder` to offer additi…
-
Team, thank you so much for this wonderful toolkit! We are trying to test the vLLM setting with the mistralai/Mistral-7B-Instruct-v0.2 model under ZeRO-2.
![image](https://github.com/OpenLLMAI/OpenRLHF/a…
-
# Why
#### As a
user of `pyCMO`
#### I want
to be able to specify different reward models for my scenarios
#### So that
I can train RL agents
# Acceptance Criteria
#### Given
we currently only expo…
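One way to satisfy this story is a small pluggable interface plus a registry keyed by scenario config. The sketch below is purely hypothetical; `RewardModel`, `score`, and the registry names are illustrative, not pyCMO's actual API:

```python
from abc import ABC, abstractmethod

class RewardModel(ABC):
    """Hypothetical pluggable reward interface (not pyCMO's real API)."""

    @abstractmethod
    def score(self, observation: dict) -> float:
        """Map one scenario observation to a scalar reward."""

class UnitSurvivalReward(RewardModel):
    """Example: reward proportional to friendly units still alive."""

    def score(self, observation: dict) -> float:
        return float(len(observation.get("friendly_units", [])))

# Scenarios could then name a reward model in their config.
REWARD_REGISTRY = {"unit_survival": UnitSurvivalReward}

def make_reward(name: str) -> RewardModel:
    return REWARD_REGISTRY[name]()

reward = make_reward("unit_survival").score({"friendly_units": ["f16", "destroyer"]})
print(reward)  # 2.0
```

The registry indirection keeps scenario definitions declarative: swapping reward functions becomes a config change rather than a code change.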
-
Hello.
First of all, thank you for making your source code public.
I have a question.
I wonder what Reward means in this model.
Usually, the Q-value serves as the reward signal for other models, but I think this model is differe…
-
"The generative process is the same as in auto-regressive language models: generation begins with an empty string, and at the 𝑖-th step a token 𝑧𝑖 is sampled"
Since the generative process is conduc…
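The quoted process can be sketched with a toy sampler; `next_token_probs` below is a placeholder for the model's next-token distribution, and the vocabulary is invented for illustration:

```python
import random

VOCAB = ["the", "cat", "sat", "<eos>"]

def next_token_probs(prefix):
    # Placeholder for the language model: a fixed distribution that
    # increasingly favors <eos> as the prefix grows longer.
    p_eos = min(1.0, 0.2 * len(prefix))
    rest = (1.0 - p_eos) / 3
    return [rest, rest, rest, p_eos]

def generate(max_steps=20, seed=0):
    rng = random.Random(seed)
    tokens = []  # generation begins with an empty string
    for _ in range(max_steps):
        # at the i-th step a token z_i is sampled from p(. | z_1..z_{i-1})
        z_i = rng.choices(VOCAB, weights=next_token_probs(tokens))[0]
        if z_i == "<eos>":
            break
        tokens.append(z_i)
    return tokens

print(generate())
```

Each sampled token is appended to the prefix before the next step, which is what makes the process auto-regressive.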
-
I find the reward function to be the most important part of RLHF, because it is the part that mimics a human evaluator, providing instant feedback to the model.
However, due to ChatGPT's wide rang…
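A standard way such a reward model is trained in RLHF pipelines (though not necessarily ChatGPT's exact recipe) is a pairwise Bradley-Terry loss over human-preferred vs. rejected responses; a minimal NumPy sketch:

```python
import numpy as np

def pairwise_loss(r_chosen, r_rejected):
    """-log sigmoid(r_chosen - r_rejected), written via log1p for stability.

    The loss is small when the preferred response scores higher than the
    rejected one, so minimizing it teaches the model the human ranking.
    """
    margin = np.asarray(r_chosen, dtype=float) - np.asarray(r_rejected, dtype=float)
    return float(np.mean(np.log1p(np.exp(-margin))))

# A correctly ordered pair incurs a smaller loss than a mis-ordered one.
good = pairwise_loss(r_chosen=2.0, r_rejected=-1.0)
bad = pairwise_loss(r_chosen=-1.0, r_rejected=2.0)
print(good < bad)  # True
```

Because the loss depends only on score *differences*, the reward model learns a relative ranking rather than an absolute scale, which is all the downstream RL step needs.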