-
I would now like to read your code and make changes. Do you have any suggestions? Could you explain what the classes defined in safe-rlhf mean, such as AutoModelForScore and PreferenceDataset? What's more, …
-
I see in the code that for the HH-RLHF dataset you use the red-team data for testing. I want to know how the test scores are calculated. I didn't find any ground truth in the red-team dataset. How are th…
-
Hello,
I would like to ask how to create an evaluation dataset.
When I directly run `python evaluate_generation_model.py --model_path ../../LLM_Models/poison-7b-SUDO- --token SUDO --report_path ./…
-
Failed to run the evaluation script.
-
Implement rewards as proposed in https://arxiv.org/pdf/2405.14655
-
**Describe the bug**
![image](https://github.com/user-attachments/assets/bc125f23-b4e3-4786-a062-684944e42140)
**Additional context**
SIZE_FACTOR=8 MAX_PIXELS=602112 torchrun --nproc_per_node …
-
Hi, thanks for the great work.
I have fine-tuned the model on Llama 3 using the [LLaVA-More](https://github.com/aimagelab/LLaVA-MORE) repository. Now, when I try to adapt your code, I am getting `Attribute…
-
### 🚀 The feature, motivation and pitch
I want to use this feature to speed up the throughput in generation step under RLHF.
### Alternatives
_No response_
### Additional context
al…
-
We need to implement several alignment algorithms:
1. PPO
This one goes without saying; it is the traditional, general-purpose choice, but its training overhead is somewhat higher.
2. RAFT
The LMFlow community has an implementation:
`https://optimalscale.github.io/LMFlow/examples/raft.html`
3. PanGu-Coder2
RRTF (Rank Responses to align Test&Teacher Feedback)
To summa…
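The PPO item above can be made concrete with a short sketch of its clipped surrogate objective. This is a minimal, self-contained NumPy illustration of the standard formula; the function name and implementation are illustrative and not taken from any of the repositories discussed in these issues.

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped surrogate loss (to be minimized) from the PPO paper.

    Hypothetical helper for illustration only: takes per-sample log-probs
    under the new and old policies plus advantage estimates.
    """
    # Probability ratio pi_new(a|s) / pi_old(a|s), computed in log space.
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    # Clipping the ratio bounds how far a single update can move the policy.
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Take the pessimistic (smaller) objective, then negate for minimization.
    return -np.mean(np.minimum(unclipped, clipped))

# Example: the second sample's ratio (e ≈ 2.72) exceeds 1 + clip_eps,
# so its contribution is clipped to 1.2 * advantage.
loss = ppo_clip_loss(
    logp_new=np.array([0.0, 1.0]),
    logp_old=np.array([0.0, 0.0]),
    advantages=np.array([1.0, 1.0]),
)  # -> -(1.0 + 1.2) / 2 = -1.1
```

The clipping is the piece that makes PPO cheaper and more stable than vanilla policy gradient at scale, which is the trade-off the list above alludes to.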
-
[2023-08-12 01:22:11,409] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.10.0, git-hash=unknown, git-branch=unknown
Traceback (most recent call last):
File "/root/inpc_projects…