-
I would now like to read your code and make changes. Do you have any suggestions? Could you explain what the classes defined in safe-rlhf mean, such as AutoModelForScore and PreferenceDataset? What's more, …
-
I see in the code that for the HH-RLHF dataset you use the red-team data for testing. I want to know how the test scores are calculated. I didn't find any ground truth in the red-team dataset. How are th…
-
Hello,
I would like to ask how to create an evaluation dataset.
When I directly run `python evaluate_generation_model.py --model_path ../../LLM_Models/poison-7b-SUDO- --token SUDO --report_path ./…
-
Failed to run the evaluation script.
-
Implement rewards as proposed in https://arxiv.org/pdf/2405.14655
-
**Describe the bug**
![image](https://github.com/user-attachments/assets/bc125f23-b4e3-4786-a062-684944e42140)
**Additional context**
SIZE_FACTOR=8 MAX_PIXELS=602112 torchrun --nproc_per_node …
-
Hi, thanks for the great work.
I have fine-tuned the model on Llama 3 using the [LLaVA-More](https://github.com/aimagelab/LLaVA-MORE) repository. Now, when I try to adapt your code, I am getting `Attribute…
-
### 🚀 The feature, motivation and pitch
I want to use this feature to speed up the throughput in generation step under RLHF.
### Alternatives
_No response_
### Additional context
al…
-
We need to implement several alignment algorithms:
1. PPO
This one goes without saying; it is the traditional, general-purpose choice, but its training overhead is somewhat higher.
2. RAFT
The LMFlow community has an implementation:
`https://optimalscale.github.io/LMFlow/examples/raft.html`
3. PanGu-Coder2
RRTF (Rank Responses to align Test&Teacher Feedback)
To summa…
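The PPO item above can be made concrete with a short sketch of its clipped surrogate objective. This is a minimal, self-contained NumPy illustration of the standard formula; the function name and implementation are illustrative and not taken from any of the repositories discussed in these issues.

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped surrogate loss (to be minimized) from the PPO paper.

    Hypothetical helper for illustration only: takes per-sample log-probs
    under the new and old policies plus advantage estimates.
    """
    # Probability ratio pi_new(a|s) / pi_old(a|s), computed in log space.
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    # Clipping the ratio bounds how far a single update can move the policy.
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Take the pessimistic (smaller) objective, then negate for minimization.
    return -np.mean(np.minimum(unclipped, clipped))

# Example: the second sample's ratio (e ≈ 2.72) exceeds 1 + clip_eps,
# so its contribution is clipped to 1.2 * advantage.
loss = ppo_clip_loss(
    logp_new=np.array([0.0, 1.0]),
    logp_old=np.array([0.0, 0.0]),
    advantages=np.array([1.0, 1.0]),
)  # -> -(1.0 + 1.2) / 2 = -1.1
```

The clipping is the piece that makes PPO cheaper and more stable than vanilla policy gradient at scale, which is the trade-off the list above alludes to.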
-
[2023-08-12 01:22:11,409] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.10.0, git-hash=unknown, git-branch=unknown
Traceback (most recent call last):
File "/root/inpc_projects…