-
In the RLHF workflow paper, the reward model is used to annotate new data generated by the LLM during the iterative DPO process, producing scalar reward values. According to Algorithm 1, the traditional R…
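For concreteness, here is a minimal sketch of what that annotation step could look like: a sequence-classification reward model maps each (prompt, response) pair to a single scalar. The model name, input format, and pairing heuristic below are my own illustrative assumptions, not taken from the paper.

```python
# Hedged sketch of reward-model annotation in iterative DPO (my reading of
# Algorithm 1, not the paper's code). Assumes an RM with a scalar output head;
# the OpenAssistant model below is just a convenient public example.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

rm_name = "OpenAssistant/reward-model-deberta-v3-large-v2"
tokenizer = AutoTokenizer.from_pretrained(rm_name)
reward_model = AutoModelForSequenceClassification.from_pretrained(rm_name)

def score(prompt: str, response: str) -> float:
    """Map one (prompt, response) pair to a single scalar reward."""
    inputs = tokenizer(prompt, response, return_tensors="pt", truncation=True)
    with torch.no_grad():
        return reward_model(**inputs).logits[0].item()

# In the iterative loop, the best- and worst-scoring generations per prompt
# can then be paired as (chosen, rejected) data for the next DPO round.
prompt = "Explain KV caching in one sentence."
candidates = ["response A ...", "response B ..."]
ranked = sorted(candidates, key=lambda r: score(prompt, r), reverse=True)
chosen, rejected = ranked[0], ranked[-1]
```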
-
After finishing the install successfully, I got this error when running this command: python e2e_rlhf.py --actor-model facebook/opt-1.3b --reward-model facebook/opt-350m --deployment-type single_gpu
![capture](…
-
Does Optimum Neuron have support for [TRL](https://huggingface.co/docs/trl/index) supervised fine-tuning, reward modelling, and PPO using Trainium? Is TRL the best path to support RLHF?
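To make the ask concrete, this is roughly the TRL workload that would need to run on Trainium. A minimal sketch following TRL's documented SFT quickstart pattern (exact kwargs have moved between TRL versions, and the model and dataset here are placeholders):

```python
# Minimal TRL SFT smoke test; the open question is whether Optimum Neuron can
# run this (and the RewardTrainer / PPO equivalents) on Trainium hardware.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("imdb", split="train[:1%]")  # tiny slice, smoke test only

trainer = SFTTrainer(
    model="facebook/opt-350m",
    train_dataset=dataset,
    args=SFTConfig(output_dir="opt-350m-sft", dataset_text_field="text"),
)
trainer.train()
```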
-
Traceback (most recent call last):
  File "/mnt/d/ai/RLHF/test.py", line 3, in <module>
    tokenizer = AutoTokenizer.from_pretrained("/mnt/d/ai/pretrain_models/pangu", trust_remote_code=True)
  File "/hom…
-
Any plans to release the DPO code, or to give a brief intro to how you conducted long-context DPO?
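In the meantime, for anyone reading along, the core objective is small enough to sketch. This is the standard DPO loss from Rafailov et al. (2023); long-context DPO would apply the same objective to long (chosen, rejected) pairs. The inputs are assumed to be per-response summed log-probs under the policy and a frozen reference model:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO loss: -log sigmoid(beta * (chosen margin - rejected margin))."""
    chosen_margin = policy_chosen_logps - ref_chosen_logps        # implicit reward, chosen
    rejected_margin = policy_rejected_logps - ref_rejected_logps  # implicit reward, rejected
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()
```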
-
If this model was SFT'd from Llama 2: judging from the Llama 2 paper, Llama 2 itself does not seem to have gone through RLHF (Llama-2-chat did). Has Taiwan LLaMa 2 been trained with RLHF? If not, alignment for Traditional Chinese could be done with RLHF rather than SFT. As for the comparison dataset, you could consider generating it with ChatGPT; I wonder whether this has been tried. Thanks.
![image](https://github.com/…
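A hedged sketch of the comparison-data suggestion above: ask ChatGPT to pick the better of two candidate responses and store the result as a (chosen, rejected) pair. The model name, judging prompt, and verdict format are all illustrative assumptions:

```python
# Illustrative only: generate one preference pair with ChatGPT as the judge.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def make_comparison(prompt: str, resp_a: str, resp_b: str) -> dict:
    judge = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any ChatGPT-family model
        messages=[{
            "role": "user",
            "content": (
                f"Question: {prompt}\n\nAnswer A: {resp_a}\n\nAnswer B: {resp_b}\n\n"
                "Which answer is better Traditional Chinese? Reply with exactly A or B."
            ),
        }],
    )
    verdict = judge.choices[0].message.content.strip()
    chosen, rejected = (resp_a, resp_b) if verdict.startswith("A") else (resp_b, resp_a)
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}
```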
-
Hi, XTuner Team
Could you please add a citation for the source of the Ray+vLLM-based RLHF architecture, OpenRLHF, for example in the README.md file: https://github.com/InternLM/xtuner?tab=readme-ov-fi…
-
Could you add some RLHF data?
-
**Describe the bug**
I get the following error just by changing the model from `llava1_6-mistral-7b-instruct` to `llava-onevision-qwen2-0_5b-ov` in the first DPO example [here](https://github.com/m…
-
I just noticed that we don't have an open issue for RLHF support yet. I think this is a super important feature, since recent models like Llama 2 showed that it's really worthwhile. I can also see that we…