-
## 🚀 Feature Request
Support DPO (Direct Preference Optimization) loss and data loader.
## Motivation
Many recent open LLMs have achieved promising results by using DPO instead of RL-style t…
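For reference, the requested DPO objective can be sketched in a few lines. This is a minimal illustration, assuming per-sequence log-probabilities for the chosen and rejected responses have already been computed under both the policy and a frozen reference model; the function name and `beta` default are illustrative, not part of any existing API in this repo.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio)).

    Each argument is a 1-D tensor of summed token log-probs per sequence.
    """
    # Log-ratios between the policy and the frozen reference model.
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # The implicit reward margin, scaled by the temperature beta.
    logits = beta * (chosen_logratio - rejected_logratio)
    # Maximize the probability that the chosen response is preferred.
    return -F.logsigmoid(logits).mean()
```

A matching data loader would then yield batches of (prompt, chosen, rejected) triples, with the chosen/rejected log-probs gathered over the response tokens only.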
-
Hello,
I am trying to do some basic inference with your sft and policy models.
However, when I instantiate the model directly with LlamaForCausalLM, generation works well for the base pretrain…
-
[2023-08-12 01:22:11,409] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.10.0, git-hash=unknown, git-branch=unknown
Traceback (most recent call last):
File "/root/inpc_projects…
-
throws this error:
```
from chatllama.rlhf.trainer import RLTrainer
File "C:\Users\Admin\MY WORK\test llama\venv\lib\site-packages\chatllama\rlhf\trainer.py", line 12, in
from actor impor…
-
I am new to RLHF. Could you upload demo code for RLHF, or a slide deck about RLHF?
-
When I run the demo (step3_rlhf_finetuning/training_scripts/opt/single_node/run_1.3b.sh) without any changes, the reward does not increase. Is this normal? I would appreciate it if anyone can provide …
-
### Describe the Question
Could you provide an example that chains together training and inference for each stage, from pretraining to SFT to RLHF? For instance: after pretraining, run inference tests; once the results look OK, move on to SFT, then run inference tests again, and so on. That would make it easier for everyone to discuss th…
-
With the proliferation of models and model variants it becomes more important to track assessment dates and model versions.
So far we've been able to treat model families as one, because it rarely …
-
@aicrumb built a really cool RLHF-trained Stable Diffusion prompter on BLOOM: https://huggingface.co/crumb/bloom-560m-RLHF-SD2-prompter
I believe it's possible to convert it to ONNX and then run it…
-
Hello, after downloading the repo, I don't see any code in modeling_chatglm.py for RLHF, reward model (RM) training, or PPO-based RL training. The README clearly states that it uses the same technique as ChatGPT and supports RLHF. Could you explain what is going on?