-
Any chance you could implement this?
https://github.com/vinhkhuc/ddpo/tree/support_gpu
It's for RLHF-style training, [check the paper](https://rl-diffusion.github.io/).
Could be really interesting fo…
-
The following error occurred while running cell 10 in **6. Tune language model using PPO with our preference model**.
After adding `__init__.py` to `/content/trlx/examples/summarize_rlhf/reward_model…
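For anyone reproducing this, the step described above (before the error in the truncated cell output) amounts to making the directory an importable package. A minimal sketch, assuming the path shown in the report:

```python
# Minimal sketch: mark the reward_model directory as an importable Python
# package. The path is copied from the (truncated) report above; adjust it
# if your checkout lives elsewhere.
from pathlib import Path

pkg = Path("/content/trlx/examples/summarize_rlhf/reward_model")
(pkg / "__init__.py").touch(exist_ok=True)  # create an empty __init__.py
```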
-
I tried to find a `pip install` command.
-
# URL
- https://arxiv.org/abs/2307.04964
# Affiliations
- Rui Zheng, N/A
- Shihan Dou, N/A
- Songyang Gao, N/A
- Wei Shen, N/A
- Binghai Wang, N/A
- Yan Liu, N/A
- Senjie Jin, N/A
- Qi…
-
Impressive work; it's efficient and powerful. Here's a suggestion.
The search is the critical component! It's the bottleneck for answering every query once you already have a robust corpus.
C…
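The comment is cut off above, but for concreteness, here is a minimal sketch of the kind of search step it points at: scoring a query against an already-indexed corpus. TF-IDF retrieval is a stand-in assumption here; the project's actual index or embedding model is not named in the comment.

```python
# Minimal retrieval sketch over an in-memory corpus. TF-IDF is an assumed
# stand-in for whatever index the project actually uses.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = ["document one ...", "document two ..."]  # hypothetical documents
vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(corpus)     # index the corpus once

def search(query: str, k: int = 3):
    """Return the top-k corpus documents for a query."""
    q = vectorizer.transform([query])
    scores = cosine_similarity(q, doc_vectors).ravel()
    return [corpus[i] for i in scores.argsort()[::-1][:k]]
```

Swapping the vectorizer for dense embeddings would change only the indexing step; the top-k ranking stays the same.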
-
### 🐛 Describe the bug
Hi, I'm trying to fine-tune `flan-t5-large` and `flan-t5-xl` on custom data with `ilql` training (RLHF), using `gpt-j-6B` as the reward model.
1. I have …
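(The report is truncated above.) For reference, trlx's offline ILQL interface takes scored samples directly; a minimal sketch under that assumption follows. The sample and reward values are placeholders, the config path is the stock file shipped in the trlx repo, and whether the ILQL trainer accepts a seq2seq model like flan-t5 is exactly the question this report raises, so the `model_path` line is an assumption.

```python
# Minimal sketch of trlx's offline ILQL interface, assuming the custom data
# has already been scored by the gpt-j-6B reward model.
import trlx
from trlx.data.configs import TRLConfig

config = TRLConfig.load_yaml("configs/ilql_config.yml")
config.model.model_path = "google/flan-t5-large"  # assumption: seq2seq support

samples = ["Question: ... Answer: ..."]  # offline demonstrations (placeholders)
rewards = [0.7]                          # reward-model scores (placeholders)

trainer = trlx.train(samples=samples, rewards=rewards, config=config)
```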
-
**Is your feature request related to a problem? Please describe.**
We should include a tutorial for SFT. Although we have SteerLM, an SFT tutorial is important because it is the simple…
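As a rough illustration of what such a tutorial would cover, here is a generic SFT sketch built on the Hugging Face `Trainer`; it is not this project's own tooling, and the base model, data file, and hyperparameters are placeholder assumptions.

```python
# Generic SFT sketch: causal-LM fine-tuning on a plain-text file.
# Model name, data file, and hyperparameters are placeholders.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "gpt2"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

ds = load_dataset("text", data_files={"train": "sft_data.txt"})  # hypothetical file
ds = ds.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
            batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sft_out", num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=ds["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```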
-
Hi Umar, what an awesome free lecture! I cannot thank you enough for your service to all of us developers.
Sorry to borrow this space for a question. In the "RLHF and PPO" slides, page 17…
-
**Describe the bug**
In the third stage of RLHF training, this error occurred.
**To Reproduce**
Steps to reproduce the behavior:
`sh step3_rlhf_finetuning/training_scripts/single_gpu/run_1.3b.sh`
…
-