-
Any chance you could implement this?
https://github.com/vinhkhuc/ddpo/tree/support_gpu
It's for RLHF-style training, [check the paper](https://rl-diffusion.github.io/).
Could be really interesting fo…
-
The following error occurred while running cell 10 in **6. Tune language model using PPO with our preference model**.
After adding `__init__.py` to `/content/trlx/examples/summarize_rlhf/reward_model…
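For anyone reproducing this, the step described above (before the error in the truncated cell output) amounts to making the directory an importable package. A minimal sketch, assuming the path shown in the report:

```python
# Minimal sketch: mark the reward_model directory as an importable Python
# package. The path is copied from the (truncated) report above; adjust it
# if your checkout lives elsewhere.
from pathlib import Path

pkg = Path("/content/trlx/examples/summarize_rlhf/reward_model")
(pkg / "__init__.py").touch(exist_ok=True)  # create an empty __init__.py
```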
-
I tried to find a `pip install` command.
-
# URL
- https://arxiv.org/abs/2307.04964
# Affiliations
- Rui Zheng, N/A
- Shihan Dou, N/A
- Songyang Gao, N/A
- Wei Shen, N/A
- Binghai Wang, N/A
- Yan Liu, N/A
- Senjie Jin, N/A
- Qi…
-
Impressive work; it's efficient and powerful. Here's a suggestion.
The search is the critical component! It's the bottleneck for answering every query once you already have a robust corpus.
C…
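The comment is cut off above, but for concreteness, here is a minimal sketch of the kind of search step it points at: scoring a query against an already-indexed corpus. TF-IDF retrieval is a stand-in assumption here; the project's actual index or embedding model is not named in the comment.

```python
# Minimal retrieval sketch over an in-memory corpus. TF-IDF is an assumed
# stand-in for whatever index the project actually uses.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = ["document one ...", "document two ..."]  # hypothetical documents
vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(corpus)     # index the corpus once

def search(query: str, k: int = 3):
    """Return the top-k corpus documents for a query."""
    q = vectorizer.transform([query])
    scores = cosine_similarity(q, doc_vectors).ravel()
    return [corpus[i] for i in scores.argsort()[::-1][:k]]
```

Swapping the vectorizer for dense embeddings would change only the indexing step; the top-k ranking stays the same.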
-
### 🐛 Describe the bug
Hi, I'm trying to fine-tune `flan-t5-large` and `flan-t5-xl` on custom data with `ilql` training (RLHF), using `gpt-j-6B` as the reward model.
1. I have …
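(The report is truncated above.) For reference, trlx's offline ILQL interface takes scored samples directly; a minimal sketch under that assumption follows. The sample and reward values are placeholders, the config path is the stock file shipped in the trlx repo, and whether the ILQL trainer accepts a seq2seq model like flan-t5 is exactly the question this report raises, so the `model_path` line is an assumption.

```python
# Minimal sketch of trlx's offline ILQL interface, assuming the custom data
# has already been scored by the gpt-j-6B reward model.
import trlx
from trlx.data.configs import TRLConfig

config = TRLConfig.load_yaml("configs/ilql_config.yml")
config.model.model_path = "google/flan-t5-large"  # assumption: seq2seq support

samples = ["Question: ... Answer: ..."]  # offline demonstrations (placeholders)
rewards = [0.7]                          # reward-model scores (placeholders)

trainer = trlx.train(samples=samples, rewards=rewards, config=config)
```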
-
**Is your feature request related to a problem? Please describe.**
We should include a tutorial for SFT. Although we have SteerLM, an SFT tutorial is important because it is the simple…
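As a rough illustration of what such a tutorial would cover, here is a generic SFT sketch built on the Hugging Face `Trainer`; it is not this project's own tooling, and the base model, data file, and hyperparameters are placeholder assumptions.

```python
# Generic SFT sketch: causal-LM fine-tuning on a plain-text file.
# Model name, data file, and hyperparameters are placeholders.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "gpt2"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

ds = load_dataset("text", data_files={"train": "sft_data.txt"})  # hypothetical file
ds = ds.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
            batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sft_out", num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=ds["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```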
-
Hi Umar, what an awesome free lecture! I cannot thank you enough for your service to all of us developers.
Sorry to borrow this space for a question. In the "RLHF and PPO" slides, page 17…
-
**Describe the bug**
In the third stage of RLHF training, this error occurred.
**To Reproduce**
Steps to reproduce the behavior:
`sh step3_rlhf_finetuning/training_scripts/single_gpu/run_1.3b.sh`
…
-