-
### Please check that this issue hasn't been reported before.
- [X] I searched previous [Bug Reports](https://github.com/axolotl-ai-cloud/axolotl/labels/bug) and didn't find any similar reports.
### Exp…
-
### 🚀 The feature, motivation, and pitch
I've been working on RLHF for a while and have been exploring the use of Minimum Risk Training (paper: [here](https://arxiv.org/abs/1512.02433) with further…
-
I've been trying out ABC with GPT-2 along the lines of [my poetry generation](https://www.gwern.net/GPT-2) (max-likelihood training and then I'll use OA's RL preference-learning for finetuning), and I…
gwern updated
4 years ago
-
You have completed a very good model!
I also got very good results when working with your model, but a few things are still unclear to me. Are you experiencing gradient e…
-
Nice work! However, I have a question. Since I'm not very familiar with reinforcement learning, I wonder which part of it uses RL. In 3.3.2 fine-tuning, "Update the model P(G,S) on the fine-tuning set $D…
-
Rudolf added some more poetry papers from last year's ICCC to Zotero:
https://www.zotero.org/groups/5184983/poetrygeneration/items/AR7KTGPK
- On the power of special-purpose GPT models to create and e…
-
Hi,
Because I don't know how to use Slurm, I tried to run train_lanuage_agent.py directly, using the command from lamorel:
`python -m lamorel_launcher.launch --config-path /home/yanxue/Groun…
-
Thank you for the clear and excellent video tutorial and repo. I have been working with this repo and ran into the following issue on a machine with 8 A100 GPUs, a 100 GB OS disk, and 5 TB of external storage. Could you kindly…
-
overview: https://openai.com/index/introducing-openai-o1-preview/
technical report: https://openai.com/index/learning-to-reason-with-llms/
-