finetuning-rl Search Results

134 results for finetuning-rl


axolotl-ai-cloud/axolotl #1750

RuntimeError: Cannot re-initialize CUDA in forked subprocess…

### Please check that this issue hasn't been reported before. - [X] I searched previous [Bug Reports](https://github.com/axolotl-ai-cloud/axolotl/labels/bug) didn't find any similar reports. ### Exp…

RishabhMaheshwary updated 3 months ago
7
CarperAI/trlx #354

Minimum Risk Training support

### 🚀 The feature, motivation, and pitch I've been working on RLHF for a while and have been exploring the use of Minimum Risk Training (paper: [here](https://arxiv.org/abs/1512.02433) with further…

alexandremuzio updated 1 year ago
2
IraKorshunova/folk-rnn #9

Data formatting issue: missing unique IDs & stray HTML

I've been trying out ABC with GPT-2 along the lines of [my poetry generation](https://www.gwern.net/GPT-2) (max-likelihood training and then I'll use OA's RL preference-learning for finetuning), and I…

gwern updated 4 years ago
20
lisiyao21/Bailando #42

Doubts about the Bailando model

You have completed a very good model! I also achieved very good results when I was working on your model. But there are still some questions that are not very clear. Are you experiencing gradient e…

WJ-Fifth updated 1 year ago
1
wengong-jin/multiobj-rationale #5

In which part does it incorporate RL?

It's nice work! However, I have a question. Since I'm not so familiar with reinforcement learning, I wonder which part of it uses RL? In 3.3.2 fine-tuning, "Update the model P(G,S) on the fine-tuning set $D…

YifanDengWHU updated 2 years ago
2
ufal/edupo #10

Literature search: poetry papers on Zotero

Rudolf added some more poetry papers from last year's ICCC to Zotero: https://www.zotero.org/groups/5184983/poetrygeneration/items/AR7KTGPK - On the power of special-purpose GPT models to create and e…

ptakopysk updated 8 months ago
3
flowersteam/Grounding_LLMs_with_online_RL #11

How to run the train_language_agent.py without using slurm

Hi, because I don't know how to use Slurm, I tried to run train_language_agent.py directly, using the command from lamorel: `python -m lamorel_launcher.launch --config-path /home/yanxue/Groun…

yanxue7 updated 1 year ago
2
llSourcell/Doctor-Dignity #17

Issue with ppo_trainer.generate()

Thank you for the clear-cut amazing video tutorial and repo. I have been working on this repo and faced the following issue on 8 GPU A100 with OS disk space of 100GB and 5TB external. Could you kindly…

aishu194 updated 1 year ago
2
AkihikoWatanabe/paper_notes #1390

OpenAI o1, 2024.09

Overview: https://openai.com/index/introducing-openai-o1-preview/ Technical report: https://openai.com/index/learning-to-reason-with-llms/

AkihikoWatanabe updated 1 month ago
7
ManifoldRG/Manifold-KB #10

AF Survey - "Towards A Unified Agent with Foundation Models"

bfaught3 updated 9 months ago
2
