-
When attempting to run the [stack_llama](https://github.com/lvwerra/trl/tree/main/examples/stack_llama/scripts) example, I was able to run the first two steps:
`torchrun --nnodes=1 --nproc_per_nod…
-
Hi @Ram81,
Do you have model weights trained on the hm3d dataset?
If so, could you share them with us?
Thanks!
-
Have you released the dataset, or could you explain how to download it? Thanks!
-
Firstly, thanks for your innovative and excellent work! I got an error when I tried to reproduce the results of the paper (in the pretraining stage).
Could you please help me? Of course, I'll try…
-
I tried to deploy the PPO and ILQL algorithms with the same bloom3B model under examples/summarize_rlhf/, and changed the reward model to a naive calculation. My GPU is an A100 with 32 GB.
I need to adjust t…
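For reference, here is a minimal sketch of what swapping the learned reward model for a naive calculation can look like in a trlx PPO run. The reward logic, prompts, config path, and model override are placeholders/assumptions, not the repository's actual setup, and the exact `reward_fn` signature may differ across trlx versions:

```python
import trlx
from trlx.data.configs import TRLConfig

def naive_reward_fn(samples, **kwargs):
    # Placeholder reward: prefer shorter outputs (purely illustrative).
    return [-float(len(s.split())) for s in samples]

# Path and model name are assumed for illustration.
config = TRLConfig.load_yaml("configs/ppo_config.yml")
config.model.model_path = "bigscience/bloom-3b"

trainer = trlx.train(
    reward_fn=naive_reward_fn,
    prompts=["Summarize: ..."] * 8,
    config=config,
)
```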
-
Hi, thanks for the nice work. I'm trying to reproduce the results reported in the paper. However, I couldn't find details about the training parameters (e.g. learning rate, number of epochs) of the second stage f…
-
Thanks for the great work! Would it be possible to share the code for the whole RL fine-tuning framework (Actor & Critic updates based on the reward defined in the paper) for better reproducibility? For…
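To make the request concrete, below is a minimal sketch (not the authors' code) of how actor/critic PPO updates on a scalar reward are typically wired up with the `trl` library; the model name is a placeholder, the reward is a constant stand-in for the paper's reward, and the exact API may vary across trl versions:

```python
import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

model_name = "gpt2"  # placeholder policy, not the paper's model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# The LM weights act as the actor; the attached value head acts as the critic.
model = AutoModelForCausalLMWithValueHead.from_pretrained(model_name)
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained(model_name)

ppo_trainer = PPOTrainer(PPOConfig(batch_size=1, mini_batch_size=1), model, ref_model, tokenizer)

query = tokenizer("Describe the scene:", return_tensors="pt").input_ids[0]
response = ppo_trainer.generate(query, return_prompt=False, max_new_tokens=20)[0]

# A constant stands in for the paper's reward definition.
reward = [torch.tensor(1.0)]
stats = ppo_trainer.step([query], [response], reward)
```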
-
A large part of building the assistant is teaching it to follow instructions. While training with RLHF seems like the main ingredient, there are already prepared supervised instruction-following datase…
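As a quick sketch of what using one of these prepared datasets could look like, the snippet below loads an instruction dataset and flattens it into prompt/response strings for supervised fine-tuning; the dataset name, field names, and prompt template are illustrative assumptions:

```python
from datasets import load_dataset

ds = load_dataset("databricks/databricks-dolly-15k", split="train")

def to_text(example):
    # Fold optional context into the prompt, then format a single training string.
    prompt = example["instruction"]
    if example["context"]:
        prompt += "\n" + example["context"]
    return {"text": f"User: {prompt}\nAssistant: {example['response']}"}

sft_ds = ds.map(to_text, remove_columns=ds.column_names)
print(sft_ds[0]["text"][:200])
```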
-
# Action Plan for ML-Team
### 1. Data mixes
- [ ] create a list of all datasets under consideration for OA SFT, identify datasets that need further processing (e.g. multi-turn and need to be con…
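As a rough illustration of what a data mix could look like in practice, here is a small sketch that interleaves two instruction datasets with explicit sampling probabilities using the `datasets` library; the dataset names and weights are placeholders, not the actual OA SFT mix:

```python
from datasets import interleave_datasets, load_dataset

alpaca = load_dataset("tatsu-lab/alpaca", split="train")
dolly = load_dataset("databricks/databricks-dolly-15k", split="train")

# Normalize both sources to a single "text" column before mixing.
alpaca = alpaca.map(lambda ex: {"text": ex["instruction"] + "\n" + ex["output"]},
                    remove_columns=alpaca.column_names)
dolly = dolly.map(lambda ex: {"text": ex["instruction"] + "\n" + ex["response"]},
                  remove_columns=dolly.column_names)

# Sample 70% / 30% from the two sources (weights are illustrative).
mix = interleave_datasets([alpaca, dolly], probabilities=[0.7, 0.3], seed=42)
print(mix)
```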
-
At a meta level, PPO-based RLHF performs minor adjustments to the weights to align the model with human feedback.
Can we just replace PPO+RLHF with a preference model that's basically a transformer encoder +…