-
I'd like to ask whether the workflow is the following (a sketch of my understanding follows the list):
0. SFT, call the result M0
1. First use M0 to sample a large batch, call it dataset a (sample only this once, no further sampling afterwards) (roughly 400 samples?)
2. From a, select the samples whose ppl under M0 is suitable, run PPO, and call the trained model M1 (one run of the multi-node script)
3. From a, select the samples whose ppl under M1 is suitable, run PPO, and call the trained model M2 (second run of the multi-node script)
4. And so on for ten runs?
(I didn't see per-phase sampling in the code…
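If I've understood the steps above correctly, the whole loop would look roughly like the sketch below. This is only my reading of the process; `sft`, `sample`, `filter_by_ppl`, and `ppo_train` are hypothetical placeholders, not functions from this repo.

```python
model = sft(base_model)                       # step 0: SFT -> M0
dataset_a = sample(model, n=400)              # step 1: sample once with M0, never again

for phase in range(10):                       # steps 2-4: ten PPO phases
    subset = filter_by_ppl(dataset_a, model)  # keep samples whose ppl under the current model is "suitable"
    model = ppo_train(model, subset)          # one multi-node PPO run -> M1, M2, ...
```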
-
It would help to provide more detailed process documentation for SFT, PPO, and inference, including model configuration, data configuration, etc., for running through the basic pipeline end to end.
-
I see that the agent loaded for PPO here is trained on-policy, but training directly like that doesn't involve an experience pool. Shouldn't PPO have an experience pool for its N-step updates, i.e. the off-policy part? Where is that reflected here?
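For what it's worth, in most PPO implementations the "experience pool" is just the rollout buffer collected by the current policy, which is reused for a few epochs of minibatch updates and then discarded; the clipped importance ratio corrects for the policy drifting within those epochs. Below is a minimal sketch of that structure, not this repo's actual code; `buffer.get` and `policy.log_prob` are hypothetical helpers.

```python
import torch

def ppo_update(policy, optimizer, buffer, epochs=4, batch_size=64, clip_eps=0.2):
    for _ in range(epochs):
        # reuse the same freshly collected rollouts for several minibatch passes
        for idx in torch.randperm(len(buffer)).split(batch_size):
            states, actions, old_log_probs, advantages = buffer.get(idx)
            new_log_probs = policy.log_prob(states, actions)
            ratio = (new_log_probs - old_log_probs).exp()   # pi_new / pi_old
            surr1 = ratio * advantages
            surr2 = ratio.clamp(1 - clip_eps, 1 + clip_eps) * advantages
            loss = -torch.min(surr1, surr2).mean()
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    buffer.clear()  # rollouts are thrown away after the update, so PPO stays on-policy
```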
-
1) counter
2) for index in BatchSampler(SubsetRandomSampler(range(self.buffer_capacity)), self.batch_size, True):
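For reference, a small self-contained demo of what that sampler nesting produces: `SubsetRandomSampler` shuffles the buffer indices and `BatchSampler` groups them into lists of length `batch_size`, which are then used to slice minibatches out of the buffer (the capacity and batch size here are made-up values):

```python
from torch.utils.data.sampler import BatchSampler, SubsetRandomSampler

buffer_capacity, batch_size = 8, 3
# drop_last=True discards the final incomplete batch of indices
for index in BatchSampler(SubsetRandomSampler(range(buffer_capacity)), batch_size, True):
    print(index)  # e.g. [5, 0, 7] -- a shuffled list of buffer positions
```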
-
I was running the example script: `examples/scripts/train_ppo_llama.sh`.
Basically, it's PPO on llama3-8b with 8*H100, flash_attn, zero3, gradient_checkpointing, and adam_offload, but it OOMs after some…
-
Hi,
Big fan of this project! I'm trying to train an RL agent on a bunch of large environments at once, and I'm seeing an issue where some linkages are static/immobile when they shouldn't be. Here a…
-
For online training, we may have to ditch the complexities of PPO and use a more basic form of temporal difference learning that does not rely on advantage estimation.
We also need to decide which …
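As a concrete reference point for "a more basic form of temporal difference learning", a TD(0) value update bootstraps from the next state's value and needs no advantage estimate at all. A minimal tabular sketch (the value table and usage below are illustrative, not code from this project):

```python
def td0_update(value, state, reward, next_state, done, alpha=0.1, gamma=0.99):
    """One TD(0) step: bootstrap from V(next_state) instead of an advantage estimate."""
    target = reward + (0.0 if done else gamma * value[next_state])
    value[state] += alpha * (target - value[state])

# illustrative usage with a tabular value function over 5 states
value = {s: 0.0 for s in range(5)}
td0_update(value, state=0, reward=1.0, next_state=1, done=False)
```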
-
Hello, thank you for the code you provided; it has been a great help to me. But I've run into a problem when using PPO. I'm a beginner, and during training I found that after hooking the continuous-action PPO algorithm up to my custom environment, the reward of every episode is exactly the same. The actions output by the network differ, but only by a very small amount. I don't know what went wrong.
-
Hi,
Amazing work here. But the software has moved on, and I would like to make it work again. So far I have:
* Fixed the code to work with the new pyelastica API
* Updated from stable_baselines to…
-
**Describe the bug**
Issue 1: When training the Atom-7Bchat model with PPO, setting `--lora_target_modules ALL \` raises an error, but specifying the module names explicitly does not, e.g. `--lora_target_modules o_proj,up_proj,down_proj,v_proj,k_proj,gate_proj,q_proj \`
![baocuo](https://github…