-
When training PPO in ColossalChat, two models are needed: an actor and a critic. Can these two models be different? For example, could the critic use a BERT model while the actor uses a GPT model? In differ…
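For context, a minimal sketch of what I mean (illustrative only, not ColossalChat's actual classes; model names are just examples): the two models only need to expose the interfaces PPO uses, so in principle they could wrap different backbones. One practical catch is that BERT and GPT use different tokenizers, so the two models could not share the same token sequences directly.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoModelForCausalLM

class Actor(nn.Module):
    """Produces per-token log-probabilities (what PPO needs from the actor)."""
    def __init__(self, name="gpt2"):
        super().__init__()
        self.lm = AutoModelForCausalLM.from_pretrained(name)

    def forward(self, input_ids, attention_mask=None):
        out = self.lm(input_ids=input_ids, attention_mask=attention_mask)
        return torch.log_softmax(out.logits, dim=-1)

class Critic(nn.Module):
    """Produces one scalar value per sequence (what PPO needs from the critic)."""
    def __init__(self, name="bert-base-uncased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(name)
        self.value_head = nn.Linear(self.encoder.config.hidden_size, 1)

    def forward(self, input_ids, attention_mask=None):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        # use the first-token ([CLS]) hidden state as the sequence summary
        return self.value_head(out.last_hidden_state[:, 0]).squeeze(-1)
```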
-
### 🐛 Bug
I've adapted the environment from this [blog post](https://medium.com/hackernoon/learning-policies-for-learning-policies-meta-reinforcement-learning-rl%C2%B2-in-tensorflow-b15b592a2ddf), …
-
### System Info
Does training support distributed setups and larger models such as qwen72b?
### Who can help?
@morning9393
### Information
- [X] The official example scripts
- [X] My own modified scripts
### Tasks
- [x] An offic…
-
Hi,
The current PPO implementation does not seem to account for time limits. While the `EpisodeWrapper` from brax is used, which tracks a truncation flag ([source](https://github.com/google/brax/bl…
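For reference, here is a minimal numpy sketch (not the repo's actual code) of how advantage estimation usually distinguishes true terminals from time-limit truncations. The array names are mine, and I'm assuming `next_values[t]` holds the value of the state actually reached after step t (i.e., the pre-reset observation at a truncation):

```python
import numpy as np

def gae(rewards, values, next_values, terminations, episode_ends,
        gamma=0.99, lam=0.95):
    # rewards[t]: reward at step t
    # values[t]: V(s_t); next_values[t]: V of the state reached after step t
    # terminations[t]: 1.0 only for a true terminal state
    # episode_ends[t]: 1.0 for termination OR time-limit truncation
    T = len(rewards)
    adv = np.zeros(T)
    last = 0.0
    for t in reversed(range(T)):
        # only a true terminal has zero future value; a time-limit
        # truncation still bootstraps from the value of the next state
        delta = rewards[t] + gamma * next_values[t] * (1.0 - terminations[t]) - values[t]
        # never carry the GAE trace across an episode boundary of either kind
        last = delta + gamma * lam * (1.0 - episode_ends[t]) * last
        adv[t] = last
    returns = adv + values
    return adv, returns
```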
-
Hi, how am I supposed to save an expert demo in ppo main?
-
Guys, Keras-rl is the best reinforcement learning library.
It is easy to use despite the complexity of the RL algorithms.
Keras-rl is far better than Stable Baselines.
Please add PPO, A3C, and others, as DQN is …
-
In simply_PPO you multiply the action distribution's (Gaussian) mu by 2; why is that?
`mu = 2 * tf.layers.dense(l1, A_DIM, tf.nn.tanh, trainable=trainable)`
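My guess is that the 2 matches the environment's action bound (Pendulum-v0's actions live in [-2, 2], while tanh only reaches [-1, 1]). A tiny sketch of that general pattern, with the bound stated as an assumption:

```python
import numpy as np

ACT_BOUND = 2.0  # assumption: Pendulum-v0's action space is [-2, 2]

def scaled_mu(pre_activation):
    # tanh squashes to [-1, 1]; multiplying by the bound maps the mean
    # onto the full action range, which is presumably what the 2 does
    return ACT_BOUND * np.tanh(pre_activation)

print(scaled_mu(np.array([10.0, -10.0, 0.0])))  # ~[ 2., -2.,  0.]
```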
-
Hello, apologies if I do this wrong; I don't contribute to open source often. I was attempting to run the PyTorch PPO implementation and kept getting several errors regarding the dimension of the obser…
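In case it helps narrow this down, this is the shape contract I'd expect a PyTorch policy to want (a generic sketch, not the repo's code; the layer sizes are made up): a float tensor with an explicit batch dimension.

```python
import numpy as np
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2))

obs = np.zeros(4, dtype=np.float32)        # what env.reset()/step() typically returns
obs_t = torch.as_tensor(obs).unsqueeze(0)  # shape (1, 4): add the batch dimension
logits = policy(obs_t)                     # shape (1, 2)
print(logits.shape)
```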
-
Sir, how about asking Asui to move from Tempo to Cash? Keep it up, bro.
-
## Problem Description
Would it be useful to add a complex (nested/dictionary) action and obs space variant of the PPO algo? I did this for `minerl` and wondered if it would be useful to contribute i…
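To make the proposal concrete, here is a minimal sketch (illustrative only, not what I wrote for `minerl`): one encoder per dictionary key, with the features concatenated before the shared policy/value heads. The keys and sizes below are made up.

```python
import torch
import torch.nn as nn

class DictEncoder(nn.Module):
    def __init__(self, sizes, hidden=64):
        super().__init__()
        # one small MLP encoder per observation key
        self.encoders = nn.ModuleDict(
            {k: nn.Sequential(nn.Linear(n, hidden), nn.ReLU()) for k, n in sizes.items()}
        )
        self.out_dim = hidden * len(sizes)

    def forward(self, obs):
        # obs: dict of tensors, each shaped (batch, sizes[key])
        feats = [self.encoders[k](obs[k]) for k in sorted(self.encoders)]
        return torch.cat(feats, dim=-1)

enc = DictEncoder({"pov": 32, "inventory": 8})
obs = {"pov": torch.zeros(2, 32), "inventory": torch.zeros(2, 8)}
print(enc(obs).shape)  # torch.Size([2, 128])
```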