-
Could you provide the PPO codebase that can reproduce the results of the paper? I have not found it in this repo. Thank you!
-
# Reference
- 07/2017 [Proximal policy optimization algorithms](https://arxiv.org/abs/1707.06347)
# Brief
- Based on Policy Gradient (PG) methods
-
- https://openai.com/blog/openai-baselines-ppo/
- https://medium.com/intro-to-artificial-intelligence/proximal-policy-optimization-ppo-a-policy-based-reinforcement-learning-algorithm-3cf126a7562d
- …
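The clipped surrogate objective at the core of PPO can be sketched in a few lines; this is a minimal NumPy illustration (function name and shapes are my own, `ratio` stands for the probability ratio π_new(a|s)/π_old(a|s)):

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Per-sample clipped surrogate objective from the PPO paper (to be maximized)."""
    unclipped = ratio * advantage
    # Clipping the ratio to [1 - eps, 1 + eps] caps the incentive to move
    # the new policy far from the old one in a single update.
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Taking the elementwise minimum keeps the pessimistic (lower) bound.
    return np.minimum(unclipped, clipped)

# With a positive advantage, gains are capped once ratio exceeds 1 + eps:
value = ppo_clip_loss(np.array([1.5]), np.array([1.0]))  # capped at 1.2
```

In practice the negated mean of this quantity is minimized with SGD, alongside a value loss and an entropy bonus.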
-
Feature request to add PPO
-
-
PPO reports 'mean reward' and 'std of reward',
but POCA reports 'mean reward' and 'mean group reward'.
I didn't find a way to get the 'std of reward' in POCA.
I'd appreciate it if it could be obtained in YAML bu…
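As a workaround until the stat is exposed, the standard deviation can always be recomputed offline from per-episode returns; a minimal sketch (the `episode_rewards` values here are made up):

```python
import numpy as np

# Hypothetical per-episode returns collected from a training run
# (e.g. parsed from logs or TensorBoard event files).
episode_rewards = [1.0, 3.0, 2.0, 4.0]

mean_r = float(np.mean(episode_rewards))  # what the trainer already logs
std_r = float(np.std(episode_rewards))    # the missing 'std of reward'
```

This mirrors how 'std of reward' is typically derived from the same per-episode data that produces 'mean reward'.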
-
I wonder whether LSTM+PPO/SAC can be used in Tianshou, since there seem to be some problems.
-
### RNN+PPO
**when I replace the `ActorProb` to `RecurrentActorProb` and `Critic` to `RecurrentCritic` in `test/continuous/test_ppo.py` , the bug is below:**
```
File "E:\ANCONDA\lib\site-pac…
```
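Independent of the specific traceback, the usual pitfall when swapping an MLP actor for a recurrent one is the extra hidden state: the actor now returns `(output, new_hidden)`, and that state must be threaded through the rollout and reset at episode boundaries. A toy NumPy sketch of that contract (all names and shapes are illustrative, not Tianshou's API):

```python
import numpy as np

def recurrent_actor(obs, hidden, W_in, W_h):
    """Toy recurrent policy step: returns (action, new_hidden).
    Stands in for a RecurrentActorProb-style module, which yields a new
    hidden state in addition to its output."""
    new_hidden = np.tanh(obs @ W_in + hidden @ W_h)
    action = new_hidden.sum(axis=-1)  # placeholder "action head"
    return action, new_hidden

rng = np.random.default_rng(0)
W_in = rng.standard_normal((3, 4))
W_h = rng.standard_normal((4, 4))
hidden = np.zeros((1, 4))  # initial state at the start of an episode

for t in range(5):
    obs = rng.standard_normal((1, 3))
    action, hidden = recurrent_actor(obs, hidden, W_in, W_h)
    done = (t == 2)
    if done:
        # Forgetting this reset (or mismatching the state's shape) is a
        # common source of bugs when moving from MLP to recurrent actors.
        hidden = np.zeros_like(hidden)
```

Shape mismatches between what the buffer stores per step and what the recurrent module expects are worth checking first when a traceback like the one above appears.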
-
Hello, I have fine-tuned a [Code-T5](https://huggingface.co/Salesforce/codet5-small) model on my custom dataset. Now, while I was using `trl` to further train the fine-tuned model to align it better,…
-
There are several optimizations to our PPO recipe which could help push it closer to SOTA in terms of performance. There are also several pieces of documentation we could offer alongside this recipe t…