-
In PPO training, I would like to apply a customized, non-parametric reward function, for instance rule-based rewards computed from textual features of the generated texts. In this case, I don't need to use rewar…
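For concreteness, here is a minimal sketch of the kind of rule-based reward I have in mind (the function name and the rules are just illustrative, and the exact way the rewards are handed to the trainer depends on its API; many PPO trainers simply expect one scalar reward tensor per sample in place of a reward-model score):

```
import torch

def rule_based_reward(text: str) -> float:
    """Illustrative rule-based reward computed purely from textual features."""
    reward = 0.0
    # Rule 1: keep responses within a length budget.
    reward += 1.0 if len(text.split()) <= 128 else -0.5
    # Rule 2: penalize a boilerplate phrase.
    if "as an ai language model" in text.lower():
        reward -= 1.0
    return reward

# One scalar reward (as a tensor) per generated sample, which is the shape
# most PPO trainers expect when the reward model is bypassed entirely.
generated_texts = ["a short, on-topic answer", "a long rambling answer ..."]
rewards = [torch.tensor(rule_based_reward(t)) for t in generated_texts]
```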
-
After some errors that led me to learn that Xcode is also a dependency for this project, I was finally able to compile the C++ files/actions when running `make train_and_show` or `make train`. However, I a…
-
[Reference](https://epolicy.dpss.lacounty.gov/epolicy/epolicy/server/general/projects_responsive/ePolicyMaster/mergedProjects/CalWORKs/CalWORKs/44-211_6_Pregnancy_Special_Need/44-211_6_Pregnancy_Speci…
-
### ❓ Question
When training PPO-Recurrent over multiple epochs, we do not update the stored LSTM states even though the LSTM weights get updated. Is there a reason for this, or is it just to save compute and…
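To make sure I understand the behaviour, here is a toy illustration of what I mean (not the library's actual code): the initial hidden state captured at rollout time is reused unchanged in every optimization epoch, while the LSTM weights themselves keep changing.

```
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
optimizer = torch.optim.Adam(lstm.parameters(), lr=1e-3)

obs_seq = torch.randn(4, 10, 8)   # rollout observations (batch, time, features)
h0 = torch.zeros(1, 4, 16)        # hidden state stored at collection time
c0 = torch.zeros(1, 4, 16)

for epoch in range(4):
    # Re-run BPTT with the *current* weights, but from the *stored* initial
    # state; recomputing h0/c0 with the new weights would require replaying
    # everything that happened before the rollout window.
    out, _ = lstm(obs_seq, (h0, c0))
    loss = out.pow(2).mean()      # stand-in for the PPO loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```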
-
### System Info
In the init function of the new PPO trainer (renamed from PPO trainer v2), it says:
```
if ref_policy is policy:
    raise ValueError(
        "`policy` and `ref…
-
Does this support PPO with a step-level PRM? Currently I only see scripts for PPO with a token-level RM. Specifically, how can we train PPO with [OpenRLHF/Mistral-7b-PRM-Math-Shepherd](https://huggingface…
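For concreteness, this is roughly what I imagine step-level rewards would look like when mapped onto token-level PPO (the `score_steps` helper is a hypothetical placeholder for the PRM call, not an API from this repo, and the step/token bookkeeping would come from the tokenizer):

```
import torch

def score_steps(prompt: str, steps: list[str]) -> list[float]:
    # Hypothetical placeholder for the PRM forward pass
    # (e.g. a Math-Shepherd-style per-step score in [0, 1]).
    return [0.0 for _ in steps]

def step_to_token_rewards(num_response_tokens: int,
                          step_end_positions: list[int],
                          step_scores: list[float]) -> torch.Tensor:
    """Place each step's PRM score on the last token of that step,
    leaving every other token's reward at zero."""
    token_rewards = torch.zeros(num_response_tokens)
    for pos, score in zip(step_end_positions, step_scores):
        token_rewards[pos] = score
    return token_rewards

# Example: steps delimited by newlines; step_end_positions would be the index
# of the last token of each step after tokenizing the response.
response = "Step 1: 2 * 3 = 6\nStep 2: 2 + 6 = 8"
scores = score_steps("Solve 2 + 2 * 3", response.split("\n"))
```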
-
Where do the first-stage pretrained weights come from? Could you please provide them? (/pretrain/ImageNet_premodels/ppo_model.pth)
-
I'm working on a custom PPO agent where the actor learns both the mean and variance of the action distribution. To implement this, I've overridden the `get_action` method and modified the actor's `for…
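For reference, here is a minimal sketch of the structure I'm using (class and layer names are illustrative stand-ins, not my exact code, and the log-std is state-independent for simplicity): the forward pass returns the mean and standard deviation, and `get_action` samples from the resulting `Normal` and returns the summed log-probability.

```
import torch
import torch.nn as nn
from torch.distributions import Normal

class GaussianActor(nn.Module):
    """Actor that learns both the mean and the (log) standard deviation."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self.mean_head = nn.Linear(hidden, act_dim)
        # Learned, state-independent log standard deviation.
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def forward(self, obs: torch.Tensor):
        mean = self.mean_head(self.body(obs))
        std = self.log_std.exp().expand_as(mean)
        return mean, std

    def get_action(self, obs: torch.Tensor):
        mean, std = self(obs)
        dist = Normal(mean, std)
        action = dist.sample()
        # Sum over action dimensions so PPO sees one log-prob per sample.
        log_prob = dist.log_prob(action).sum(dim=-1)
        return action, log_prob
```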
-
Very nice work!
I'm running PPO using the hh-rlhf dataset in the verl repo, and the error is below.
```
File "/home/syx/rlhf/verl/single_controller/ray/base.py", line 395, in func
return getattr…
-
Hi veRL team, thanks for open-sourcing this great framework. I have successfully run PPO training of qwen2-7b on 2 nodes, so I think there is no problem with my environment. But I encountered an…