-
Currently, I use `ppo_trainer.save_pretrained` to save a model that is still in training, because the machine I use is rather unstable and I often need to resume training should it be interrupted…
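In case it helps, this is the save/resume pattern I mean, as a minimal sketch. It assumes the classic (pre-0.12) TRL `PPOTrainer` API with an `AutoModelForCausalLMWithValueHead`; the checkpoint path and base model are illustrative, and note that `save_pretrained` persists model weights but not optimizer state:

```python
# Minimal save/resume sketch (illustrative; assumes the classic TRL API).
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

CKPT = "ppo_checkpoint"  # hypothetical local path
BASE = "gpt2"            # hypothetical base model

def make_trainer(name: str) -> PPOTrainer:
    """Build a PPOTrainer from either a base model or a saved checkpoint."""
    model = AutoModelForCausalLMWithValueHead.from_pretrained(name)
    tokenizer = AutoTokenizer.from_pretrained(name)
    tokenizer.pad_token = tokenizer.eos_token
    return PPOTrainer(config=PPOConfig(), model=model, tokenizer=tokenizer)

# Fresh run: checkpoint periodically inside the training loop.
ppo_trainer = make_trainer(BASE)
# ... ppo_trainer.step(queries, responses, rewards) ...
ppo_trainer.save_pretrained(CKPT)  # saves weights, not optimizer state

# After an interruption: rebuild from the checkpoint and keep training.
ppo_trainer = make_trainer(CKPT)
```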
-
### Reminder
- [X] I have read the README and searched the existing issues.
### System Info
[2024-06-07 10:17:14,980] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator t…
-
### What happened + What you expected to happen
I can’t seem to replicate the original [PPO](https://arxiv.org/pdf/1707.06347) algorithm's performance when using RLlib's PPO implementation. The hyp…
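For context, these are the MuJoCo settings from the paper that I'm trying to match, expressed as a hedged sketch against RLlib's classic `PPOConfig` API (parameter names may differ across RLlib versions, and the environment is just an example task from the paper):

```python
# Sketch: RLlib PPO with the MuJoCo hyperparameters from Schulman et al.
# (2017). Assumes the classic PPOConfig API; names vary across versions.
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("HalfCheetah-v4")   # example MuJoCo task from the paper
    .training(
        lr=3e-4,                 # Adam step size
        train_batch_size=2048,   # horizon T per policy update
        sgd_minibatch_size=64,   # minibatch size
        num_sgd_iter=10,         # optimization epochs per update
        gamma=0.99,              # discount factor
        lambda_=0.95,            # GAE lambda
        clip_param=0.2,          # clipping epsilon
    )
)
algo = config.build()
for _ in range(100):
    results = algo.train()
```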
-
Hi, awesome work!
I am interested in how to train a skilled policy with PPO. Would you be able to provide the training code? It would be really helpful. Thank you!
-
I see your codebase has some features not mentioned in your paper, such as support for Lean4, DPO, and PPO. Do you have docs for Lean4 and for all the scripts in the root directory?
-
I noticed that the PPO agent initialization forces `is_action_continuous=False`, whereas the PPO algorithm itself and other libraries implementing PPO allow continuous actions. Can this be added to …
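For reference, continuous-action PPO usually just swaps the categorical policy head for a diagonal Gaussian over actions. A generic PyTorch sketch of what such a head looks like (all names are illustrative, not from this repo):

```python
# Illustrative diagonal-Gaussian actor head for continuous-action PPO
# (generic PyTorch; not this repository's code).
import torch
import torch.nn as nn

class GaussianActor(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.mu = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, act_dim),
        )
        # State-independent log-std, a common PPO choice.
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def forward(self, obs: torch.Tensor) -> torch.distributions.Normal:
        return torch.distributions.Normal(self.mu(obs), self.log_std.exp())

# Usage in the PPO ratio/loss: sum log-probs over action dimensions.
actor = GaussianActor(obs_dim=8, act_dim=2)
dist = actor(torch.randn(32, 8))
action = dist.sample()                    # shape (32, 2)
logp = dist.log_prob(action).sum(-1)      # shape (32,)
```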
-
When performing a PPO step, the code performs the forward pass at [line 798](https://github.com/huggingface/trl/blob/main/trl/trainer/ppo_trainer.py) using the function `batched_forward_pass`.
However, …
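For readers unfamiliar with this step: `batched_forward_pass` recomputes per-token log-probabilities (and values) for the sampled query+response pairs in mini-batches. A generic sketch of the log-prob part, not TRL's exact implementation (`response_start` and the padding layout are hypothetical, for illustration):

```python
# Generic sketch of recomputing per-token log-probs for responses under
# a causal LM (illustrative; not TRL's exact batched_forward_pass).
import torch

@torch.no_grad()
def response_logprobs(model, input_ids, attention_mask, response_start):
    """Per-token log-probs of the response segment of each sequence.

    `input_ids` holds query+response rows padded to a common length;
    `response_start` is the index where responses begin.
    """
    logits = model(input_ids=input_ids, attention_mask=attention_mask).logits
    # Logits at position t predict token t+1, hence the shift by one.
    logprobs = torch.log_softmax(logits[:, :-1], dim=-1)
    targets = input_ids[:, 1:]
    per_token = logprobs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    return per_token[:, response_start - 1:]  # response positions only
```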
-
Hi @eleurent, thank you so much for the contribution. Could you explain how you figured out the DQN hyperparameters in the highway env? Did you use Optuna for optimizing the hyperparameter…
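For concreteness, this is roughly what I imagine an Optuna search would look like for DQN on highway-env, assuming stable-baselines3 (the search ranges, budgets, and env id are made up, and I don't know whether this is how the repo's values were actually found):

```python
# Sketch: tuning DQN hyperparameters on highway-env with Optuna.
# Illustrative only; not the procedure actually used in the repo.
import gymnasium as gym
import highway_env  # noqa: F401  (registers the highway envs)
import optuna
from stable_baselines3 import DQN
from stable_baselines3.common.evaluation import evaluate_policy

def objective(trial: optuna.Trial) -> float:
    env = gym.make("highway-fast-v0")
    model = DQN(
        "MlpPolicy",
        env,
        learning_rate=trial.suggest_float("lr", 1e-5, 1e-3, log=True),
        gamma=trial.suggest_float("gamma", 0.9, 0.999),
        batch_size=trial.suggest_categorical("batch_size", [32, 64, 128]),
        verbose=0,
    )
    model.learn(total_timesteps=20_000)
    mean_reward, _ = evaluate_policy(model, env, n_eval_episodes=10)
    return mean_reward

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=25)
print(study.best_params)
```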
-
Does Optimum Neuron have support for [TRL](https://huggingface.co/docs/trl/index) supervised fine-tuning, reward modelling, and PPO using Trainium? Is TRL the best path to support RLHF?
-
# Implementing Proximal Policy Optimisation
I've used some of the [PyTorch RFC](https://github.com/pytorch/rfcs/blob/master/README.md) template here for clarity.
**Authors:**
* @salmanmohammadi…