-
### 🚀 Feature
Hello,
following DLR-RM/stable-baselines3#1624, @SimRey and I would like to implement **Hybrid PPO** in this library.
[This](https://arxiv.org/pdf/1903.01344.pdf) is the pa…
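For readers unfamiliar with the setting: a hybrid (parameterized) action space pairs a discrete action choice with continuous parameters. A minimal sketch of such a space in Gymnasium terms, with purely illustrative names and sizes (none of this comes from the paper or from SB3):

```python
# Illustrative only: a hybrid/parameterized action space of the kind the
# linked paper (arXiv:1903.01344) targets - one discrete action choice plus
# continuous parameters. Key names and dimensions here are assumptions.
import numpy as np
from gymnasium import spaces

hybrid_action_space = spaces.Dict(
    {
        # which discrete action to take (e.g. 0=move, 1=turn, 2=kick)
        "action_id": spaces.Discrete(3),
        # continuous parameters; the chosen action reads only the slice it needs
        "params": spaces.Box(low=-1.0, high=1.0, shape=(4,), dtype=np.float32),
    }
)

sample = hybrid_action_space.sample()
print(sample["action_id"], sample["params"])
```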
-
### Feature request
Please provide example scripts in https://github.com/huggingface/trl/tree/main/examples/scripts/ppo showing how to create the corresponding SFT and RM checkpoints to use for PPO.
### …
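A rough sketch of what such an example might cover, using TRL's `SFTTrainer` and `RewardTrainer`. The model name, the datasets, and the exact trainer arguments below are placeholders and vary by TRL version; this is the shape of a solution, not an official recipe:

```python
# Hypothetical sketch, not an official TRL example: produce an SFT checkpoint
# and a reward-model (RM) checkpoint that a PPO script could then load.
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from trl import RewardTrainer, SFTTrainer

base = "Qwen/Qwen2.5-0.5B"  # placeholder base model

# 1) Supervised fine-tuning -> SFT checkpoint
sft_trainer = SFTTrainer(
    model=base,
    train_dataset=load_dataset("trl-lib/Capybara", split="train"),
)
sft_trainer.train()
sft_trainer.save_model("./sft_checkpoint")

# 2) Reward modeling on preference pairs -> RM checkpoint
rm_trainer = RewardTrainer(
    model=AutoModelForSequenceClassification.from_pretrained(base, num_labels=1),
    processing_class=AutoTokenizer.from_pretrained(base),  # `tokenizer=` in older TRL
    train_dataset=load_dataset("trl-lib/ultrafeedback_binarized", split="train"),
)
rm_trainer.train()
rm_trainer.save_model("./rm_checkpoint")
```

The PPO script would then point its policy at `./sft_checkpoint` and its reward model at `./rm_checkpoint`.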
-
### Reminder
- [X] I have read the README and searched the existing issues.
### System Info
- `llamafactory` version: 0.8.3.dev0
- Platform: Linux-5.10.0-60.18.0.50.oe2203.aarch64-aarch64-with-gli…
-
Hello, when I run the Formation task with `algo=mappo`, I get:
![mappo error](https://github.com/btx0424/OmniDrones/assets/55371740/1d49b582-6bdf-4fe6-91ae-3171c23397b6)
When I use `algo=ppo`, I get:
…
-
### What happened + What you expected to happen
The problem concerns the numpy version when I restore a model from a checkpoint. Restoring works on numpy>=2.0.0 but fails on numpy==1.20; it only works on >=2.0.0. I can n…
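Until the root cause is found, one possible workaround (a sketch, assuming the restore runs inside Ray workers) is to fail fast on the driver and pin numpy for the workers via Ray's `runtime_env`:

```python
# Hedged workaround sketch: make sure both the driver and the Ray workers run
# the numpy version the checkpoint was saved with (>=2.0.0 per the report above).
import numpy as np
import ray
from packaging.version import Version

# Fail fast locally if the driver's numpy predates 2.0.0
assert Version(np.__version__) >= Version("2.0.0"), np.__version__

# Ensure remote workers also get a compatible numpy
ray.init(runtime_env={"pip": ["numpy>=2.0.0"]})
```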
-
Thank you to the authors for their contributions, such as the PRM training code added by @zhuzilin. My current question: can a trained PRM be used directly for step-by-step RLHF training? If not, what other application scenarios does a PRM currently have? Several issues say that PPO training is not supported yet, so can the trained PRM be used in other training modes, such as DPO?
Thank you very much!
#442
#498
#490
-
Hi! Could you tell me how to run evaluation for text control using the pretrained model?
Should we just change [this env](https://github.com/NVlabs/ProtoMotions/blob/main/data/pretrained_models/Maske…
-
Hey @vwxyzjn
it's been quite a few extremely busy months, but now I finally have the capacity to contribute a single-file implementation of PPO with Transformer-XL as episodic memory. The implement…
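For context, a rough sketch of the core idea (not the actual contribution): during the rollout, each step's hidden activations are cached per environment, and the policy attends over this episodic memory window on the next forward pass. A single attention layer stands in for the stacked Transformer-XL blocks here, and every name and shape is an assumption:

```python
# Rough sketch of TrXL-style episodic memory in a PPO policy. Assumptions
# throughout: shapes, names, and a single attention layer standing in for
# the real multi-layer Transformer-XL.
import torch
import torch.nn as nn


class MemoryPolicy(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int, d_model: int = 64, mem_len: int = 16):
        super().__init__()
        self.mem_len = mem_len
        self.embed = nn.Linear(obs_dim, d_model)
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.pi = nn.Linear(d_model, n_actions)
        self.v = nn.Linear(d_model, 1)

    def forward(self, obs, memory):
        # obs: (batch, obs_dim); memory: (batch, mem_len, d_model)
        h = self.embed(obs).unsqueeze(1)       # (batch, 1, d_model)
        ctx = torch.cat([memory, h], dim=1)    # attend over past steps + current
        out, _ = self.attn(h, ctx, ctx)        # query is the current step only
        out = out.squeeze(1)
        # Slide the episodic memory window: drop the oldest step, append current.
        new_memory = torch.cat([memory[:, 1:], h.detach()], dim=1)
        return self.pi(out), self.v(out), new_memory


# Usage inside a rollout loop (memory is reset to zeros at episode start):
policy = MemoryPolicy(obs_dim=8, n_actions=4)
memory = torch.zeros(2, policy.mem_len, 64)    # (num_envs, mem_len, d_model)
obs = torch.randn(2, 8)
logits, value, memory = policy(obs, memory)
```

The `detach()` on the cached activations mirrors the usual Transformer-XL trick of stopping gradients through the memory.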
-
I have trained a model using LidarObservation as follows:
```python
model = PPO('MlpPolicy', env,
            policy_kwargs=dict(net_arch=[256, 256]),
            learning_rate=5e-4,
            n_s…
```
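For reference, a complete version of this setup might look like the sketch below. It assumes `LidarObservation` comes from highway-env; the env id and every value after `learning_rate` (including the guess that the truncated `n_s…` is `n_steps`) are illustrative, not the poster's actual settings:

```python
# Hedged reconstruction of the truncated snippet above (assumptions noted inline).
import gymnasium as gym
import highway_env  # noqa: F401  (registers the highway-v0 family of envs)
from stable_baselines3 import PPO

env = gym.make("highway-v0")  # assumed env id
env.unwrapped.configure({"observation": {"type": "LidarObservation"}})
env.reset()  # re-reset so the new observation type takes effect

model = PPO(
    "MlpPolicy",
    env,
    policy_kwargs=dict(net_arch=[256, 256]),
    learning_rate=5e-4,
    n_steps=2048,        # assumption: the truncated "n_s…" is likely n_steps
    verbose=1,
)
model.learn(total_timesteps=100_000)
model.save("ppo_lidar")
```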
-
Now that the pieces are in place for Agent V1, we finally have to implement how the agent will adapt to its environment. In #17 and #21, we argued for a dual learning cycle that will allow us for…
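Since the arguments in the referenced issues are not reproduced here, the following is only a guessed skeleton of what a dual learning cycle could mean: a fast inner cycle that adapts within an episode and a slow outer cycle that consolidates across episodes. Every name in it is hypothetical:

```python
# Purely hypothetical skeleton of a dual learning cycle; none of these names
# come from the project or the referenced issues.
class DualCycleAgent:
    def __init__(self):
        self.fast_state = {}  # short-lived, per-episode adaptation
        self.slow_state = {}  # long-lived, consolidated knowledge

    def fast_step(self, observation):
        """Inner cycle: adapt to the current environment at every step."""
        self.fast_state["last_obs"] = observation

    def consolidate(self):
        """Outer cycle: fold per-episode adaptations into long-term state."""
        self.slow_state.update(self.fast_state)
        self.fast_state.clear()
```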