Kkeirl closed this issue 3 months ago.

Hi, what should I do to run the Pendulum-v0 environment with your off-policy PPO code? Could you please write a simple modified example?
I apologize for the delayed response. To run Pendulum-v0, you don't need to modify the code; you can simply create a config for it, similar to configs/Ant-v4.yaml.

Example (this one targets InvertedPendulum-v4, but the format is the same):
device: "cpu"
seed: 77
env:
env_name: "InvertedPendulum-v4"
num_envs: 8
is_continuous: True
state_dim: 4
action_dim: 1
checkpoint_path: "checkpoints/InvertedPendulum"
network:
action_std_init: 0.4
action_std_decay_rate: 0.03
min_action_std: 0.1
action_std_decay_freq: 1e5
shared_layer: False
optimizer:
lr: 3e-4
train:
total_timesteps: 1000000
max_episode_len: 1024
gamma: 0.99
tau: 0.95
ppo:
loss_type: clip
optim_epochs: 10
batch_size: 256
eps_clip: 0.2
coef_value_function: 0.5
coef_entropy_penalty: 0
value_clipping: True
reward_scaler: True
observation_normalizer: False
clipping_gradient: True
scheduler: True
average_interval: 100
max_ckpt_count: 3
advantage_type: 'gae'
off_policy_buffer_size: 0
fraction: 0
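Since the question is about Pendulum-v0 specifically, here is a minimal sketch of what the corresponding config might look like. Only the keys that would differ from the example above are shown; the file name, the checkpoint path, and the nonzero off-policy values are illustrative assumptions, not settings taken from the repo. Pendulum's observation is 3-dimensional (cos theta, sin theta, angular velocity) and its action is a single torque; note that on gym >= 0.21 the environment id is "Pendulum-v1", since v0 was removed.

# configs/Pendulum-v0.yaml (hypothetical file name)
env:
  env_name: "Pendulum-v0"     # use "Pendulum-v1" on gym >= 0.21
  num_envs: 8
  is_continuous: True
  state_dim: 3                # observation: cos(theta), sin(theta), angular velocity
  action_dim: 1               # action: a single torque value
checkpoint_path: "checkpoints/Pendulum"
ppo:
  # The example above sets both keys to 0, which presumably disables the
  # off-policy buffer; the nonzero values below are untested guesses for
  # exercising the off-policy path the question asks about.
  off_policy_buffer_size: 100000
  fraction: 0.5

All other sections (network, optimizer, train, and the remaining ppo keys) can presumably be copied unchanged from the InvertedPendulum example.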
Thank you very much for your reply.