❓ Question
I am trying to use RL Zoo with a customised PyBullet environment, a double pendulum. However, it does not converge even after 1 million steps. The environment (named target_reach), the hyperparameter .yml file (ppo_target_reach), and import_env.py are uploaded to the following repository: https://github.com/SKYLEO98/target_reach.git

The environment is based on the Gymnasium framework with the PyBullet physics engine:

```python
from typing import Dict, Union

import collections
import math
import os
import random

import gymnasium as gym
import matplotlib.pyplot as plt
import numpy as np
import pybullet as p
import pybullet_data
from gymnasium import error, spaces, utils
from gymnasium.spaces import Box
from gymnasium.utils import seeding


class target_reach(gym.Env):
    metadata = {"render_modes": ["human"], "render_fps": 1}

    def __init__(
        self,
        render_mode=None,
        reward_dist_weight: float = 1,
        reward_control_weight: float = 1,
    ):
        # connect to the PyBullet physics server with GUI rendering
        self.physics_client = p.connect(p.GUI)
```
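For context, here is a minimal sketch of how `import_env.py` might register the environment so that RL Zoo can resolve the `target_reach-v0` id. The module path, class name, and episode limit are assumptions about the repository layout, not its actual contents:

```python
# import_env.py -- hypothetical sketch; entry_point and max_episode_steps
# are assumptions, since the real file lives in the linked repository.
from gymnasium.envs.registration import register

register(
    id="target_reach-v0",                      # id referenced in the .yml file
    entry_point="target_reach:target_reach",   # "module:class" (assumed layout)
    max_episode_steps=1000,                    # assumed episode length limit
)
```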
The .yml file with the custom hyperparameter configuration:
```yaml
# Tuned
target_reach-v0:
  <<: *pybullet-defaults
  normalize: true
  n_envs: 8
  n_timesteps: !!float 1e6
  policy: 'MlpPolicy'
  batch_size: 64
  n_steps: 512
  gamma: 0.99
  gae_lambda: 0.9
  n_epochs: 20
  ent_coef: 0.0
  sde_sample_freq: 4
  max_grad_norm: 0.5
  vf_coef: 0.5
  learning_rate: !!float 3e-5
  use_sde: True
  clip_range: lin_0.4
  policy_kwargs: "dict(log_std_init=-2.7,
                       ortho_init=False,
                       activation_fn=nn.ReLU,
                       net_arch=dict(pi=[256, 256], vf=[256, 256]))"
```
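Before tuning these hyperparameters further, a quick API check can rule out environment bugs. Below is a minimal sketch, assuming `import_env.py` registers `target_reach-v0` as above; `check_env` is Stable-Baselines3's built-in Gymnasium compliance checker:

```python
# Sanity-check the custom env before handing it to RL Zoo.
import gymnasium as gym
from stable_baselines3.common.env_checker import check_env

import import_env  # noqa: F401  (assumed to register target_reach-v0)

env = gym.make("target_reach-v0")
# Warns or raises if the observation/action spaces, reset(), or step()
# do not follow the Gymnasium API.
check_env(env.unwrapped)
```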