aidudezzz / deepbots

A wrapper framework for Reinforcement Learning in the Webots robot simulator using Python 3.
https://deepbots.readthedocs.io/
GNU General Public License v3.0

Bug: robot never moves #141

Closed · speedhawk closed this 1 year ago

speedhawk commented 1 year ago

Hi, as a newcomer to robot simulation, I was really interested in this framework for implementing RL algorithms in Webots' 3D simulation environment, so I studied the package in detail. Based on the CartPole tutorial, one of the most important parts of deepbots is the data structure of the observation and action spaces: the Box class for continuous values (such as the four observations in CartPole) and the Discrete class for discrete values (such as discrete actions). Building on this, I tried to implement an environment for my own project, a path-planning task with an e-puck robot. First, the code below defines the observation space with six parameters (the x-axis position, the y-axis position, and the values of the proximity sensors ps0, ps1, ps6 and ps7 for obstacle detection) and the three allowed actions of the e-puck:

# assumed imports: import numpy as np; from gym.spaces import Box, Discrete
self.observation_space = Box(low=np.array([-0.25, -0.88, 0.00, 0.00, 0.00, 0.00]),
                             high=np.array([0.75, 0.12, 4095.00, 4095.00, 4095.00, 4095.00]),
                             dtype=np.float64)
# set the action space (turn left, turn right, go straight)
self.action_space = Discrete(3)

and the references to the devices on the e-puck robot:

self.robot = self.getSelf()  # reference to the robot node
# wheels
self.leftMotor = self.getDevice("left wheel motor")
self.rightMotor = self.getDevice("right wheel motor")
self.leftMotor.setPosition(float('inf'))   # velocity-control mode
self.rightMotor.setPosition(float('inf'))
self.leftMotor.setVelocity(0.0)
self.rightMotor.setVelocity(0.0)

# proximity sensors
self.ps = []
for ps_name in ['ps0', 'ps1', 'ps2', 'ps3', 'ps4', 'ps5', 'ps6', 'ps7']:
    ps = self.getDevice(ps_name)
    ps.enable(self.timestep)
    self.ps.append(ps)
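For reference, a get_observations() consistent with the bounds above could look like the sketch below (getPosition() on the node returned by getSelf() and getValue() on the sensors are standard Webots calls; which two position components map to the x/y bounds is an assumption, and normalization is omitted):

def get_observations(self):
    # World position of the robot node: getPosition() returns [x, y, z]
    # (which two components correspond to the Box bounds is an assumption)
    position = self.robot.getPosition()
    # Raw values of the four front proximity sensors (range 0..4095,
    # matching the high bounds of the observation space)
    ps_values = [self.ps[i].getValue() for i in (0, 1, 6, 7)]
    return [position[0], position[1]] + ps_values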

Additionally, in the apply_action() function I added the behaviors corresponding to the three actions:

def apply_action(self, action):
    action = int(action[0])

    # defaults correspond to going straight
    l_speed = 3.14
    r_speed = 3.14

    if action == 0:       # go straight
        l_speed = 3.14
        r_speed = 3.14
    elif action == 1:     # turn right
        l_speed = 3.14
        r_speed = -3.14
    elif action == 2:     # turn left
        l_speed = -3.14
        r_speed = 3.14

    self.leftMotor.setVelocity(l_speed)
    self.rightMotor.setVelocity(r_speed)
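One detail worth keeping in mind: in Webots, setVelocity() only takes effect once the controller advances the simulation, and with deepbots that stepping normally happens inside the framework's step() method. Schematically (an illustration of the contract, not deepbots' actual code):

# Illustration only (not deepbots' actual implementation): a gym-style
# step() ties the action to a simulation step roughly like this.
def step(self, action):
    self.apply_action(action)  # write the motor commands
    # Advance the Webots simulation so the commands take effect; without
    # this, the robot never moves even though getVelocity() reports the
    # commanded speeds.
    if super().step(self.timestep) == -1:
        raise SystemExit("Webots simulation ended")
    return (self.get_observations(), self.get_reward(action),
            self.is_done(), self.get_info())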

My initial assumption was that, as long as there were no errors in the PPO algorithm, the robot would at least make random movements. Unfortunately, the e-puck robot stays at its initial coordinates and never moves a single step, despite the fact that the speed values of the chosen action are applied to the motors at every iteration, as shown by the printed output (where the left and right speeds are returned by getVelocity()):

...
new_observation:  [0.05999999998225913, 0.09000000009921671, 70.17436453508148, 111.77694130888727, 87.54746205144774, 70.4848960311054]
time:  0.031017780303955078
action:  [0]
action:  0
left speed:  3.14
right speed:  3.14
...
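To separate the PPO code from the environment itself, a quick sanity check would be to drive the environment with random actions and watch whether the robot moves at all (a sketch; the class name is a placeholder for the environment described above):

# Hypothetical sanity check: bypass PPO entirely and sample random actions.
# If the robot still does not move, the issue is in the environment/Webots
# setup rather than in the agent. The env class name is a placeholder.
env = PathPlanningRobotSupervisor()
observation = env.reset()
for _ in range(500):
    action = env.action_space.sample()  # random action in {0, 1, 2}
    observation, reward, done, info = env.step([action])
    if done:
        observation = env.reset()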

In general, the only differences between my project and CartPole are the dimensions of the state-action space (the data-structure classes) and the agent robot (I deployed an e-puck instead of the four-wheeled robot). Apart from this strange non-moving bug, the code runs without errors throughout execution. I would be really grateful if you could help me solve it or give any advice on fixing it. Many thanks!!

tsampazk commented 1 year ago

Hello, I have already provided an answer here. Could you please respond there so we can look into your issue?