PaddlePaddle / MetaGym

Collection of Reinforcement Learning / Meta Reinforcement Learning Environments.
Apache License 2.0
275 stars 59 forks source link

logic error in “Quadrotor” environment, maybe #42

Closed YuechengLiu closed 3 years ago

YuechengLiu commented 3 years ago

In source code of rlschool/quadrotor/env.py, line 127 to line 140,

    def step(self, action):
        self.ct += 1
        cmd = np.asarray(action, np.float32)
        self.simulator.step(cmd.tolist(), self.dt)
        sensor_dict = self.simulator.get_sensor()
        state_dict = self.simulator.get_state()

        old_pos = [self.simulator.global_position[0] + self.x_offset,
                   self.simulator.global_position[1] + self.y_offset,
                   self.simulator.global_position[2] + self.z_offset]
        self._update_state(sensor_dict, state_dict)
        new_pos = [self.simulator.global_position[0] + self.x_offset,
                   self.simulator.global_position[1] + self.y_offset,
                   self.simulator.global_position[2] + self.z_offset]

where old_pos and new_pos are absolutely the same thing (might be unexpected). Maybe the code should be like this?

    def step(self, action):
        self.ct += 1
        cmd = np.asarray(action, np.float32)

        # get the old_pos before calling the simulator.
        old_pos = [self.simulator.global_position[0] + self.x_offset,
                   self.simulator.global_position[1] + self.y_offset,
                   self.simulator.global_position[2] + self.z_offset]

        self.simulator.step(cmd.tolist(), self.dt)
        sensor_dict = self.simulator.get_sensor()
        state_dict = self.simulator.get_state()

        self._update_state(sensor_dict, state_dict)
        new_pos = [self.simulator.global_position[0] + self.x_offset,
                   self.simulator.global_position[1] + self.y_offset,
                   self.simulator.global_position[2] + self.z_offset]
xueeinstein commented 3 years ago

@YuechengLiu Thanks for this review! I confirmed this is a logic error :(. I fixed it in PR#43.