carlosferrazza / humanoid-bench

About action low and high initialization #13

Open · Xianqi-Zhang opened this issue 3 weeks ago

Xianqi-Zhang commented 3 weeks ago

Hi, thanks for sharing your work. I would like to report a bug.

Since self.action_high and self.action_low are initialized before self.action_space is defined, they end up holding what look like arbitrary values, so actions are not normalized/unnormalized correctly in step().

https://github.com/carlosferrazza/humanoid-bench/blob/3951d65bc85e16a562d091a7f4820150c20ed91f/humanoid_bench/env.py#L145-L149 https://github.com/carlosferrazza/humanoid-bench/blob/3951d65bc85e16a562d091a7f4820150c20ed91f/humanoid_bench/tasks.py#L53-L55
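For reference, by normalize/unnormalize I mean the usual affine mapping between [-1, 1] and [action_low, action_high]. A minimal sketch only, not the repo's exact code (the real unnormalize_action is in tasks.py, linked above):

# * sketch: standard affine (un)normalization; both directions are only correct
# * if `low` / `high` hold the real actuator limits.
import numpy as np

def normalize_action(action, low, high):
    # map [low, high] -> [-1, 1]
    return 2.0 * (action - low) / (high - low) - 1.0

def unnormalize_action(action, low, high):
    # map [-1, 1] -> [low, high]
    return low + (action + 1.0) * 0.5 * (high - low)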

Test code:

# * test.py (excerpt; env, args, np, and cv2 are set up earlier in the script)
    step = 0
    while step < 10:
        action = np.zeros((61, ))  # * !!! zero action for all 61 dimensions
        ob, reward, terminated, truncated, info = env.step(action)
        img = env.render()
        if args.render_mode == 'rgb_array':
            cv2.imshow('test_env', img[:, :, ::-1])
            cv2.waitKey(1)
        if terminated or truncated:
            env.reset()
        step += 1
# * humanoid_bench/env.py
class HumanoidEnv(MujocoEnv, gym.utils.EzPickle):

    def __init__(self, ...):  # rest of the signature elided
        ...

        # debug prints I added: self.action_high / self.action_low
        # are already assigned by this point
        from termcolor import cprint
        cprint('11' * 20, 'green')
        print(self.action_high)
        print(self.action_low)
        self.action_space = Box(
            low=-1, high=1, shape=self.action_space.shape, dtype=np.float32
        )

        self.action_high_new = self.action_space.high
        self.action_low_new = self.action_space.low
        cprint('22' * 20, 'yellow')
        print(self.action_high_new)
        print(self.action_low_new)

[screenshot: bug]

If you execute a zero action (np.zeros((61,))), meaning I don't want to actuate any joint, the action is not normalized/unnormalized correctly, so the env still executes a non-zero control and the robot moves around instead of staying still.

# * humanoid_bench/tasks.py
    def step(self, action):

        from termcolor import cprint
        cprint('33' * 20, 'green')
        print(action)  # normalized action as passed in

        action = self.unnormalize_action(action)

        cprint('44' * 20, 'yellow')
        print(action)  # action after unnormalization, as sent to the simulator

        self._env.do_simulation(action, self._env.frame_skip)

[screenshot: bug_2]


P.S. Could you point me to the code responsible for action execution and robot control? If this is indeed a bug, you may need to update the training curves in the paper and the repo...

Best regards.

carlosferrazza commented 2 weeks ago

Hi @Xianqi-Zhang, I don't think this is a bug. Note that:

1. self.action_low and self.action_high are the actual limits of the actuators, while we set self.action_space.high to all +1 and self.action_space.low to all -1. This is intended, since we always unnormalize actions before applying them to the robot.
2. The H1 robot is position-controlled, and it is not initialized with zero actions. So, if you provide zeros, it will move to reach the corresponding position.
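To illustrate the second point: assuming the usual affine mapping from [-1, 1] to the actuator limits, a zero action maps to the midpoint of each actuator's position range, which is generally not the initial pose, so the position controllers drive the joints there. A rough numerical sketch with made-up limits (not taken from the actual model):

# * sketch: what a zero normalized action maps to (hypothetical limits)
import numpy as np

low, high = np.array([-0.4]), np.array([1.6])  # made-up position limits for one joint
zero_action = np.zeros(1)
target = low + (zero_action + 1.0) * 0.5 * (high - low)  # assumed affine unnormalization
print(target)  # [0.6] -- the midpoint of the range, not necessarily the initial joint angle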

Hope this helps clarify things! Please let me know if this is still an issue for you.

Xianqi-Zhang commented 1 week ago

Hi, thanks for your reply. Since the H1 robot is position-controlled, if I want the robot to stand still, should the action be set as:

 action = ob['proprio'][:env.action_space.shape[0]]
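Concretely, the loop I have in mind looks roughly like this (a sketch only; it assumes a Gymnasium-style reset() and that ob is a dict whose 'proprio' entry starts with the actuated joint positions):

# * sketch: trying to hold the initial pose
ob, _ = env.reset()
n_act = env.action_space.shape[0]
step = 0
while step < 100:
    action = ob['proprio'][:n_act]  # feed the current joint positions back as position targets
    ob, reward, terminated, truncated, info = env.step(action)
    step += 1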

Is this correct? Do I also need to set anything related to qvel?

(I have edited the issue title to avoid misunderstanding.)

Thank you for any reply. Best regards.