EvolutionGym / evogym

A large-scale benchmark for co-optimizing the design and control of soft robots, as seen in NeurIPS 2021.
https://evolutiongym.github.io/
MIT License

Possible error in BenchmarkBase step function #11

Closed: blmussati closed this issue 2 months ago

blmussati commented 2 years ago

Hi, I was running some simple tests with BiWalk.

My code is

# Imports needed to run this snippet (assuming BiWalk is exported from evogym.envs)
from evogym import sample_robot
from evogym.envs import BiWalk

# Sample a random 5x5 robot body and build the environment
robot_structure, robot_connections = sample_robot((5, 5))
biwalk = BiWalk(body=robot_structure, connections=robot_connections)
biwalk.default_viewer.track_objects('robot')

# Step with random actions and render
for _ in range(500):
    action = biwalk.action_space.sample()
    ob, rew, done, info = biwalk.step(action)
    biwalk.render(verbose=True)
    if done:
        biwalk.reset()
biwalk.close()

I noticed that the robot would just stand still. BiWalk is a subclass of GoalBase, which in turn is a subclass of BenchmarkBase.

After some debugging, I realised that the step function of BenchmarkBase does the following:

def step(self, action):

    action_copy = {}

    for robot_name, a in action.items():
        action_copy[robot_name] = a + 1

    return super().step(action_copy)

Here, super() is EvoGymBase.

action is a numpy array, where each entry is the action for one of the robot's actuators, bounded between 0.6 and 1.6. However, because of the line action_copy[robot_name] = a + 1, every element becomes >= 1.6, and hence the robot doesn't move.
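
For concreteness, here is a minimal numeric sketch of what seems to happen to a sampled action (plain numpy, not evogym code; the actuator count and the final clip to the space bounds are assumptions based on the behaviour described above):

import numpy as np

# Actions sampled from the [0.6, 1.6] action space, shifted by +1 in
# BenchmarkBase.step, then (assumed) clipped back to the space bounds.
low, high = 0.6, 1.6
sampled = np.random.uniform(low, high, size=8)  # e.g. 8 actuators (arbitrary)
shifted = sampled + 1                           # every entry is now >= 1.6
clipped = np.clip(shifted, low, high)           # every entry becomes exactly 1.6

print(clipped)  # all actuators pinned at the upper bound, so the robot doesn't move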

Could someone explain this line? It seems unnecessary, and it would break any baseline environment...

Yuxing-Wang-THU commented 2 years ago

The default activation function used in the PPO policy is Tanh(), so the policy's actions are limited to [-1, 1]. The +1 operation shifts that range to [0, 2], and a clip function then limits it to [0.6, 1.6].
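
As an illustration of that mapping (plain numpy, not evogym code):

import numpy as np

# A Tanh-bounded policy output in [-1, 1] is shifted by +1 into [0, 2]
# and then clipped to the actuator range [0.6, 1.6].
policy_output = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
shifted = policy_output + 1              # range [0, 2]
clipped = np.clip(shifted, 0.6, 1.6)     # range [0.6, 1.6]
print(clipped)                           # [0.6 0.6 1.  1.5 1.6]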

jagdeepsb commented 2 months ago

Hey @blmussati, I agree that this was a poor design choice, especially since many modern RL algorithm implementations have better ways of dealing with observation and action space bounds. I've removed this in v2.0.0.
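
For anyone updating code that relied on the old behaviour, a hedged sketch of one way to map a Tanh-bounded policy output onto the environment's action space on the caller's side (a generic helper, not part of the evogym API; the usage lines in the comments are illustrative only):

import numpy as np

def rescale_action(a, low, high):
    """Map a policy output in [-1, 1] onto a Gym-style Box range [low, high]."""
    return low + 0.5 * (a + 1.0) * (high - low)

# Illustrative usage with any Gym-style Box action space:
#   action = rescale_action(policy_output, env.action_space.low, env.action_space.high)
#   env.step(action)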