Closed blmussati closed 3 months ago
Hi, I was running some simple tests with BiWalk.
My code is
robot_structure, robot_connections = sample_robot((5, 5)) biwalk = BiWalk(body=robot_structure, connections=robot_connections) biwalk.default_viewer.track_objects('robot') for _ in range(500): action = biwalk.action_space.sample() ob, rew, done, info = biwalk.step(action) biwalk.render(verbose=True) if done: biwalk.reset() biwalk.close()
And I noticed that the robot would stand still. BiWalk is a subclass of GoalBase, which in turn is a subclass of BenchmarkBase
After some debugging, I realised that the step function of BenchmarkBase does the following:
def step(self, action): action_copy = {} for robot_name, a in action.items(): action_copy[robot_name] = a + 1 return super().step(action_copy)
Where super() is EvoGymBase.
action is a numpy array, where each entry is the action for each of the robot's actuators, and it's bound between 0.6 and 1.6. However, because of the line
action_copy[robot_name] = a + 1
Each element becomes >= 1.6 and hence the robot doesn't move.Could someone explain such line? It seems unnecessary and it would crash any baseline environment...
The default activation function used in PPO policy is Tanh(), which means the range of action is limited to [-1, 1], with +1 operation, the output range will be transferred to [0,2]. Then a clip function will finally limit the range to [0.6,1.6].
Hi, I was running some simple tests with BiWalk.
My code is
And I noticed that the robot would stand still. BiWalk is a subclass of GoalBase, which in turn is a subclass of BenchmarkBase
After some debugging, I realised that the step function of BenchmarkBase does the following:
Where super() is EvoGymBase.
action is a numpy array, where each entry is the action for each of the robot's actuators, and it's bound between 0.6 and 1.6. However, because of the line
action_copy[robot_name] = a + 1
Each element becomes >= 1.6 and hence the robot doesn't move.Could someone explain such line? It seems unnecessary and it would crash any baseline environment...