Farama-Foundation / Gymnasium

An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym)
https://gymnasium.farama.org
MIT License
6.83k stars, 762 forks

[Bug Report] `mujoco.InvertedDoublePendulum` last 2 observations (constraints) are const 0 #228

Closed: Kallinteris-Andreas closed this issue 1 year ago

Kallinteris-Andreas commented 1 year ago

Describe the bug

The last 2 observations of InvertedDoublePendulum are always 0: the y and z axes have no constraint forces.

@rodrigodelazcano can you confirm?

Code example

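import gymnasium

env = gymnasium.make("InvertedDoublePendulum-v4")

for eps in range(100):
    env.reset()
    for _ in range(1000):
        obs, _, terminated, truncated, _ = env.step(env.action_space.sample())
        # The last two observation entries never leave zero.
        assert obs[-1] == 0 and obs[-2] == 0
        # Stop stepping once the episode has ended.
        if terminated or truncated:
            break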

pseudo-rnd-thoughts commented 1 year ago

@rodrigodelazcano any ideas?

@Kallinteris-Andreas What do these observations represent? Is there a good reason why the elements could be zero?

rodrigodelazcano commented 1 year ago

@Kallinteris-Andreas you are right. I suggest labeling these issues as enhancement instead of bug, since they don't add any anomalies to the observation, only redundancy. If I'm not mistaken, the constant values in the observation don't affect the sample distribution.
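
To illustrate the redundancy point, here is a minimal sketch that finds observation entries that never change across a logged rollout and drops them. It assumes observations were collected as a list of equal-length lists; `constant_columns` and `drop_columns` are hypothetical helper names (not part of Gymnasium), and the rollout values are made up for the example.

```python
def constant_columns(observations):
    """Return the indices of entries that hold the same value in every row."""
    if not observations:
        return []
    first = observations[0]
    return [
        i
        for i in range(len(first))
        if all(row[i] == first[i] for row in observations)
    ]


def drop_columns(observations, columns):
    """Return a copy of the rollout without the given entry indices."""
    skip = set(columns)
    return [
        [value for i, value in enumerate(row) if i not in skip]
        for row in observations
    ]


# Made-up rollout: the last two entries stay at zero, as reported in this issue.
rollout = [
    [0.1, 0.5, 0.0, 0.0],
    [0.2, -3.0, 0.0, 0.0],
    [0.3, 10.0, 0.0, 0.0],
]

const_idx = constant_columns(rollout)
reduced = drop_columns(rollout, const_idx)
print(const_idx)
print(reduced)
```

Dropping the constant entries removes no information, because they take the same value in every sample.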

@pseudo-rnd-thoughts this is happening because the double pendulum has three degrees of freedom. The actuated joint (the base of the pendulum) slides along a rail with limits on both sides. When the base reaches one of the rail limits and keeps actuating in that direction, an external force in the opposite direction appears.

The other two joints are hinge joints that connect the links of the pendulum. These joints don't have any friction, so the moment arm should be 0.
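
The reasoning above can be sketched with a toy hard-stop model. This is not MuJoCo's constraint solver: it only assumes that a rail limit pushes back with the opposite of the applied force when the cart is pressed against it, and that the two frictionless hinges never contribute. The function name `toy_constraint_forces` and the rail limits of -1 and 1 are assumptions made for illustration.

```python
def toy_constraint_forces(cart_position, applied_force, lower=-1.0, upper=1.0):
    """Toy model of the three per-joint constraint forces (slider, hinge, hinge).

    Assumption: a hard stop at each rail limit cancels any force that pushes
    the cart further out; the frictionless hinges always contribute zero.
    """
    pressed_low = cart_position <= lower and applied_force < 0
    pressed_high = cart_position >= upper and applied_force > 0
    slider = -applied_force if (pressed_low or pressed_high) else 0.0
    return (slider, 0.0, 0.0)


mid_rail = toy_constraint_forces(0.0, 1.0)      # cart is free to move
at_limit = toy_constraint_forces(1.0, 1.0)      # pushing into the upper limit
pulling_back = toy_constraint_forces(1.0, -1.0)  # at the limit but moving away
print(mid_rail, at_limit, pulling_back)
```

In this sketch only the first entry can be nonzero, which mirrors why obs[-3] varies while obs[-2] and obs[-1] stay at 0.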

If you run the following, you can see how obs[-3] blows up at each end of the rail:

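import gymnasium
import numpy as np

env = gymnasium.make("InvertedDoublePendulum-v4", render_mode="human")
# Constant full-strength push; the sign flips at the start of every episode.
action = np.array([-1.0], dtype=np.float32)
for eps in range(100):
    env.reset()
    action *= -1
    for _ in range(100):
        # Termination is ignored here so the cart keeps moving toward the rail limit.
        obs, _, _, _, _ = env.step(action)
        print(obs[-3:])  # the last three observation entries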