Farama-Foundation / Gymnasium

An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym)
https://gymnasium.farama.org
MIT License

[Bug Report] LunarLander continuous mode immediately raises exception due to incompatibility between numpy and box2d #1142

Closed torchipeppo closed 3 months ago

torchipeppo commented 3 months ago

Describe the bug

When using the LunarLander-v2 environment with continuous=True, the first step call (or one of the first few) raises an exception at this point in lunar_lander.py:

self.lander.ApplyLinearImpulse(  # line 568
    (-ox * MAIN_ENGINE_POWER * m_power, -oy * MAIN_ENGINE_POWER * m_power),
    impulse_pos,
    True,
)

The exception is: TypeError: Converting from sequence to b2Vec2, expected int/float arguments index 0

See additional context for my thoughts on the matter.

Code example

import gymnasium as gym
env = gym.make(id="LunarLander-v2", continuous=True)
obs, _info = env.reset()
obs2, rew, term, trunc, _info = env.step((0.61, -0.81))
obs2, rew, term, trunc, _info = env.step((0.62, -0.82))
obs2, rew, term, trunc, _info = env.step((0.63, -0.83))
obs2, rew, term, trunc, _info = env.step((0.64, -0.84))
obs2, rew, term, trunc, _info = env.step((0.65, -0.85))

Additional context

All line numbers in this report refer to lunar_lander.py as of its current status in the main branch at the time of writing, i.e. commit 8161d7d.

I have already investigated a little, and I have a strong suspicion about the root cause of the problem as well as a preliminary "hot fix" for it.

The problematic argument of the ApplyLinearImpulse call appears to be the impulse (first arg), i.e. (-ox * MAIN_ENGINE_POWER * m_power, -oy * MAIN_ENGINE_POWER * m_power) (line 569). Printing this expression in pdb yielded a tuple of numpy scalars. Still in pdb, attempting to call ApplyLinearImpulse with the first argument replaced by (0,0) yielded no errors.

Therefore, I strongly suspect that some part of the framework, possibly the Box2D language bindings, does not recognize numpy types as numeric types and instead treats them as generic objects. Looking more closely at the expression for the impulse argument, the only numpy-typed part is m_power. This appears to be due to an earlier np.clip call that is only executed if self.continuous is true.
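To illustrate the suspicion above, here is a minimal sketch of how a single numpy scalar can make the whole impulse tuple numpy-typed. The constant value, the ox/oy values, and the float32 action dtype are assumptions made for illustration and are not taken from lunar_lander.py:

```python
import numpy as np

# Hypothetical stand-ins; the real values live in lunar_lander.py.
MAIN_ENGINE_POWER = 13.0
ox, oy = 0.1, -0.2  # plain Python floats

# Assumption: the action is held as a float32 array.
action = np.array([0.61, -0.81], dtype=np.float32)

# np.clip on a numpy scalar returns a numpy scalar, not a Python float.
m_power = (np.clip(action[0], 0.0, 1.0) + 1.0) * 0.5

# Multiplying Python floats by a numpy scalar yields numpy scalars.
impulse = (-ox * MAIN_ENGINE_POWER * m_power, -oy * MAIN_ENGINE_POWER * m_power)
print([type(v).__name__ for v in impulse])
```

The exact scalar type printed may depend on the installed numpy version, since the scalar promotion rules changed in numpy 2.0.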

My suspicions are confirmed by the fact that I was able to successfully implement the following "hot fix": add .item() at the end of a few lines containing numpy calls, so that Box2D is only given native Python floats. Specifically:

"Successfully implemented" means that this version has not raised the reported exception (or any other exception) for the entirety of a 400000-step training session, where one can reasonably expect that a variety of states and actions was encountered. A qualitative look at the recorded videos shows no immediate weirdness with the environment, either.
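The idea behind the hot fix, handing Box2D only native Python floats, can be sketched with a small helper. Note that as_native_vec is a hypothetical name used for illustration; it is not part of Gymnasium or Box2D, and this is not the exact change I made:

```python
import numpy as np


def as_native_vec(x, y):
    """Hypothetical helper: return (x, y) as built-in Python floats."""
    # float() accepts both numpy scalars and Python numbers.
    return (float(x), float(y))


vec = as_native_vec(np.float32(0.5), np.float64(-1.25))
print(vec, [type(v).__name__ for v in vec])

# .item() on a numpy scalar gives the same kind of native value.
native = np.float32(0.5).item()
```

A tuple built this way contains only built-in floats, which is the kind of argument the TypeError message asks for.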

With this, I am not implying that my fix is the best, or the most efficient. I am only reporting it in order to highlight the problem, and to facilitate reproduction.

Finally, though I have only investigated the one environment that is relevant to my current purposes, this issue might extend to other Box2D environments here, too.

pseudo-rnd-thoughts commented 3 months ago

This issue is related to NumPy 2.0. It is fixed in main; alternatively, you can downgrade to NumPy 1.x.
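To check whether an installation is affected, a quick sketch like the following reports the NumPy major version. The variable names are illustrative only:

```python
import numpy as np

# Parse the major version from the version string, e.g. "2.0.1" -> 2.
numpy_major = int(np.__version__.split(".")[0])

if numpy_major >= 2:
    print(f"NumPy {np.__version__}: a release without the fix may raise this TypeError")
else:
    print(f"NumPy {np.__version__}: 1.x series")
```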

See https://github.com/Farama-Foundation/Gymnasium/pull/1094/files for the fix