Gymnasium version 0.29.1 installed via pip (in a Python venv, if it matters)
Linux Mint 21.3
Checklist
[X] I have checked that there is no similar issue in the repo
Describe the bug
When using the LunarLander-v2 environment with continuous=True, the first step call (or one of the first few) raises an exception in lunar_lander.py, at the ApplyLinearImpulse call examined under Additional context.
The exception in question is:
TypeError: Converting from sequence to b2Vec2, expected int/float arguments index 0
See Additional context for my thoughts on the matter.
Code example
System info
Additional context
All line numbers in this report refer to lunar_lander.py as of its current status in the main branch at the time of writing, i.e. commit 8161d7d.
I have already done a little investigation, and have strong suspicions on the root cause of the problem as well as a preliminary "hot fix" for it.
The problematic argument of the ApplyLinearImpulse call appears to be the impulse (first argument), i.e. (-ox * MAIN_ENGINE_POWER * m_power, -oy * MAIN_ENGINE_POWER * m_power) (line 569). Printing this expression in pdb yielded a tuple of numpy scalars. Still in pdb, calling ApplyLinearImpulse with the first argument replaced by (0,0) raised no error.
Therefore, I strongly suspect that some part of the framework, possibly the Box2D language bindings, does not recognize numpy types as numeric types, but rather treats them as generic objects. Looking deeper at the expression for the impulse argument, the only numpy-typed part is m_power. This appears to be due to an earlier np.clip call that is only executed if self.continuous is true.
My suspicions are confirmed by the fact that I was able to successfully implement the following "hot fix": add .item() at the end of a few lines containing numpy calls, so that Box2D is only given native Python floats. Specifically:
m_power = (np.clip(action[0], 0.0, 1.0) + 1.0) * 0.5 --> m_power = ((np.clip(action[0], 0.0, 1.0) + 1.0) * 0.5).item() (line 535)
direction = np.sign(action[1]) --> direction = (np.sign(action[1])).item() (line 580)
s_power = np.clip(np.abs(action[1]), 0.5, 1.0) --> s_power = (np.clip(np.abs(action[1]), 0.5, 1.0)).item() (line 581)
"Successfully implemented" means that this version has not raised the reported exception (or any other exception) over an entire 400000-step training session, during which one can reasonably expect a variety of states and actions to have been encountered. A qualitative look at the recorded videos shows no obvious weirdness in the environment, either.
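The type behaviour behind the three changes at lines 535, 580 and 581 can be checked in isolation, without Box2D. The following is a minimal sketch, assuming the action is a float32 numpy array (the report does not state the dtype, and the sample values are made up). It only shows that those expressions produce numpy scalars and that .item() turns them into native Python floats; it does not exercise ApplyLinearImpulse itself.

```python
import numpy as np

# Assumed sample action; the float32 dtype and the values are illustrative only.
action = np.array([0.5, -0.7], dtype=np.float32)

# The three expressions as written before the hot fix.
m_power = (np.clip(action[0], 0.0, 1.0) + 1.0) * 0.5
direction = np.sign(action[1])
s_power = np.clip(np.abs(action[1]), 0.5, 1.0)

# Each result is a numpy scalar, i.e. an instance of np.generic.
for name, value in [("m_power", m_power), ("direction", direction), ("s_power", s_power)]:
    print(name, type(value).__name__, isinstance(value, np.generic))

# The hot fix: .item() converts a numpy scalar to the matching native Python type.
m_power_fixed = m_power.item()
direction_fixed = direction.item()
s_power_fixed = s_power.item()
print(type(m_power_fixed), type(direction_fixed), type(s_power_fixed))
```

Which numpy scalar type comes out (float32 or float64) depends on the action dtype and on the promotion rules of the installed NumPy version. Since np.float64 is a subclass of Python's float while np.float32 is not, the dtype may affect whether the error appears at all.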
With this, I am not implying that my fix is the best, or the most efficient. I am only reporting it in order to highlight the problem, and to facilitate reproduction.
Finally, though I have only investigated the one environment that is relevant to my current purposes, this issue might extend to other Box2D environments here, too.
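If the same pattern does occur in other Box2D environments, one alternative to patching each numpy expression would be to convert the vector at the call site. This is a minimal sketch, assuming a hypothetical helper named to_native_vec2 (not part of Gymnasium or Box2D) and made-up stand-in values for ox, oy, MAIN_ENGINE_POWER and m_power. It has not been tested against Box2D itself.

```python
import numpy as np


def to_native_vec2(vec):
    """Return a 2-tuple of native Python floats from a pair of numeric values.

    Hypothetical helper for illustration; not part of Gymnasium or Box2D.
    """
    x, y = vec
    # float() accepts Python ints and floats as well as numpy scalars.
    return (float(x), float(y))


# Stand-in values for the quantities in the impulse expression (assumptions).
MAIN_ENGINE_POWER = 13.0
ox, oy = 0.1, -0.9
m_power = np.float32(0.75)

# Same expression as the impulse argument, converted before it would be passed on.
impulse = to_native_vec2(
    (-ox * MAIN_ENGINE_POWER * m_power, -oy * MAIN_ENGINE_POWER * m_power)
)
print(impulse, [type(component) for component in impulse])
```

In the environment, the converted tuple would take the place of the raw tuple as the first argument of ApplyLinearImpulse. Whether this is preferable to the .item() changes is a judgment call: it keeps the conversion next to the call that needs it, at the cost of one extra function call per step.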