Kautenja / gym-super-mario-bros

An OpenAI Gym interface to Super Mario Bros. & Super Mario Bros. 2 (Lost Levels) on The NES
Other
678 stars 133 forks source link

Incorrect number of arguments from call to env.step(action) #119

Open JRod-Seed opened 1 year ago

JRod-Seed commented 1 year ago

I just installed gym-super-mario-bros and I've attempted to run the gym using a number of methods including the provided sample code:

from nes_py.wrappers import JoypadSpace import gym_super_mario_bros from gym_super_mario_bros.actions import SIMPLE_MOVEMENT env = gym_super_mario_bros.make('SuperMarioBros-v0') env = JoypadSpace(env, SIMPLE_MOVEMENT)

done = True for step in range(5000): if done: state = env.reset() state, reward, done, info = env.step(env.action_space.sample()) env.render()

env.close()

... as well as from the command line using:

gym_super_mario_bros -e SuperMarioBrosRandomStages-v0 -m human --stages 1-1

In both cases, the gym attempts to render the first frame of the game and then I receive the following error:

/Users/--/miniforge3/envs/tf/lib/python3.9/site-packages/gym/envs/registration.py:555: UserWarning: WARN: The environment SuperMarioBrosRandomStages-v0 is out of date. You should consider upgrading to version v3. logger.warn( /Users/--/miniforge3/envs/tf/lib/python3.9/site-packages/gym/utils/passive_env_checker.py:195: UserWarning: WARN: The result returned by env.reset() was not a tuple of the form (obs, info), where obs is a observation and info is a dictionary containing additional information. Actual type: <class 'numpy.ndarray'> logger.warn( /Users/--/miniforge3/envs/tf/lib/python3.9/site-packages/gym/utils/passive_env_checker.py:219: DeprecationWarning: WARN: Core environment is written in old step API which returns one bool instead of two. It is recommended to rewrite the environment with new step API. logger.deprecation( Traceback (most recent call last): File "/Users/--/miniforge3/envs/tf/bin/gym_super_mario_bros", line 8, in sys.exit(main()) File "/Users/--/miniforge3/envs/tf/lib/python3.9/site-packages/gym_super_mario_bros/_app/cli.py", line 71, in main play_human(env) File "/Users/--/miniforge3/envs/tf/lib/python3.9/site-packages/nes_py/app/play_human.py", line 69, in play_human nextstate, reward, done, = env.step(action) File "/Users/--/miniforge3/envs/tf/lib/python3.9/site-packages/gym/wrappers/time_limit.py", line 50, in step observation, reward, terminated, truncated, info = self.env.step(action) ValueError: not enough values to unpack (expected 5, got 4)

My library versions are as follows:

gym 0.26.2 gym-notices 0.0.8 gym-super-mario-bros 7.4.0 nes-py 8.2.1

Running Python 3.9

Any help would be appreciated

JRod-Seed commented 1 year ago

It would appear that the issue stems from a mismatch in the expected return from env.step. gym/wrappers/time_limit.py returns 5 parameters (observation, reward, terminated, truncated, info) but the invocation is only expecting 4 (observation, reward, terminated, info)

Likely related to: https://github.com/Kautenja/nes-py/issues/85

Robwc000 commented 1 year ago

I'm having the same issue what is the fix?

zxdclyz commented 1 year ago

Try to use gym==0.25.1, it works for me

pseudo-rnd-thoughts commented 1 year ago

With v0.26 you can use this code

from nes_py.wrappers import JoypadSpace
import gym_super_mario_bros
from gym_super_mario_bros.actions import SIMPLE_MOVEMENT
import gym

env = gym.make('SuperMarioBros-v0', apply_api_compatibility=True, render_mode="human")
env = JoypadSpace(env, SIMPLE_MOVEMENT)

done = True
env.reset()
for step in range(5000):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated

    if done:
       state = env.reset()

env.close()

You can read about the migration guide for the new API https://gymnasium.farama.org/content/migration-guide/

PowderFan87 commented 1 year ago

I've debugged the step functions and it seems that nes-py returns 4 or 5 length tuple. They have a code snippet that checks for if(Len(result)==4) and for 5. That seems to be the "way to go" it just is not implemented in the gym/wrappers/time_limit.py

I adapted the snipped and set truncated to False as a default:

def step(self, action):
    """Steps through the environment and if the number of steps elapsed exceeds ``max_episode_steps`` then truncate.

    Args:
        action: The environment step action

    Returns:
        The environment step ``(observation, reward, terminated, truncated, info)`` with `truncated=True`
        if the number of steps elapsed >= max episode steps

    """

    truncated = False
    result = self.env.step(action)

    if (len(result) == 4):
        observation, reward, terminated, info = self.env.step(action)
    elif (len(result) == 5):
        observation, reward, terminated, truncated, info = self.env.step(action)
    else:
        raise Exception('Tupel length of step return was neither 4 nore 5. Stop run.')

    self._elapsed_steps += 1

    if self._elapsed_steps >= self._max_episode_steps:
        truncated = True

    return observation, reward, terminated, truncated, info

so instead of just reading 5 every time I do the same check and this works fine. But I realise changing the lib is bad mojo so maybe this could be a fix ;) with a proper Exception added...