Kautenja / gym-super-mario-bros

An OpenAI Gym interface to Super Mario Bros. & Super Mario Bros. 2 (Lost Levels) on The NES

Flag get detection #91

Closed liziniu closed 4 years ago

liziniu commented 5 years ago

Hi, I am using the 'SuperMarioBros-1-1-v0' environment and want to stay on this stage only.

However, the wrapper sometimes fails to detect that the flag has been reached, and the game continues into the next stage.

Is there any way to ensure the simulator stays on a single stage?

liziniu commented 5 years ago

I observed the variable self.ram[0x001D] alongside the video. When the Mario agent reaches the flag, this variable is usually 3, but occasionally 2.

I also observed that the variable is 1 when Mario is in a normal state. Can anyone explain what a value of 2 means?

Kautenja commented 5 years ago

Hmm. Could you run this shell command to print some version information? Issues with the flag get feature have occurred in the past, so I want to make sure this is a new problem.

python3 -c 'import pkg_resources; \
print(pkg_resources.get_distribution("nes-py").version); \
print(pkg_resources.get_distribution("gym-super-mario-bros").version)'
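The same check can also be sketched with the standard library's importlib.metadata (Python 3.8+), which avoids relying on pkg_resources. The helper name installed_version is hypothetical and only for illustration; the package names are the two from the command above.

```python
from importlib import metadata


def installed_version(package: str) -> str:
    """Return the installed version of `package`, or a placeholder if absent.

    `installed_version` is a hypothetical helper name, not part of either library.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return "not installed"


# Print the versions in the same order as the shell command above.
for name in ("nes-py", "gym-super-mario-bros"):
    print(name, installed_version(name))
```

Printing the package name next to each version makes it unambiguous which number belongs to which package when the output is pasted into an issue.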

Kautenja commented 5 years ago

Version aside, it looks like fixing this could be as simple as changing line 247 of smb_env.py:

https://github.com/Kautenja/gym-super-mario-bros/blob/9154ece0ec178bf2c914519feb5e5a79080109d2/gym_super_mario_bros/smb_env.py#L237-L249

from:

                return self.ram[0x001D] == 3

to:

                return self.ram[0x001D] in {2, 3}

Kautenja commented 5 years ago

It looks like 0x001D is equal to 2 in more situations than just being on the flag pole. Namely, when the flag pole first enters the scene, I see 0x001D equal to 2 for a burst of frames, but never when Mario is on the pole. This does not disprove the bug's existence, but it does undercut the potential of the above solution.

Also note the RAM map description:

0x001D: Player "float" state
0x00 - Standing on solid/else
0x01 - Airborne by jumping
0x02 - Airborne by walking off a ledge
0x03 - Sliding down flagpole
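As a rough illustration of the RAM map entry above, the four documented values can be written as an enum and checked against a RAM buffer. This is only a sketch: the names FloatState and is_on_flagpole are hypothetical, and the enum covers only the meanings listed in the RAM map, not the extra case observed above where 0x001D reads 2 as the flag pole scrolls into view.

```python
from enum import IntEnum

FLOAT_STATE_ADDRESS = 0x001D  # address taken from the RAM map entry above


class FloatState(IntEnum):
    """Player "float" state values as listed in the RAM map (names are illustrative)."""
    STANDING = 0x00
    AIRBORNE_JUMPING = 0x01
    AIRBORNE_WALKED_OFF_LEDGE = 0x02
    SLIDING_DOWN_FLAGPOLE = 0x03


def is_on_flagpole(ram) -> bool:
    """Return True only for the documented flagpole value (3), not for 2."""
    return ram[FLOAT_STATE_ADDRESS] == FloatState.SLIDING_DOWN_FLAGPOLE


# Usage with a stand-in buffer; a real environment would supply its own RAM.
ram = bytearray(0x800)
ram[FLOAT_STATE_ADDRESS] = 3
print(is_on_flagpole(ram))
```

Keeping the check strict on the value 3 matches the observation that 2 also shows up in situations unrelated to the flag pole.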

liziniu commented 5 years ago

I ran the shell command. It outputs the following version information:

6.2.1

7.1.6

I know this is not the latest version, but I can't see any difference in _is_stage_over between the latest code and the code I use. Was there any relevant modification?

After many trials, I also found that 0x001D being equal to 2 is not an indicator of reaching the flag.

My algorithm generates action sequences (and a separate video recorder program shows that it occasionally goes into the next stage). What's worse, there is a stochastic factor in my algorithm (such as randomly skipping frames), so it's hard for me to reproduce this manually, but I can provide the video recording if that would help.

For my research, I focus on SuperMarioBros-1-1-v0, so I use x_pos > 3155 as an indicator of reaching the flag.
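A minimal sketch of that workaround as an environment wrapper might look like the following. It assumes the older Gym step API that returns (obs, reward, done, info) and an info dictionary containing an 'x_pos' entry; the class name SingleStageWrapper and the 'passed_threshold' key are hypothetical, and the 3155 threshold is only the value quoted above for stage 1-1.

```python
class SingleStageWrapper:
    """Hypothetical wrapper: end the episode once x_pos exceeds a threshold.

    Assumes env.step() returns (obs, reward, done, info) and that info
    carries an 'x_pos' entry; both are assumptions about the wrapped env.
    """

    def __init__(self, env, x_threshold=3155):
        self.env = env
        self.x_threshold = x_threshold

    def reset(self, **kwargs):
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        passed = info.get("x_pos", 0) > self.x_threshold
        # Copy info so the wrapped environment's dictionary is not mutated.
        info = dict(info, passed_threshold=passed)
        # Force termination so the episode never continues into the next stage.
        return obs, reward, done or passed, info
```

The training loop then needs no special handling: it sees done become True at the flag and calls reset() as usual.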

Thanks!

Kautenja commented 4 years ago

Closing issue as it seems resolved for now.