Farama-Foundation / stable-retro

Retro games for Reinforcement Learning
https://stable-retro.farama.org/
MIT License
163 stars 34 forks source link

Update testing #14

Closed pseudo-rnd-thoughts closed 1 year ago

pseudo-rnd-thoughts commented 1 year ago

Personally, I found all of the testing confusing and unclear what was happening. I have replicated the tests using pytest.mark.parametrize rather than pytest.fixture as this makes it more clear what is being iterated over IMO.

Looking at the current testing, 3341 passed, 4388 skipped, 13 warnings in 15.63s which is an alarming number skipped. I found the cause is conftest.py which is skipping the tests if the ROMs are missing. I have removed this file exposing the issue rather than hiding it. The new pytest status is 3,369 passing, 0 skipped, 4,360 failing

DON'T MERGE UNTIL TESTS ARE PASSING

Changes

victorsevero commented 1 year ago

Aren't the tests supposed to be skipped? Since most ROMs can't be tested (we couldn't use illegal ROMs), we don't want them to throw errors, but we want to make sure that anyone that uses their own legal copy of the ROM is able to test them before committing a new game environment.

pseudo-rnd-thoughts commented 1 year ago

@victorsevero Yes, this is what I did in deeac25

The reminding error is

FAILED tests/test_python/test_integration.py::test_missing - AssertionError: assert 4 == 0
 +  where 4 = len([('Bomberman-Nes', 'data.json'), ('Bomberman-Nes', 'scenario.json'), ('Bomberman-Nes', 'metadata.json'), ('IceHockey-Nes', 'scenario.json')])

It looks like Bomberman and IceHockey are new environment (not part of the original project). @MatPoliquin What should we do with them?

Currently, I am just logging the warnings, @MatPoliquin do you understand what these mean or why?


tests/test_python/test_integration.py::test_data[BattleCity-Nes-]
  /home/runner/work/stable-retro/stable-retro/tests/test_python/test_integration.py:12: UserWarning: WARN: BattleCity-Nes/data.json: suspicious type >u2 for lives

tests/test_python/test_integration.py::test_data[BoxingLegendsOfTheRing-Genesis-]
  /home/runner/work/stable-retro/stable-retro/tests/test_python/test_integration.py:12: UserWarning: WARN: BoxingLegendsOfTheRing-Genesis/data.json: suspicious type |u1 for score

tests/test_python/test_integration.py::test_data[BoxingLegendsOfTheRing-Genesis-experimental]
  /home/runner/work/stable-retro/stable-retro/tests/test_python/test_integration.py:12: UserWarning: WARN: experimental/BoxingLegendsOfTheRing-Genesis/data.json: suspicious type |u1 for score

tests/test_python/test_integration.py::test_data[BoxingLegendsOfTheRing-Genesis-all]
  /home/runner/work/stable-retro/stable-retro/tests/test_python/test_integration.py:12: UserWarning: WARN: all/BoxingLegendsOfTheRing-Genesis/data.json: suspicious type |u1 for score

tests/test_python/test_integration.py::test_data[BrawlBrothers-Snes-]
  /home/runner/work/stable-retro/stable-retro/tests/test_python/test_integration.py:12: UserWarning: WARN: BrawlBrothers-Snes/data.json: suspicious type |d1 for score

tests/test_python/test_integration.py::test_data[HangOn-Sms-]
  /home/runner/work/stable-retro/stable-retro/tests/test_python/test_integration.py:12: UserWarning: WARN: HangOn-Sms/data.json: suspicious type >d5 for score

tests/test_python/test_integration.py::test_scenario[DavidCranesAmazingTennis-Genesis-experimental]
  /home/runner/work/stable-retro/stable-retro/tests/test_python/test_integration.py:20: UserWarning: WARN: experimental/DavidCranesAmazingTennis-Genesis/scenario.json: suspicious done condition any with more than 2 checks

tests/test_python/test_integration.py::test_scenario[ForemanForReal-Genesis-]
  /home/runner/work/stable-retro/stable-retro/tests/test_python/test_integration.py:20: UserWarning: WARN: ForemanForReal-Genesis/scenario.json: suspicious done condition any with more than 2 checks

tests/test_python/test_integration.py::test_scenario[VirtuaFighter2-Genesis-]
  /home/runner/work/stable-retro/stable-retro/tests/test_python/test_integration.py:20: UserWarning: WARN: VirtuaFighter2-Genesis/scenario.json: suspicious done condition any with more than 2 checks
zbeucler2018 commented 1 year ago

@pseudo-rnd-thoughts I am unsure about the error but I think I can help with the warnings.

Each game in stable-retro has some mappings for certain variables in the ROM's RAM. Each game needs a data.json file which tells stable-retro what address it should look at for a specific variable, and how to format the data from that address. The variables defined in a game's data.json are the variables in the info dict returned from env.step().

For example, the BattleCity-Nes game has a variable lives which is at address 58 and should be interpreted as 2 big endian unsigned bytes (aka >u2) . Heres the page on the original documentation that talks more about it.

The lines that are throwing these errors are stable-retro/retro/testing/tools.py lines 94-96.

I believe that these warnings are just to ensure that the original integrated games had the correct byte-format-to-address mapping for the important variables lives and score.

Maybe they were assuming that in most of the games, the lives variable is usually formatted as a single unsigned byte (|u1) , a single signed byte (|i1) , or a single binary-coded decimal byte (|d1)

MatPoliquin commented 1 year ago

@pseudo-rnd-thoughts For Bomberman and Icehockey we can remove them for now until I properly integrate them. For the warning, adding to zbeucler2018's comments, you can ignore them for now, they are likely false flags. For example HangOn-Sms which I integrated myself works well (A model can beat the whole game) and data is read in the correct format.