
Farama-Foundation / Arcade-Learning-Environment

The Arcade Learning Environment (ALE) -- a platform for AI research.

https://ale.farama.org/

GNU General Public License v2.0


Score wrapping #103

Open mgbellemare opened 9 years ago

mgbellemare commented 9 years ago

On some games it is possible to loop the score. This has always been the case, but it is a greater issue on games where it is possible to learn policies that play forever, e.g. on Atlantis. There should be a unifying scheme for dealing with score wrapping in games where this can occur. Either:

- The episode should terminate (and the game can be considered "solved"), or
- All evaluations should take place within strict time limitations.

Note that there are games, e.g. Krull, where a simple agent can loop the score without achieving anything meaningful. This needs to be taken into consideration when evaluating agents that loop the score.
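
As a rough illustration of the second option above (strict time limits), here is a minimal sketch of a frame-capped evaluation loop. It is not ALE code: `step` is a hypothetical callable standing in for "advance the emulator one frame and return `(reward, game_over)`", and `max_frames` is an assumed budget chosen by the evaluator.

```python
from typing import Callable, Tuple


def run_capped_episode(
    step: Callable[[], Tuple[float, bool]],
    max_frames: int,
) -> Tuple[float, int, bool]:
    """Run one episode for at most max_frames frames.

    Returns (total_reward, frames_played, truncated), where truncated is
    True when the frame budget ran out before the game ended.
    """
    total = 0.0
    for frame in range(1, max_frames + 1):
        reward, game_over = step()  # hypothetical emulator step
        total += reward
        if game_over:
            return total, frame, False
    # Budget exhausted: the agent might have kept playing forever.
    return total, max_frames, True
```

Reporting the `truncated` flag alongside the score makes it visible when an agent was still playing at the cap, which is the case that matters for policies that never lose.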

nczempin commented 7 years ago

While adding support for the Kaboom! ROM, I came upon this issue, because the game freezes when you reach the max score (999,999).

Without having seen this issue, I decided that termination was the logical choice.

Now that I think about it, just checking ">= 999999" is not sufficient, because in that game, the maximum reward you can get is 8 (AFAICS from the manual), so I should check "> 999991". However, since this particular game just hangs, I assume it will eventually hit a timeout of sorts (does it? If not, that should at least be some kind of option, perhaps based on the time since the last reward). Right now each game can do that individually, but it would be preferable to have a unified method, as you suggest. Such a unified method would need the maximum score and the maximum reward passed from each game.
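
A minimal sketch of what such a unified check might look like, assuming each game supplies its maximum score and maximum single reward. The names `ScoreLimits`, `near_score_cap` and `StallTimer` are hypothetical and do not exist in ALE; the Kaboom! values are the ones quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreLimits:
    max_score: int   # highest score the game can display
    max_reward: int  # largest reward obtainable in a single step


def near_score_cap(score: int, limits: ScoreLimits) -> bool:
    """True once the score is past max_score - max_reward (the '> 999991' check)."""
    return score > limits.max_score - limits.max_reward


class StallTimer:
    """Signals a timeout after max_idle_frames consecutive frames without reward."""

    def __init__(self, max_idle_frames: int) -> None:
        self.max_idle_frames = max_idle_frames
        self.idle_frames = 0

    def update(self, reward: int) -> bool:
        # Any non-zero reward resets the idle counter.
        self.idle_frames = 0 if reward != 0 else self.idle_frames + 1
        return self.idle_frames >= self.max_idle_frames


KABOOM = ScoreLimits(max_score=999_999, max_reward=8)
```

With these values, `near_score_cap(999_991, KABOOM)` is false and `near_score_cap(999_992, KABOOM)` is true. The stall timer covers the "game just hangs" case without needing any per-game knowledge.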

These values can be determined manually, or at least a good first approximation could be automated, by writing successively larger values into the score memory until something happens (and a better-than-nothing first approximation of "highest possible reward" could be obtained by running a few decent agents, perhaps adding a safety margin).

Incidentally, it would be a good idea to generalize the concept of "score memory" and "lives memory", because then many new (and existing) roms could simply be supported by providing those two sets of values as parameters, rather than duplicating all the code that does the reading, extracting reward, etc.
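
To make the idea concrete, here is a sketch of a data-driven ROM description. It assumes the score is stored as packed BCD bytes (two decimal digits per byte, most significant byte first), which is common on the Atari 2600 but not universal, and that lives are stored as a plain integer. `RomSpec`, `read_score`, `read_lives` and the RAM addresses used in the usage note are hypothetical, not taken from any real game or from ALE.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class RomSpec:
    score_addrs: Tuple[int, ...]  # RAM indices, most significant byte first
    lives_addr: int               # RAM index holding the lives counter


def decode_bcd(byte: int) -> int:
    """Decode one packed-BCD byte (e.g. 0x42) into its decimal value (42)."""
    return (byte >> 4) * 10 + (byte & 0x0F)


def read_score(ram: Sequence[int], spec: RomSpec) -> int:
    """Combine the BCD score bytes into a single integer."""
    score = 0
    for addr in spec.score_addrs:
        score = score * 100 + decode_bcd(ram[addr])
    return score


def read_lives(ram: Sequence[int], spec: RomSpec) -> int:
    """Read the lives counter, assumed here to be a plain integer."""
    return ram[spec.lives_addr]
```

For example, with `spec = RomSpec(score_addrs=(0x10, 0x11, 0x12), lives_addr=0x20)` and RAM bytes `0x12, 0x34, 0x56` at those score addresses, `read_score` yields 123456. Games that encode their score differently would need a different decoder, so a real scheme would probably have to carry the encoding as a parameter too.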

nczempin commented 7 years ago

> However, since this particular game just hangs, I assume it will eventually hit a timeout of sorts

This has been confirmed since; for Kaboom! it is sufficient to check for == 999,999.

mgbellemare commented 7 years ago

@nczempin Answering your suggestions in order:

1. The most ALE-ical solution is to allow players to score higher than the maximum provided, and keep track of the cumulative score. This is fine from an evaluation perspective because there should still be a time limit on all games, and you may be able to accumulate reward faster than another agent that also loops the score. However, in most cases if you can loop the score then you almost certainly can play the game indefinitely; and this should be reported somewhere.
2. Re: score and lives memory, the ALE is reasonably stable now and there's probably little advantage to a major refactoring. But it's something to keep in mind.
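
The cumulative-score idea in the first point can be sketched as follows. This is an illustration rather than ALE's implementation: the class `CumulativeScore` is hypothetical, and it assumes the displayed score never decreases except by wrapping and that a single step never awards a full loop's worth of points.

```python
class CumulativeScore:
    """Tracks the true cumulative score of a game whose display wraps around."""

    def __init__(self, max_score: int) -> None:
        self.modulus = max_score + 1  # display wraps back to 0 after max_score
        self.displayed = 0            # last score read from the game
        self.total = 0                # cumulative score across wraps
        self.wraps = 0                # number of loops, so they can be reported

    def update(self, displayed: int) -> int:
        """Feed the newly displayed score; return the reward for this step."""
        if displayed < self.displayed:
            self.wraps += 1
        # Modular difference recovers the reward even across a wrap.
        reward = (displayed - self.displayed) % self.modulus
        self.displayed = displayed
        self.total += reward
        return reward
```

For a six-digit display (`max_score=999_999`), going from 999,990 to 5 is reported as a reward of 15 and a total of 1,000,005, with `wraps` equal to 1. Exposing `wraps` is one way to report that an agent looped the score.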

nczempin commented 7 years ago

I agree with everything in 1).

> Re: score and lives memory, the ALE is reasonably stable now and there's probably little advantage to a major refactoring. But it's something to keep in mind.

Don't fully agree; I think there's a huge advantage, but of course I'm fine with risking that my changes might not get pulled.

See https://github.com/mgbellemare/Arcade-Learning-Environment/issues/193 and https://github.com/mgbellemare/Arcade-Learning-Environment/compare/master...nczempin:simplify-roms if you want to check my progress.

I will try to keep it as painless as possible, with no interface breakages. (And of course I will clean up my convenience changes, such as those in ale.cfg, before I do a PR.)

nczempin commented 7 years ago

I also think that I may not have explained well what my plans are for 2). Will try to do so eventually...

mgbellemare commented 7 years ago

If you want to drop me an email, I'd be happy to hear more about them :)

nczempin commented 7 years ago

No, it's fine; they are not secret. Once I get round to clarifying them, I can just post them here.