Closed richard-hajek closed 1 year ago
1) There already exists a `reward_threshold` for some environments
2) Not all environments have a `best_possible_return` (or in many cases it is not known)
Oh you're right, I didn't think to check the registry for this information, thanks! You may close the issue then
@richard-hajek you can close your own issues https://docs.github.com/en/issues/tracking-your-work-with-issues/closing-an-issue
I am perfectly aware, but sometimes maintainers want issues to stay open for some time to allow for more discussion.
Proposal
I would very much appreciate a unified way to find out how well an agent did, on a predefined scale for all environments. For example, a method that returns 1 for a "perfect agent" and 0 for a "noop agent" or the worst agent.
Motivation
I am a student at CTU and I'm trying to learn reinforcement learning. This is my third attempt; both previous attempts were unsuccessful. Besides personal reasons, something that is regularly counter-intuitive to me, and took me some time to get over, is that there is no "solved :fireworks: :fireworks: :partying_face:" fanfare for the agent.
Success looks like the agent reaching an evaluation score of -20 instead of -100, which is extremely underwhelming and not unified across environments. (What is this environment's best score? Is it -20? Is it -10? Or perhaps +100? There is no way to find out programmatically; I have to check the docs.)
For example, this is output from a Stable Baselines training run:
Is -70 good? Is -70 bad? I have to actively think about it, which puts extra mental burden on coders. A single number reading "0.99" would be just amazing.
Pitch
Introduce an API on `gymnasium.core.Env` to facilitate evaluation of agents, with the following signature:

When `rate()` returns > 0.95, the agent can be considered "good enough" and the environment "reasonably solved". Both definitions are up to the environment author.
Alternatively, the default implementation of `rate` could just be `last_total_return / best_possible_return`, with some math magic to account for negative rewards.

Alternatives
Any alternative has the issue that newcomers don't expect this to even be a problem. As a newcomer, I expect to be able to see, trivially, whether the last agent "did well", not to code an entire patchwork of approaches to approximate this. With that being said:
Ad-hoc computation:
:heavy_plus_sign: Would be flexible
:heavy_minus_sign: Would be a huge function, with definitions of `rate()` for each environment
:heavy_minus_sign: Wouldn't be integrated into other frameworks (such as SB3)
:heavy_minus_sign: Would break if the environment gets updated with a different reward system

RewardWrapper:
:heavy_plus_sign: Universal for all environments
:heavy_minus_sign: Cannot be accurate; at best it can tell how well the agent performed compared to other agents

Gymnasium-Robotics/`GoalEnv`:
:heavy_minus_sign: Enforces different observations
:heavy_minus_sign: Is not installed by default
:heavy_minus_sign: Doesn't actually help when the environment I am learning on doesn't extend `GoalEnv`

Additional context
I seem to remember that environments had some kind of `reward_range` property, which would alleviate some of these issues, but I can't find it; perhaps I hallucinated it, or it was removed?

Additionally, my ultimate goal is to rate agents in a human-friendly way; I do not care about this specific implementation.
Checklist