edbeeching / godot_rl_agents

An Open Source package that allows video game creators, AI researchers and hobbyists the opportunity to learn complex behaviors for their Non Player Characters or agents
MIT License
902 stars 63 forks

[Idea] Implement success rate in logging #175

Open Ivan-267 opened 6 months ago

Ivan-267 commented 6 months ago

Proposal:

In addition to reward, an overview of the current success rate can be useful in many envs. It can be a very important metric in envs that have a clear goal (e.g. successfully landed for the 3DLander env, successfully parked for the 3DCarParking env, etc.).

It seems we could support this with SB3 by implementing the following:

https://stable-baselines3.readthedocs.io/en/master/common/logger.html#rollout

success_rate: Mean success rate during training (averaged over stats_window_size episodes, 100 by default), you must pass an extra argument to the Monitor wrapper to log that value (info_keywords=("is_success",)) and provide info["is_success"]=True/False on the final step of the episode
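To make the quoted description concrete, here is a minimal sketch of how a rolling success rate over the last stats_window_size episodes could be computed from info["is_success"]. It only illustrates the idea and is not SB3's actual implementation; the SuccessRateTracker class and its method names are hypothetical.

```python
from collections import deque


class SuccessRateTracker:
    """Rolling mean of per-episode success flags (illustrative, not SB3 code)."""

    def __init__(self, stats_window_size: int = 100):
        # Only the most recent `stats_window_size` episodes are kept.
        self.successes = deque(maxlen=stats_window_size)

    def on_step(self, done: bool, info: dict) -> None:
        # Only the final step of an episode contributes, and only if the
        # env actually reported info["is_success"].
        if done and "is_success" in info:
            self.successes.append(bool(info["is_success"]))

    def success_rate(self):
        # None means "nothing to log yet" (e.g. older envs without the key).
        if not self.successes:
            return None
        return sum(self.successes) / len(self.successes)


tracker = SuccessRateTracker(stats_window_size=3)
tracker.on_step(done=False, info={})                    # mid-episode step, ignored
tracker.on_step(done=True, info={"is_success": True})   # successful episode
tracker.on_step(done=True, info={"is_success": False})  # failed episode
print(tracker.success_rate())
```

Envs that never provide the key simply produce no statistic here, which matches the "don't show this statistic" option discussed below.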

Needs to be considered:

It would be great if this could be added in a way that doesn't affect previous envs (e.g. they either always report true, always report false, or don't show this statistic at all).

1) Info sending/receiving: Some modifications would be needed to the plugin and the Python env code to send / receive info, ideally preserving compatibility with older envs that don't send info. Once info sending is enabled, we could later also use it to set the truncated/terminated flags.

2) Usage / plugin side changes:

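# Ends the current episode and records whether the goal was reached.
func end_episode(final_reward = 0, success = true):
    reward += final_reward
    done = true
    needs_reset = true
    # Proposed new flag, to be sent to Python as info["is_success"].
    episode_successful = success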

(This is just a potential usage example. The end_episode method is implemented in the env code, not in the plugin. We could consider simplifying the process with something like https://github.com/edbeeching/godot_rl_agents_plugin/pull/20, but that does break compatibility with existing envs.)
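For point 1, the Python side could read the info in a backward-compatible way along these lines. This is only a sketch: the message layout, the optional "info" field and the extract_infos helper are assumptions for illustration, not the current godot_rl_agents protocol.

```python
def extract_infos(response: dict, n_agents: int) -> list:
    """Return one info dict per agent, tolerating envs that send no info."""
    raw = response.get("info")  # hypothetical optional field in the step message
    if raw is None:
        # Older envs: no info sent, so fall back to empty dicts.
        return [{} for _ in range(n_agents)]
    if len(raw) != n_agents:
        raise ValueError(f"expected {n_agents} info entries, got {len(raw)}")
    # Copy each entry so callers can modify it without touching the message.
    return [dict(agent_info) for agent_info in raw]


# Message from an older env (assumed layout, no "info" field).
old_response = {"type": "step", "reward": [0.0], "done": [True]}
# Message from an updated env that reports success on the final step.
new_response = {
    "type": "step",
    "reward": [1.0],
    "done": [True],
    "info": [{"is_success": True}],
}

print(extract_infos(old_response, n_agents=1))
print(extract_infos(new_response, n_agents=1))
```

With empty dicts as the fallback, older envs keep working and simply never contribute to the success statistic.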

For compatibility, possibly the simplest way would be to always report episode success as true by default, unless set by the user. Optionally, we could also add a boolean arg to the SB3 example script that controls whether the monitor reports this stat.
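A rough sketch of both ideas, assuming a hypothetical --log_success_rate flag and helper names that do not exist in the example script today. The only SB3 detail relied on is the info_keywords argument of the Monitor wrapper quoted above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Hypothetical flag: off by default so existing setups behave as before.
    parser.add_argument(
        "--log_success_rate",
        action="store_true",
        help="Ask the monitor wrapper to track info['is_success']",
    )
    return parser


def monitor_kwargs(log_success_rate: bool) -> dict:
    # Extra keyword arguments to pass on to the monitor wrapper.
    return {"info_keywords": ("is_success",)} if log_success_rate else {}


def final_step_info(episode_successful=None) -> dict:
    # Default to success=True when the env never set the flag.
    success = True if episode_successful is None else bool(episode_successful)
    return {"is_success": success}


args = build_parser().parse_args(["--log_success_rate"])
print(monitor_kwargs(args.log_success_rate))
print(final_step_info())       # env that never set the flag
print(final_step_info(False))  # env that reported a failure
```

Keeping the flag opt-in means the stat only appears in the logs for envs where it is meaningful, while the default-true fallback keeps the info key present when it is requested.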