MarcoMeter / endless-memory-gym

Challenging Memory-based Deep Reinforcement Learning Agents
MIT License

A Simpler Configuration for Searing Spotlights #8

Closed subho406 closed 9 months ago

subho406 commented 1 year ago

Hi,

I absolutely love this benchmark and I'm planning to use this in a paper I am writing.

I had a question regarding the Searing Spotlights environment. The other two environments, Mortar Mayhem and Mystery Path, seem to be solvable by current memory-based agents by providing dense reward signals or grid-like movements. However, no simpler configuration is listed for Searing Spotlights. Is there a configuration of Searing Spotlights where a memory-based agent could achieve a decent level of performance (which a memory-less agent couldn't)?

My goal is to use these environments to compare the memory capabilities of different architectures. In Mystery Path and Mortar Mayhem I am able to do so with the simpler variants of those environments, where the differences in performance between the memory agents become apparent. But in Searing Spotlights, none of the agents even get to a point where they can leverage their memory, and all agents seem to perform the same. Is there a way to make this environment easier?

MarcoMeter commented 1 year ago

Hi @subho406,

Great to hear that you enjoy this project!

Searing Spotlights is a tricky one. In the ICLR paper, the recurrent PPO agent was unsuccessful even when given ground-truth information. The good news is that I finally found the culprit for failing at this task: normalizing the advantages, as part of PPO's objective, makes training highly unstable. If you are training a PPO agent, don't normalize advantages.
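To make the suggestion concrete, here is a minimal sketch of PPO's clipped surrogate loss with advantage normalization behind a flag, so it can simply be disabled. The function name `ppo_policy_loss` and its signature are my own illustration, not code from this repository:

```python
import numpy as np

def ppo_policy_loss(advantages, ratio, clip_eps=0.2, normalize=False):
    """Clipped PPO surrogate loss (negative objective) for a batch.

    `ratio` is pi_new(a|s) / pi_old(a|s) per sample. Standardizing the
    advantages across the batch (normalize=True) is a common default,
    but for Searing Spotlights it destabilized training, hence the
    flag defaults to False here.
    """
    adv = np.asarray(advantages, dtype=np.float64)
    if normalize:
        adv = (adv - adv.mean()) / (adv.std() + 1e-8)
    ratio = np.asarray(ratio, dtype=np.float64)
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    # Loss is the negated mean of the element-wise minimum.
    return -np.minimum(unclipped, clipped).mean()
```

With `normalize=False` the raw advantage magnitudes flow into the gradient unchanged; the rest of the update (value loss, entropy bonus) stays as usual.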

Another aid is to add a reconstruction loss to the agent's objective. This way, the agent pursues an auxiliary autoencoder target by reconstructing the observation that is fed to it.
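The auxiliary objective can be sketched as a simple mean-squared error between the input observation and a decoder's reconstruction, added to the RL loss with a weighting coefficient. The function and the coefficient name `recon_coef` are illustrative assumptions, not the repository's actual implementation:

```python
import numpy as np

def total_loss(rl_loss, observation, reconstruction, recon_coef=1.0):
    """Combine the RL loss with an auxiliary reconstruction loss.

    `reconstruction` is assumed to come from a decoder head attached
    to the agent's encoder; `recon_coef` trades off the autoencoder
    target against the RL objective.
    """
    obs = np.asarray(observation, dtype=np.float64)
    rec = np.asarray(reconstruction, dtype=np.float64)
    recon_loss = np.mean((obs - rec) ** 2)
    return rl_loss + recon_coef * recon_loss
```

Because the decoder must reproduce the observation, the encoder is pushed to retain visual details (such as the agent's own position) that the sparse RL signal alone may not preserve.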

Concerning simplifying Searing Spotlights, I can't think of any suitable simplification that would maintain the memory dependence as well as the task itself.

Here is a website showing multiple agent behaviors on all environments: https://marcometer.github.io/ The website is still being developed, and a related journal paper is also in progress. I hope to have it ready in two weeks.

MarcoMeter commented 9 months ago

I'm closing this now. Let me know if you have further questions. The mentioned journal paper is finally available as preprint. https://arxiv.org/abs/2309.17207