Closed vladyskai closed 9 months ago
Hello, could you move this issue to SB3-Contrib? And are you willing to contribute and benchmark this method?
I will post this issue in the stable-baselines3-contrib repository. However, I know little about writing proper code and lack the technical skills to implement something this complex. I will give it a try anyway, but I'm probably better suited to a simpler task like benchmarking it.
🚀 Feature
I propose the implementation of the "Sibling Rivalry" method, as outlined in the paper "Keeping Your Distance: Solving Sparse Reward Tasks Using Self-Balancing Shaped Rewards." Link to GitHub: https://github.com/salesforce/sibling-rivalry
This method offers a novel approach to solving sparse reward tasks in reinforcement learning (RL) by utilizing self-balancing shaped rewards, particularly effective in goal-reaching tasks.
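As I understand the paper, the core idea is to roll out two "sibling" episodes from the same start state toward the same goal, and to shape each sibling's reward as distance-to-goal minus distance to an "anti-goal", where the anti-goal is the terminal state the other sibling reached. A minimal sketch of that reward (the function name and the use of Euclidean distance are my assumptions; the paper allows any task-appropriate metric):

```python
import numpy as np

def sibling_rivalry_reward(state, goal, antigoal):
    """Self-balancing shaped reward, as I read the paper: attract toward
    the goal while repelling from the anti-goal (the sibling rollout's
    terminal state). Euclidean distance is an illustrative choice."""
    state, goal, antigoal = map(np.asarray, (state, goal, antigoal))
    return -np.linalg.norm(state - goal) + np.linalg.norm(state - antigoal)

# Toy check: a state near the goal and far from the anti-goal scores higher
# than a state far from the goal and near the anti-goal.
r_good = sibling_rivalry_reward([0.9, 0.0], [1.0, 0.0], [-1.0, 0.0])
r_bad = sibling_rivalry_reward([-0.9, 0.0], [1.0, 0.0], [-1.0, 0.0])
```

Because the repulsion term is computed from what the sibling actually achieved, the shaping rebalances itself as the policy improves, which is what makes it different from a fixed distance-based bonus.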
Motivation
While training a PPO agent to play the Battle City game, I've encountered a significant challenge: the agent more or less learns to eliminate enemy tanks but fails to protect the base, often settling into suboptimal strategies like camping near the enemy spawn point. I have tried everything I could think of to make it defend the base, but it defaults to camping or doing nothing. This behavior suggests the agent is stuck in a local optimum, focusing solely on tank destruction and neglecting the critical objective of base defense. Implementing the "Sibling Rivalry" method could enable the agent to recognize situations where the base is in danger and to learn strategies that defend it rather than just attacking enemies. I hope this method might be the key to escaping this local optimum and overcoming the current limitations in the agent's learning process.
Pitch
I suggest integrating the "Sibling Rivalry" method into the PPO algorithm. This would require adapting the code in base/learners/distance.py of the GitHub repository and integrating it into the PPO class as an option.
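To make the pitch concrete, here is a rough sketch of what the relabeling step before a PPO update might look like. Everything here is hypothetical: the helper name, the `eps` threshold, and the inclusion rule (the sibling that ended farther from the goal always contributes; the closer one contributes only if it reached the goal or ended near its anti-goal) reflect my reading of the paper and would need to be checked against `base/learners/distance.py`:

```python
import numpy as np

def dist(a, b):
    # Euclidean distance; the actual metric should match the task.
    return float(np.linalg.norm(np.asarray(a) - np.asarray(b)))

def relabel_siblings(traj_a, traj_b, goal, eps=0.5):
    """Hypothetical helper: given two sibling rollouts (lists of states)
    toward the same goal, relabel rewards with the self-balancing shaping
    and select which trajectories feed the PPO update."""
    term_a, term_b = traj_a[-1], traj_b[-1]
    # Each sibling's anti-goal is the other sibling's terminal state.
    rew_a = [-dist(s, goal) + dist(s, term_b) for s in traj_a]
    rew_b = [-dist(s, goal) + dist(s, term_a) for s in traj_b]
    siblings = [(traj_a, rew_a, term_a, term_b),
                (traj_b, rew_b, term_b, term_a)]
    # Put the sibling that ended farther from the goal first.
    siblings.sort(key=lambda t: dist(t[2], goal), reverse=True)
    batch = [siblings[0][:2]]  # farther sibling always contributes
    closer = siblings[1]
    # Closer sibling contributes only if it reached the goal or ended
    # near its anti-goal (my reading of the paper's inclusion rule).
    if dist(closer[2], goal) < eps or dist(closer[2], closer[3]) < eps:
        batch.append(closer[:2])
    return batch  # list of (states, shaped_rewards) pairs for the update
```

In SB3 terms, this would probably live in a custom rollout-collection step (two rollouts per goal, rewards relabeled before being written to the buffer) exposed as an option on the PPO class, rather than a change to the PPO loss itself.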
Alternatives
Currently, other methods such as intrinsic curiosity models and reward-relabeling strategies are used, but they often underperform in hard-exploration scenarios. The paper reports that "Sibling Rivalry" outperforms these techniques, especially in diverse environments such as 3D construction and navigation tasks.
Additional context
No response