Stable-Baselines-Team / stable-baselines3-contrib

Contrib package for Stable-Baselines3 - Experimental reinforcement learning (RL) code
https://sb3-contrib.readthedocs.io
MIT License

Implementing "Sibling Rivalry" Method from "Keeping Your Distance: Solving Sparse Reward Tasks Using Self-Balancing Shaped Rewards" Paper #224

Open vladyskai opened 8 months ago

vladyskai commented 8 months ago

🚀 Feature

I propose the implementation of the "Sibling Rivalry" method, as outlined in the paper "Keeping Your Distance: Solving Sparse Reward Tasks Using Self-Balancing Shaped Rewards." Link to GitHub: https://github.com/salesforce/sibling-rivalry

This method offers a novel approach to solving sparse reward tasks in reinforcement learning (RL) by utilizing self-balancing shaped rewards, particularly effective in goal-reaching tasks.

Motivation

While training a PPO agent to play the Battle City game, I've encountered a significant challenge: the agent more or less learns to eliminate enemy tanks but fails to protect the base, often settling into suboptimal strategies like camping near the enemy spawn spot. I tried everything I could think of to make it defend the base, but it defaults to camping or doing nothing. This behavior suggests the agent is stuck in a local optimum, focusing solely on tank destruction and neglecting the critical objective of base defense. Implementing the "Sibling Rivalry" method could potentially enable the agent to recognize situations where the base is in danger, encouraging it to learn strategies that involve defending the base rather than just attacking enemies. I hope this method might help it escape this local optimum and might be the key to overcoming the current limitations in the agent's learning process.

Pitch

I suggest integrating the "Sibling Rivalry" method into the PPO algorithm. This would require adapting the code in base/learners/distance.py of the GitHub repository and integrating it into the PPO class as an option.
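To make the request concrete, here is a minimal sketch of the core idea as I understand it from the paper: two "sibling" rollouts are collected from the same start state, and each sibling's terminal state serves as the other's "anti-goal". The shaped reward rewards progress toward the goal while penalizing proximity to the anti-goal, which discourages both rollouts from collapsing into the same local optimum. The function names and distance metric here are hypothetical, not from the salesforce/sibling-rivalry codebase:

```python
import numpy as np

def self_balancing_reward(state, goal, anti_goal):
    """Hypothetical sketch of a Sibling Rivalry-style shaped reward.

    state, goal, anti_goal: np.ndarray positions (or goal embeddings).
    The anti-goal is the terminal state reached by the sibling rollout.
    Euclidean distance is an assumption; the paper allows any
    task-appropriate distance metric.
    """
    d_goal = np.linalg.norm(state - goal)       # pull toward the goal
    d_anti = np.linalg.norm(state - anti_goal)  # push away from the sibling's endpoint
    return -d_goal + d_anti

# Toy usage: with the goal at the origin and the sibling stuck at (5, 5),
# a state near the goal scores higher than one near the anti-goal.
goal = np.array([0.0, 0.0])
anti_goal = np.array([5.0, 5.0])
r_near_goal = self_balancing_reward(np.array([1.0, 1.0]), goal, anti_goal)
r_near_anti = self_balancing_reward(np.array([4.0, 4.0]), goal, anti_goal)
```

In an actual SB3 integration this would most likely live in a rollout-collection wrapper or a modified `collect_rollouts`, since PPO's buffer would need to hold paired sibling episodes before rewards can be relabeled.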

Alternatives

Currently, other methods like intrinsic curiosity models and reward relabeling strategies are used, but they often show limited performance in hard exploration scenarios. The paper reports that the "Sibling Rivalry" method outperforms these techniques, especially in diverse environments such as 3D construction and navigation tasks.

Additional context

This issue has been moved from the DLR-RM/stable-baselines3 repository. Also, I know little about writing proper code and I do not have the technical skills to implement something this complex. I will give it a try anyway, but I'm probably better suited to a simpler task like benchmarking it.

araffin commented 8 months ago

Original issue: https://github.com/DLR-RM/stable-baselines3/issues/1802