Integration of Rule-Based Bot Actions for Imitation Learning

vladyskai commented 8 months ago

Background: I am currently working on a custom environment for the Battle City game using Gymnasium and Stable-Baselines3. My objective is to train an agent using the Proximal Policy Optimization (PPO) algorithm. To enhance the learning process, I've developed a rule-based bot that successfully wins the game.

Issue: The challenge is the PPO agent's learning efficacy. Although I include the bot's actions in the observation and reward the agent for mimicking them, the learning outcomes have been suboptimal. My initial impression was that this library would facilitate imitation learning from my custom bot. However, after reviewing the example code, it appears that the current setup may require another trained model to act as the expert.
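For reference, the setup described above can be sketched roughly as follows. This is a minimal, dependency-free illustration rather than the actual project code: `ToyEnv`, `rule_based_bot`, `BotHintWrapper`, and the `bonus` parameter are all hypothetical names, and the toy environment only mimics the Gymnasium-style `reset`/`step` return values.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for the Battle City environment (Gymnasium-style API)."""

    N_ACTIONS = 4

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.target = 0
        self.t = 0

    def reset(self):
        self.t = 0
        self.target = self.rng.randrange(self.N_ACTIONS)
        return [self.target], {}

    def step(self, action):
        # The toy task rewards choosing the action shown in the observation.
        reward = 1.0 if action == self.target else 0.0
        self.t += 1
        self.target = self.rng.randrange(self.N_ACTIONS)
        terminated = self.t >= 10
        return [self.target], reward, terminated, False, {}


def rule_based_bot(obs):
    """Hypothetical expert: picks the action the toy environment rewards."""
    return obs[0]


class BotHintWrapper:
    """Appends the bot's suggested action to the observation and
    adds a bonus to the reward when the agent's action matches it."""

    def __init__(self, env, bot, bonus=0.1):
        self.env, self.bot, self.bonus = env, bot, bonus
        self._hint = None

    def _augment(self, obs):
        self._hint = self.bot(obs)
        return list(obs) + [self._hint]

    def reset(self):
        obs, info = self.env.reset()
        return self._augment(obs), info

    def step(self, action):
        # Hint computed for the observation the agent has just acted on.
        hint = self._hint
        obs, reward, terminated, truncated, info = self.env.step(action)
        if action == hint:
            reward += self.bonus
        return self._augment(obs), reward, terminated, truncated, info
```

With this kind of shaping, the agent still has to discover through PPO's own exploration that copying the hint pays off, which may be part of why learning is slow compared with training directly on the bot's actions.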

Inquiry: I seek clarification on the library's capabilities in this context:

Does the current implementation only support learning from another model, rather than a custom rule-based bot?

If my understanding is correct, and learning from a rule-based bot is not supported, I would like to propose this as a feature request. Implementing the ability to use actions from a custom bot for imitation learning would be a valuable addition to this library.

Alternate Request: In case my interpretation is incorrect, and the library does support learning from a bot's actions, I would greatly appreciate a simple example or guidance on how to utilize my bot's actions for training the PPO agent within this framework.
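To make the request concrete, here is a minimal sketch of the kind of workflow meant: record (observation, action) pairs from the rule-based bot, then fit a policy to them by supervised learning (behavior cloning). It deliberately does not use this library's API. `rule_based_bot`, `collect_demonstrations`, and `fit_cloned_policy` are hypothetical names, observations are assumed to be small discrete values, and the "policy" is a simple lookup table rather than a neural network.

```python
import random
from collections import Counter, defaultdict

N_ACTIONS = 4


def rule_based_bot(obs):
    """Hypothetical expert: maps a discrete observation to an action."""
    return (obs * 3 + 1) % N_ACTIONS


def collect_demonstrations(bot, n_samples, n_states=8, seed=0):
    """Record (observation, action) pairs from the bot.

    Observations are sampled at random here; in a real environment they
    would come from rolling the bot out in the game.
    """
    rng = random.Random(seed)
    demos = []
    for _ in range(n_samples):
        obs = rng.randrange(n_states)
        demos.append((obs, bot(obs)))
    return demos


def fit_cloned_policy(demos):
    """Tabular behavior cloning: for each observation, pick the action
    the bot chose most often in the demonstrations."""
    counts = defaultdict(Counter)
    for obs, action in demos:
        counts[obs][action] += 1
    table = {obs: c.most_common(1)[0][0] for obs, c in counts.items()}

    def policy(obs, default=0):
        # Fall back to a default action for observations never seen in the data.
        return table.get(obs, default)

    return policy


demos = collect_demonstrations(rule_based_bot, n_samples=500)
cloned = fit_cloned_policy(demos)

# Fraction of observed states on which the cloned policy matches the bot.
seen = {obs for obs, _ in demos}
agreement = sum(cloned(o) == rule_based_bot(o) for o in seen) / len(seen)
```

In a real setup the lookup table would be replaced by a network trained on the recorded pairs, and the resulting policy could then serve as a starting point for PPO fine-tuning instead of training from scratch.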

Looking forward to your response and guidance on this matter.

aPovidlo commented 5 months ago

@vladyskai Did you find a solution to your problem? I'm facing a similar one.

vladyskai commented 5 months ago

Not really. I eventually got the RL agent to win the first 3 levels, but then I gave up on the project. You can look at the GitHub page if it's helpful: https://github.com/danisotelo/RL_battle_city

Basically, I gave the agent all the data I could and trained it for 2 days on my RTX 3060. It learned, but sloooowly.

HumanCompatibleAI / imitation

Integration of Rule-Based Bot Actions for Imitation Learning #835