Stable-Baselines-Team / stable-baselines3-contrib

Contrib package for Stable-Baselines3 - Experimental reinforcement learning (RL) code
https://sb3-contrib.readthedocs.io
MIT License

[Question] Simple way to implement data augmentation when training agent #231

Closed thomashirtz closed 7 months ago

thomashirtz commented 7 months ago

❓ Question

Hi everyone,

I'm currently working with the Soft Actor-Critic (SAC) algorithm in Stable Baselines 3 to train an agent in an environment where both the action and observation spaces are represented as 2D grids of the same dimensions. My goal is to improve the agent's learning efficiency and generalization by incorporating data augmentation techniques, specifically rotations (90, 180, 270 degrees) and symmetries (flips), into the training process (I would need to apply the same transformation to the observation and the action). Importantly, the rewards and termination signals (done flags) are not affected by these transformations.
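For illustration, here is a minimal NumPy sketch of the joint transform described above. It assumes square grids (so a 90-degree rotation keeps the shape) and an environment that really is symmetric under these transforms. The helper names `dihedral_transform` and `augment_transition` are hypothetical and not part of Stable-Baselines3. The next observation is transformed too, so the whole transition stays consistent.

```python
import numpy as np


def dihedral_transform(grid: np.ndarray, k: int, flip: bool) -> np.ndarray:
    """Rotate a (..., H, W) grid by k * 90 degrees, then optionally mirror it."""
    out = np.rot90(grid, k=k, axes=(-2, -1))
    return np.flip(out, axis=-1) if flip else out


def augment_transition(obs, action, next_obs, rng):
    """Apply one randomly drawn transform to every grid of a transition.

    Rewards and done flags are left out on purpose: they are unaffected.
    """
    k = int(rng.integers(4))      # number of quarter turns: 0, 1, 2 or 3
    flip = bool(rng.integers(2))  # whether to mirror left-right afterwards
    return tuple(dihedral_transform(g, k, flip) for g in (obs, action, next_obs))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    obs = np.arange(9.0).reshape(3, 3)
    new_obs, new_action, new_next_obs = augment_transition(obs, obs.copy(), obs + 1, rng)
    print(new_obs)
```

Four rotations combined with an optional flip cover all eight symmetries of a square grid, so separate vertical or diagonal flips are not needed.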

I'm contemplating implementing this augmentation within the ReplayBuffer class, particularly by modifying the _get_samples method to apply these transformations randomly to the data being fetched for training (stable_baselines3.common.buffers.ReplayBuffer._get_samples). Before proceeding, I would like to gather insights on the following:

Any advice, references to relevant documentation, or examples of similar implementations would be greatly appreciated.

Thank you in advance for your help!

araffin commented 7 months ago

> within the ReplayBuffer class, particularly by modifying the _get_samples method to apply these transformations randomly to

Sounds like a reasonable idea, similar to what we do for the HER replay buffer when creating virtual transitions.

> without compromising the integrity of the learning process?

What would compromise the integrity of the learning process? Do you mean that you are making the task harder by using randomization?

thomashirtz commented 7 months ago

By integrity I just meant that the change shouldn't break the SB3 pipeline and cause other things to stop working. But I guess there is no reason it would.

Thanks for the feedback!