[Question] Simple way to implement data augmentation when training agent

thomashirtz commented 7 months ago

❓ Question

Hi everyone,

I'm currently using the Soft Actor-Critic (SAC) algorithm from Stable Baselines 3 to train an agent in an environment where both the action and observation spaces are 2D grids of the same dimensions. To improve the agent's learning efficiency and generalization, I would like to add data augmentation to the training process, specifically rotations (90, 180, and 270 degrees) and flips (mirror symmetries). Because the actions are grids too, the same transformation has to be applied to the observation and to the action. The rewards and termination signals (done flags) are not affected by these transformations.
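A minimal NumPy sketch of the transformations described above, assuming square grids (H == W) stored in the last two array axes, since a 90 or 270 degree rotation would change the shape of a non-square grid. The four rotations, each with an optional flip, give all 8 symmetries of a square. The names `TRANSFORMS`, `apply_transform` and `augment_pair` are hypothetical helpers, not part of Stable Baselines 3.

```python
import numpy as np

# All 8 symmetries of a square grid: k quarter-turns (0, 90, 180, 270 degrees),
# each optionally followed by a left-right flip.
TRANSFORMS = [(k, flip) for k in range(4) for flip in (False, True)]


def apply_transform(grid: np.ndarray, k: int, flip: bool) -> np.ndarray:
    """Rotate the last two axes of `grid` by k * 90 degrees, then optionally mirror them."""
    out = np.rot90(grid, k=k, axes=(-2, -1))
    if flip:
        out = np.flip(out, axis=-1)
    return out


def augment_pair(obs: np.ndarray, action: np.ndarray, rng: np.random.Generator):
    """Apply the same randomly chosen symmetry to an observation grid and its action grid."""
    k, flip = TRANSFORMS[rng.integers(len(TRANSFORMS))]
    return apply_transform(obs, k, flip), apply_transform(action, k, flip)
```

Drawing the transform once and applying it to both arrays is what keeps the (observation, action) pair consistent; transforming them independently would pair an observation with an action in a different orientation.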

I'm contemplating implementing this augmentation within the ReplayBuffer class, particularly by modifying the _get_samples method to apply these transformations randomly to the data being fetched for training (stable_baselines3.common.buffers.ReplayBuffer._get_samples). Before proceeding, I would like to gather insights on the following:

Is modifying the ReplayBuffer class to incorporate data augmentation a recommended approach in Stable Baselines 3, or is there an existing feature or best practice for this?
Are there alternative or more efficient ways to implement this kind of data augmentation without compromising the integrity of the learning process?

Any advice, references to relevant documentation, or examples of similar implementations would be greatly appreciated.

Thank you in advance for your help!

Checklist

[X] I have checked that there is no similar issue in the repo
[X] I have read the documentation
[X] If code there is, it is minimal and working
[X] If code there is, it is formatted using the markdown code blocks for both code and stack traces.

araffin commented 7 months ago

> within the ReplayBuffer class, particularly by modifying the _get_samples method to apply these transformations randomly to

Sounds like a reasonable idea, similar to what we do for the HER replay buffer when creating virtual transitions.

> without compromising the integrity of the learning process?

What would compromise the integrity of the learning process? Do you mean that the randomization makes the task harder?

thomashirtz commented 7 months ago

By integrity I just meant that the change shouldn't break the SB pipeline so that some things stop working. But I guess there is no reason to

Thanks for the feedback!

Stable-Baselines-Team / stable-baselines3-contrib

[Question] Simple way to implement data augmentation when training agent #231
