RLE-Foundation / rllte

Long-Term Evolution Project of Reinforcement Learning
https://docs.rllte.dev/
MIT License
453 stars, 84 forks

Added PPO+LSTM, plus training example #39

Open roger-creus opened 8 months ago

roger-creus commented 8 months ago

Description

I have added a new agent, PPO + LSTM, together with a new EpisodicRolloutBuffer. The buffer is similar to VanillaRolloutBuffer, but it samples entire trajectories instead of random transitions, so the LSTM is trained on temporally ordered sequences.
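To illustrate the difference between the two sampling strategies, here is a minimal, dependency-free sketch. It is not the code from this PR: the function names `random_transition_batches` and `trajectory_batches` are hypothetical, and the rollout is assumed to be laid out as `num_steps` x `num_envs` with `num_envs` divisible by `num_batches`. It only shows which (step, env) indices end up together in a minibatch.

```python
import random


def random_transition_batches(num_steps, num_envs, num_batches, rng):
    """Vanilla-style sampling (assumed): shuffle all (step, env) pairs.

    Temporal order is lost, which is fine for a feed-forward policy.
    """
    indices = [(t, e) for t in range(num_steps) for e in range(num_envs)]
    rng.shuffle(indices)
    batch_size = len(indices) // num_batches
    return [indices[i * batch_size:(i + 1) * batch_size] for i in range(num_batches)]


def trajectory_batches(num_steps, num_envs, num_batches, rng):
    """Episodic-style sampling (assumed): shuffle environments only.

    Each minibatch holds whole per-environment sequences with steps in
    time order, so a recurrent network can be unrolled over them.
    """
    env_ids = list(range(num_envs))
    rng.shuffle(env_ids)
    envs_per_batch = num_envs // num_batches
    batches = []
    for i in range(num_batches):
        chosen = env_ids[i * envs_per_batch:(i + 1) * envs_per_batch]
        # One ordered sequence of (step, env) indices per chosen environment.
        batches.append([[(t, e) for t in range(num_steps)] for e in chosen])
    return batches


rng = random.Random(0)
flat = random_transition_batches(num_steps=4, num_envs=4, num_batches=2, rng=rng)
traj = trajectory_batches(num_steps=4, num_envs=4, num_batches=2, rng=rng)
```

In the trajectory variant, every sequence belongs to a single environment and keeps its steps in order, which is the property a recurrent policy needs when its hidden state is propagated through the rollout.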

I have also added an example notebook that trains it on Atari Space Invaders, with the following results:

[Figure: ppo_lstm_atari]

In this case, it performs very similarly to vanilla PPO:

[Figure: ppo_atari]

Motivation and Context

PPO + LSTM can outperform PPO in partially observable environments, where a recurrent policy can carry information across time steps that a single observation does not contain.

Types of changes

Checklist

Note: You can run most of the checks using make commit-checks.

Note: The maximum line length is 127 characters.