Description
I have added a new agent, PPO + LSTM, together with a new EpisodicRolloutBuffer. It is similar to VanillaRolloutBuffer, but it samples entire trajectories instead of individual random transitions, so the LSTM can be trained on temporally ordered data.
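For reference, here is a minimal sketch of the idea behind EpisodicRolloutBuffer. The class name matches the PR, but the fields and method signatures below are illustrative assumptions, not the PR's actual implementation:

```python
import random

import numpy as np


class EpisodicRolloutBuffer:
    """Stores transitions grouped by episode and samples whole trajectories,
    so a recurrent policy can be trained on temporally ordered data."""

    def __init__(self, capacity):
        self.capacity = capacity  # maximum number of completed episodes kept
        self.episodes = []        # completed episodes, oldest first
        self.current = []         # transitions of the episode in progress

    def add(self, obs, action, reward, done):
        self.current.append((obs, action, reward, done))
        if done:
            # An episode just finished: store it as a single unit,
            # evicting the oldest episode if the buffer is full.
            if len(self.episodes) >= self.capacity:
                self.episodes.pop(0)
            self.episodes.append(self.current)
            self.current = []

    def sample(self, batch_size):
        """Return batch_size complete trajectories (not random transitions),
        each as time-ordered arrays of obs, actions, rewards, dones."""
        batch = random.sample(self.episodes, batch_size)
        trajectories = []
        for episode in batch:
            obs, actions, rewards, dones = map(np.array, zip(*episode))
            trajectories.append((obs, actions, rewards, dones))
        return trajectories
```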
I have also added an example notebook to train it on Atari (Space Invaders). In this case, it performs very similarly to vanilla PPO.
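To make the training point concrete, below is a small demo of why whole trajectories are needed; it uses the buffer sketched above, and none of it is the PR's actual training code. The LSTM is unrolled over each episode in time order, so its hidden state propagates across steps, which sampling random individual transitions would break:

```python
import numpy as np
import torch
import torch.nn as nn

# One dummy 5-step episode; a real agent would collect these from the env.
buffer = EpisodicRolloutBuffer(capacity=8)
for t in range(5):
    buffer.add(obs=np.random.randn(4).astype(np.float32),
               action=0, reward=1.0, done=(t == 4))

lstm = nn.LSTM(input_size=4, hidden_size=32, batch_first=True)
head = nn.Linear(32, 2)  # per-step logits for 2 discrete actions

for obs, actions, rewards, dones in buffer.sample(batch_size=1):
    seq = torch.from_numpy(obs).unsqueeze(0)  # (1, T, 4): one ordered episode
    features, _ = lstm(seq)                   # hidden state flows across steps
    logits = head(features)                   # (1, T, 2): one output per step
```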
Motivation and Context
PPO + LSTM can achieve better performance than vanilla PPO in partially observed environments, since the recurrent hidden state lets the policy integrate information from past observations.
[x] I have raised an issue to propose this change (required for new features and bug fixes)
Types of changes
[ ] Bug fix (non-breaking change which fixes an issue)
[x] New feature (non-breaking change which adds functionality)
[ ] Breaking change (fix or feature that would cause existing functionality to change)
Checklist
[ ] I have reformatted the code using make format (required)
[ ] I have checked the codestyle using make check-codestyle and make lint (required)
[ ] I have ensured make pytest and make type both pass. (required)
[ ] I have checked that the documentation builds using make doc (required)
Note: You can run most of the checks using make commit-checks.
Note: we are using a maximum length of 127 characters per line