I have implemented the E3B intrinsic reward proposed here. I have also added the SuperMarioBros environment, which I used to validate the E3B implementation, and fixed the pretraining mode for on-policy agents:
Before: the intrinsic rewards were simply added to the extrinsic returns and advantages.
Now: in pretraining mode, the intrinsic returns and intrinsic advantages are computed from the intrinsic rewards alone. When using intrinsic + extrinsic rewards, the behavior is unchanged.
This has significantly improved the performance of intrinsic reward algorithms in pretraining mode.
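The fix described above can be sketched roughly as follows (an illustrative GAE computation, not this PR's actual code; the function signature and variable names are assumptions):

```python
import numpy as np

def compute_returns_and_advantages(ext_rewards, int_rewards, values,
                                   last_value, dones, gamma=0.99,
                                   gae_lambda=0.95, pretraining=False):
    """GAE over intrinsic-only rewards (pretraining) or the combined rewards."""
    # Pretraining: optimize the intrinsic rewards alone; otherwise sum both streams.
    rewards = int_rewards if pretraining else ext_rewards + int_rewards
    n = len(rewards)
    advantages = np.zeros(n)
    gae = 0.0
    for t in reversed(range(n)):
        next_value = last_value if t == n - 1 else values[t + 1]
        next_non_terminal = 1.0 - dones[t]
        delta = rewards[t] + gamma * next_value * next_non_terminal - values[t]
        gae = delta + gamma * gae_lambda * next_non_terminal * gae
        advantages[t] = gae
    returns = advantages + values
    return returns, advantages
```

With `pretraining=True` the policy gradient is driven purely by the intrinsic signal, instead of the intrinsic rewards merely being folded into the extrinsic returns.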
This is the performance of PPO+E3B in pretraining mode on the SuperMarioBros-1-1-v3 environment (i.e., without access to task rewards!):
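For reviewers unfamiliar with E3B, the elliptical episodic bonus at its core can be sketched as follows (a minimal NumPy sketch under my own assumptions; the class name, the ridge value, and how features are extracted are illustrative, not this PR's implementation):

```python
import numpy as np

class E3BBonus:
    """Elliptical episodic bonus: b(s) = phi(s)^T C^{-1} phi(s)."""

    def __init__(self, feature_dim: int, ridge: float = 0.1):
        self.feature_dim = feature_dim
        self.ridge = ridge
        self.reset()

    def reset(self):
        # C starts as ridge * I at every episode boundary,
        # so we track its inverse directly: C^{-1} = I / ridge.
        self.inv_cov = np.eye(self.feature_dim) / self.ridge

    def bonus(self, phi: np.ndarray) -> float:
        # Bonus for the current feature vector, before adding it to C.
        u = self.inv_cov @ phi
        b = float(phi @ u)
        # Rank-1 Sherman-Morrison update of C^{-1} after C += phi phi^T,
        # avoiding an O(d^3) matrix inversion per step.
        self.inv_cov -= np.outer(u, u) / (1.0 + b)
        return b
```

Repeated visits to similar features shrink the bonus, which is what drives exploration: in the sketch above, feeding the same feature vector twice yields a strictly smaller bonus the second time.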
Motivation and Context
1) E3B is a recent algorithm that achieves SOTA results in complex environments, so it's a valuable contribution.
2) During the pretraining phase, the intrinsic rewards were not being optimized properly.
3) Added the SuperMarioBros environment because it is cool and helps evaluate the performance of exploration algorithms: in Mario, good exploratory agents achieve high task rewards.
Types of changes
[x] Bug fix (non-breaking change which fixes an issue)
[x] New feature (non-breaking change which adds functionality)
[ ] Breaking change (fix or feature that would cause existing functionality to change)
Checklist
[ ] `make format` (required)
[ ] `make check-codestyle` and `make lint` (required)
[ ] `make pytest` and `make type` both pass (required)
[ ] `make doc` (required)