RLE-Foundation / rllte

Long-Term Evolution Project of Reinforcement Learning
https://docs.rllte.dev/
MIT License

Added E3B and validated - SuperMarioBros environment - Fixed Pretraining Mode #41

Open roger-creus opened 8 months ago


Description

I have implemented the E3B (Exploration via Elliptical Episodic Bonuses) intrinsic reward proposed here. I have also added the SuperMarioBros environment, which I used to validate the E3B implementation.
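For context, E3B maintains a per-episode covariance matrix of state embeddings and rewards the agent with the elliptical bonus b_t = φ(s_t)ᵀ C⁻¹ φ(s_t). Below is a minimal NumPy sketch of just the bonus computation, assuming embeddings are provided externally (in E3B they come from an encoder trained with an inverse-dynamics model); class and method names are illustrative, not the actual rllte module:

```python
import numpy as np

class E3BBonus:
    """Minimal sketch of the E3B elliptical episodic bonus.

    Maintains the inverse of C_t = sum_i phi(s_i) phi(s_i)^T + ridge * I
    for the current episode, updated incrementally with Sherman-Morrison.
    """

    def __init__(self, embed_dim: int, ridge: float = 0.1):
        self.embed_dim = embed_dim
        self.ridge = ridge
        self.reset()

    def reset(self) -> None:
        # Called at every episode boundary: C_0 = ridge * I.
        self.inv_cov = np.eye(self.embed_dim) / self.ridge

    def compute(self, phi: np.ndarray) -> float:
        # Elliptical bonus against the previous covariance: b = phi^T C^{-1} phi.
        u = self.inv_cov @ phi
        bonus = float(phi @ u)
        # Rank-1 Sherman-Morrison update of the inverse after adding phi phi^T.
        self.inv_cov -= np.outer(u, u) / (1.0 + bonus)
        return bonus
```

At each step the current observation embedding is passed to `compute`, and `reset` is called when the episode ends, so the bonus measures novelty within the episode rather than across training.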

In addition, I have fixed the pretraining mode for on-policy agents. Before, the intrinsic rewards were only added on top of the extrinsic returns and advantages. Now, in pretraining mode, the returns and advantages are computed from the intrinsic rewards alone; when using intrinsic + extrinsic rewards, the behavior is unchanged.
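Concretely, the fix amounts to choosing which reward stream feeds the GAE/returns computation. A minimal sketch under that assumption (function names like `select_rewards` and the `beta` coefficient are illustrative, not the actual rllte code; episode-termination masking is omitted for brevity):

```python
import numpy as np

def compute_gae(rewards, values, last_value, gamma=0.99, gae_lambda=0.95):
    """Standard GAE over one rollout; returns (advantages, returns)."""
    T = len(rewards)
    advantages = np.zeros(T)
    last_gae = 0.0
    for t in reversed(range(T)):
        next_value = last_value if t == T - 1 else values[t + 1]
        delta = rewards[t] + gamma * next_value - values[t]
        last_gae = delta + gamma * gae_lambda * last_gae
        advantages[t] = last_gae
    returns = advantages + values
    return advantages, returns

def select_rewards(ext_rewards, int_rewards, pretraining: bool, beta: float = 1.0):
    # Pretraining mode: optimize the intrinsic rewards alone.
    if pretraining:
        return int_rewards
    # Otherwise: combine the two streams, as before.
    return ext_rewards + beta * int_rewards
```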

This has significantly improved the performance of intrinsic-reward algorithms in pretraining mode.

This is the performance of PPO+E3B in pretraining mode on the SuperMarioBros-1-1-v3 environment (i.e., without access to task rewards):

[Plot: PPO+E3B pretraining performance on SuperMarioBros-1-1-v3]

Motivation and Context

1. E3B is a recent algorithm that achieves SOTA results in complex environments, so it is a valuable contribution.
2. During the pretraining phase, the intrinsic rewards were not being optimized properly.
3. The SuperMarioBros environment is a cool addition and helps evaluate exploration algorithms, since in Mario good exploratory agents also achieve high task rewards (see the sketch below).
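For reference, SuperMarioBros environments typically come from the `gym-super-mario-bros` package; a minimal sketch of constructing the level used above directly (the wrapper added in this PR may expose a different entry point within rllte):

```python
import gym_super_mario_bros
from gym_super_mario_bros.actions import SIMPLE_MOVEMENT
from nes_py.wrappers import JoypadSpace

# Level 1-1; the "-v3" suffix selects one of the package's ROM variants.
env = gym_super_mario_bros.make("SuperMarioBros-1-1-v3")
# Restrict the NES controller to a small discrete action set.
env = JoypadSpace(env, SIMPLE_MOVEMENT)

obs = env.reset()
for _ in range(100):
    obs, reward, done, info = env.step(env.action_space.sample())
    if done:
        obs = env.reset()
env.close()
```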
