[BugFix] More flexible episode_reward computation in logger

This PR fixes how episode rewards are computed in BenchMARL.

Here is an overview:

episode_reward computation

BenchMARL looks at the global done (always assumed to be set), which can usually be computed by applying any or all over the individual agents' dones.
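A minimal sketch, in plain Python, of how a global done could be derived from per-agent dones. The function name `global_done`, the dict-of-groups layout and the `mode` parameter are hypothetical illustrations, not BenchMARL's API. Whether any or all is the right reduction depends on the environment.

```python
from typing import Dict, List


def global_done(agent_dones: Dict[str, List[bool]], mode: str = "any") -> bool:
    """Hypothetical helper: reduce per-agent done flags to one global done.

    agent_dones maps a group name to the done flags of the agents in it.
    """
    flags = [done for group in agent_dones.values() for done in group]
    if mode == "any":
        # Episode ends as soon as any agent is done.
        return any(flags)
    if mode == "all":
        # Episode ends only once every agent is done.
        return all(flags)
    raise ValueError(f"unknown mode: {mode!r}")


dones = {"agents": [True, False], "adversaries": [False]}
print(global_done(dones, "any"))  # True
print(global_done(dones, "all"))  # False
```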

In all cases, the global done is what is used to compute the episode reward.
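To illustrate what "the global done is used to compute the episode reward" means, here is a hypothetical sketch (not BenchMARL's logger code) that accumulates each agent's reward step by step and closes an episode only when the global done is set. The name `episode_rewards` and the list-of-lists layout are assumptions for illustration; an unfinished trailing episode is simply dropped here.

```python
from typing import List


def episode_rewards(
    rewards: List[List[float]], global_dones: List[bool]
) -> List[List[float]]:
    """Hypothetical helper: split a rollout into episodes using only the global done.

    rewards[t][i] is agent i's reward at step t; global_dones[t] marks the last
    step of an episode. Returns the per-agent returns of each completed episode.
    """
    if not rewards:
        return []
    n_agents = len(rewards[0])
    episodes: List[List[float]] = []
    running = [0.0] * n_agents
    for step_rewards, done in zip(rewards, global_dones):
        # Every step until the global done counts towards the episode reward.
        for i, r in enumerate(step_rewards):
            running[i] += r
        if done:
            episodes.append(running)
            running = [0.0] * n_agents
    return episodes


print(episode_rewards([[1.0, 2.0], [0.5, 0.0], [1.0, 1.0]], [False, True, False]))
# [[1.5, 2.0]]
```

Individual agent dones play no role in where an episode ends in this sketch, which matches the rule above.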

We log the min, mean and max of episode_reward over episodes at three different levels:

agent: per-agent values (disabled by default, can be turned on manually)
group: averaged over the agents in the group
global: averaged over the agents in each group and over the groups
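The three logging levels can be sketched as follows. This is a hypothetical illustration, not BenchMARL's implementation: the function `log_episode_rewards`, the `log_agents` flag and the key names are made up. It assumes episodes are aligned across groups (they are all cut by the same global done), and it reads "averaged over agents in groups and groups" as a mean over groups of the per-group means; a flat average over all agents would weight large groups more.

```python
from statistics import mean
from typing import Dict, List

Stats = Dict[str, float]


def summarize(values: List[float]) -> Stats:
    # min / mean / max over episodes
    return {"min": min(values), "mean": mean(values), "max": max(values)}


def log_episode_rewards(
    returns: Dict[str, List[List[float]]], log_agents: bool = False
) -> Dict[str, Stats]:
    """returns[group][e][i] is the return of agent i of `group` in episode e."""
    logs: Dict[str, Stats] = {}
    group_means: Dict[str, List[float]] = {}
    for group, episodes in returns.items():
        if log_agents:
            # Agent level: off by default, one entry per agent.
            for i in range(len(episodes[0])):
                logs[f"{group}/agent_{i}"] = summarize([ep[i] for ep in episodes])
        # Group level: average over the agents of the group, per episode.
        group_means[group] = [mean(ep) for ep in episodes]
        logs[group] = summarize(group_means[group])
    # Global level: average the group means over groups, per episode.
    per_episode = [mean(vals) for vals in zip(*group_means.values())]
    logs["global"] = summarize(per_episode)
    return logs


logs = log_episode_rewards({"a": [[1.0, 3.0], [2.0, 2.0]], "b": [[4.0], [0.0]]})
print(logs["global"])  # {'min': 1.0, 'mean': 2.0, 'max': 3.0}
```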

Requirement

When some agents are done but the global done is not set, those agents should receive a reward of 0 (this applies when you are not using global rewards).
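The reason for this requirement, presumably, is that an episode's reward is accumulated over every step up to the global done, so any non-zero reward an agent receives after it is done would be added to its episode reward. A minimal sketch of how an environment might enforce this, assuming per-step lists of rewards and agent dones within one episode; `mask_done_agent_rewards` is a hypothetical helper, not part of BenchMARL or TorchRL, and it is not meant for the global-reward case.

```python
from typing import List


def mask_done_agent_rewards(
    rewards: List[List[float]], agent_dones: List[List[bool]]
) -> List[List[float]]:
    """Hypothetical helper: zero the rewards of agents that finished earlier.

    rewards[t][i] and agent_dones[t][i] refer to agent i at step t of a single
    episode (before the global done). The reward on the step an agent becomes
    done is kept; all of its later rewards are set to 0.0.
    """
    masked: List[List[float]] = []
    finished = [False] * (len(rewards[0]) if rewards else 0)
    for step_rewards, step_dones in zip(rewards, agent_dones):
        masked.append([0.0 if finished[i] else r for i, r in enumerate(step_rewards)])
        # Once done, an agent stays finished until the global done resets it.
        finished = [f or d for f, d in zip(finished, step_dones)]
    return masked


print(mask_done_agent_rewards(
    [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]],
    [[True, False], [False, False], [False, True]],
))
# [[1.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
```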

Fixes #135

facebookresearch / BenchMARL

[BugFix] More flexible episode_reward computation in logger #136