Closed matteobettini closed 1 month ago
This PR fixes the way episode rewards are computed in BenchMARL
Here is an overview:
BenchMARL will be looking at the global `done` (always assumed to be set), which can usually be computed using `any` or `all` over the single agents' dones. In all cases, the global done is what is used to compute the episode reward.
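As a rough illustration of the point above, here is a minimal sketch in plain Python. It is not BenchMARL's actual code: the helper names `global_done` and `episode_rewards` are hypothetical, and the data layout (one reward and one global done flag per step) is an assumption made for the example.

```python
from typing import List


def global_done(agent_dones: List[bool], reduce: str = "any") -> bool:
    """Hypothetical helper: derive a global done from per-agent dones."""
    if reduce == "any":
        return any(agent_dones)
    if reduce == "all":
        return all(agent_dones)
    raise ValueError(f"unknown reduce mode: {reduce!r}")


def episode_rewards(rewards: List[float], dones: List[bool]) -> List[float]:
    """Hypothetical helper: sum rewards per episode.

    An episode is closed only when the *global* done is True at that step.
    Rewards after the last global done belong to an unfinished episode
    and are not reported.
    """
    returns = []
    running = 0.0
    for reward, done in zip(rewards, dones):
        running += reward
        if done:
            returns.append(running)
            running = 0.0
    return returns


# Two agents: one is done, the other is not.
print(global_done([True, False], reduce="any"))
print(global_done([True, False], reduce="all"))

# Four steps, global done set at steps 1 and 3.
print(episode_rewards([1.0, 2.0, 3.0, 4.0], [False, True, False, True]))
```

Whether `any` or `all` is the right reduction depends on the environment, which is why the sketch takes it as a parameter.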
We log `episode_reward` min, mean, max over episodes at three different levels:
Requirement
When some agents are done but the global done is not set, those agents should be getting a reward of 0 (unless you are using global rewards).
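One way an environment could satisfy this requirement is sketched below. This is only an illustration with a hypothetical helper name, `zero_rewards_after_done`. It assumes per-step lists of per-agent rewards and dones, and it assumes that the reward on the step where an agent becomes done is kept, with only later steps zeroed.

```python
from typing import List


def zero_rewards_after_done(
    step_rewards: List[List[float]],
    step_dones: List[List[bool]],
) -> List[List[float]]:
    """Hypothetical helper: zero the rewards of agents that are already done.

    step_rewards[t][i] is the reward of agent i at step t.
    step_dones[t][i] is True once agent i is done at step t.
    """
    n_agents = len(step_rewards[0])
    already_done = [False] * n_agents
    masked = []
    for rewards, dones in zip(step_rewards, step_dones):
        # Agents that were done before this step get a reward of 0.
        masked.append(
            [0.0 if already_done[i] else rewards[i] for i in range(n_agents)]
        )
        # Remember which agents are done from now on.
        already_done = [already_done[i] or dones[i] for i in range(n_agents)]
    return masked


rewards = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
dones = [[False, False], [True, False], [True, True]]
print(zero_rewards_after_done(rewards, dones))
```

With this masking, summing per-agent rewards until the global done does not count anything for agents that finished early.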
Fixes #135