This PR also logs discounted returns. This metric is useful because the agent actually optimizes discounted returns over chunks of experience, where each chunk is defined by `num_envs * num_steps`. I also logged the episodic length and changed the metric name from `episode_reward` to `episodic_return` to more accurately reflect what it measures.
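To make the two kinds of metrics concrete, here is a minimal sketch, not the code in this PR. It assumes the rollout is stored as `num_steps x num_envs` nested lists and that `dones[t][e]` is 1 when env `e`'s episode ended right after step `t`. `discounted_returns` computes the discounted return inside one chunk, truncated at episode ends and at the chunk boundary with no bootstrapping. `EpisodeTracker` accumulates the undiscounted per-episode values that would feed `episodic_return` and the episodic length. Both names are hypothetical.

```python
from typing import List, Tuple


def discounted_returns(
    rewards: List[List[float]], dones: List[List[float]], gamma: float
) -> List[List[float]]:
    """Discounted return G[t][e] = sum_k gamma^k * r[t+k][e] within one rollout chunk.

    Assumption: dones[t][e] == 1 means env e's episode ended after step t, so the
    sum is cut there. It is also cut at the end of the chunk (no value bootstrap).
    """
    num_steps = len(rewards)
    num_envs = len(rewards[0]) if num_steps else 0
    returns = [[0.0] * num_envs for _ in range(num_steps)]
    running = [0.0] * num_envs  # return accumulated from step t+1 onward
    for t in reversed(range(num_steps)):
        for e in range(num_envs):
            # Reset the running sum when the episode ended at step t.
            running[e] = rewards[t][e] + gamma * running[e] * (1.0 - dones[t][e])
            returns[t][e] = running[e]
    return returns


class EpisodeTracker:
    """Hypothetical helper tracking undiscounted episodic return and length per env."""

    def __init__(self, num_envs: int) -> None:
        self.returns = [0.0] * num_envs
        self.lengths = [0] * num_envs

    def step(self, rewards: List[float], dones: List[float]) -> List[Tuple[float, int]]:
        """Add one step of rewards; return (episodic_return, episodic_length) for finished episodes."""
        finished = []
        for e, (r, d) in enumerate(zip(rewards, dones)):
            self.returns[e] += r
            self.lengths[e] += 1
            if d:
                finished.append((self.returns[e], self.lengths[e]))
                self.returns[e] = 0.0
                self.lengths[e] = 0
        return finished
```

For example, with reward 1 at every step, `gamma = 0.5`, and three steps, an env whose episode never ends gets returns `[1.75, 1.5, 1.0]`. An env whose episode ends after step 0 gets `[1.0, 1.5, 1.0]`. The discounted values depend on where the chunk boundary falls, while the episodic values do not, which is why the two metrics are logged separately.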