Old sustainability metric was calculating something like the fraction of agent-timesteps with nonzero reward. What we want is S: the average timestep number where an agent receives a nonzero reward.
Intuitively, if agents are gathering apples later and later into the episode, the environment is becoming more sustainable.
Old sustainability metric was calculating something like the fraction of agent-timesteps with nonzero reward. What we want is S: the average timestep number where an agent receives a nonzero reward.
Intuitively, if agents are gathering apples later and later into the episode, the environment is becoming more sustainable.