Closed by WillDudley 1 year ago
Hitting L84 means L76 and L78 were hit as well:
cur_agent = self.agent_selection
self._cumulative_rewards[cur_agent] = 0
self.rewards = make_defaultdict({a: 0.0 for a in self.agents})
self.skip_num[cur_agent] = self.num_frames
self.old_actions[cur_agent] = action
while self.old_actions[self.env.agent_selection] is not None:
    step_agent = self.env.agent_selection
    if step_agent in self.env.dones:
        # reward = self.env.rewards[step_agent]
        # done = self.env.dones[step_agent]
        # info = self.env.infos[step_agent]
        observe, reward, done, info = self.env.last(observe=False)
        action = self.old_actions[step_agent]
        self.env.step(action)
        for agent in self.env.agents:
            self.rewards[agent] += self.env.rewards[agent]
        self.infos[self.env.agent_selection] = info
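To make the reward accumulation in the inner `for` loop easier to follow, here is a rough, self-contained sketch. It does not use the real wrapper or environment: `ToyEnv` and `accumulate_rewards` are hypothetical names invented for this illustration, and the reward rule (the acting agent earns its action value, everyone else earns 0) is an assumption made only so the numbers are easy to check.

```python
from collections import defaultdict


class ToyEnv:
    """Hypothetical stand-in for a turn-based env; not the real API."""

    def __init__(self, agents):
        self.agents = list(agents)
        self.rewards = {a: 0.0 for a in self.agents}
        self._turn = 0

    @property
    def agent_selection(self):
        # Agents take turns in round-robin order.
        return self.agents[self._turn % len(self.agents)]

    def step(self, action):
        # Assumed reward rule: the acting agent earns `action`, others earn 0.
        acting = self.agent_selection
        self.rewards = {
            a: (float(action) if a == acting else 0.0) for a in self.agents
        }
        self._turn += 1


def accumulate_rewards(env, actions, num_steps):
    """After every env step, add each agent's latest reward to a running total."""
    totals = defaultdict(float)
    for _ in range(num_steps):
        env.step(actions[env.agent_selection])
        for agent in env.agents:
            totals[agent] += env.rewards[agent]
    return dict(totals)


env = ToyEnv(["agent_1", "agent_2"])
totals = accumulate_rewards(env, {"agent_1": 1, "agent_2": 2}, num_steps=4)
print(totals)
```

The point of the sketch is that rewards are summed for every agent after each underlying step, not only for the agent that acted, which mirrors the `self.rewards[agent] += self.env.rewards[agent]` line in the excerpt.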
fixed
On the master branch (PyPI pz), env.step() hits L84 of frame_skip once for agent 1 and 4 times for agent 2. On the trunc branch (trunc pz), env.step() doesn't hit anywhere.