DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License

Why does VecFrameStack clear the prior frames in the stack for the step when "terminated=True"? #1883

Closed wkwan closed 2 months ago

wkwan commented 3 months ago

❓ Question

I'm currently using VecFrameStack with my custom gym environment, which is a 1v1 game.

To debug training, I'm saving the frames in the stack whenever one player kills another (which I know is the case when reward > 9000):

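from PIL import Image
from stable_baselines3.common.vec_env import VecFrameStack

class VecFrameStackSaveOnKill(VecFrameStack):

    def __init__(self, venv, n_stack, starting_timestep=0):
        super().__init__(venv, n_stack)
        self.cur_step = starting_timestep
        self.n_stack = n_stack

    def step_wait(self):
        # Stacked observations for all envs, as returned by VecFrameStack
        stacked_obs, rewards, dones, infos = super().step_wait()
        # |reward| > 9000 in env 0 marks a kill
        if abs(rewards[0]) > 9000:
            # `args` comes from the surrounding training script
            out_dir = f"{args.checkpoint_folder}/img_player_killed_opponent_stacked"
            for i in range(self.n_stack):
                # Channels-last RGB: frame i occupies channels [3*i, 3*i + 3)
                frame = stacked_obs[0, :, :, i * 3:i * 3 + 3]
                Image.fromarray(frame).save(f"{out_dir}/step_{self.cur_step}_{i}_player_killed_opponent.png")
        self.cur_step += 1
        return stacked_obs, rewards, dones, infos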

What I discovered is that in my custom environment, if I set terminated=True when one player kills another, the saved frame stack is 3 black frames followed by the terminal frame (n_stack is 4). This confuses me: I would expect the terminal frame stack to still contain the 3 prior frames, with the stack cleared on the following step. Why does the terminal frame stack include only 1 frame?

I tested setting terminated=False when one player kills another, and in this case the frame stack saves all 4 frames when the reward > 9000. But I'm not sure if this is how the API is intended to be used if I want the episode to end when one player kills another.

In case it's helpful, I'm training with RecurrentPPO.
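For readers hitting the same behavior, here is a minimal pure-Python sketch of what seems to be going on. It assumes the usual SB3 VecEnv convention: when an episode ends, the env is reset automatically, the observation returned for that step is the first frame of the next episode, and the true last frame is passed along in infos[i]["terminal_observation"]. Under that assumption, the "terminal frame" saved above would actually be the first frame after the reset. ToyVecFrameStack, the "blank" placeholder, and the frame names are all hypothetical stand-ins for illustration, not SB3 code.

```python
from collections import deque


class ToyVecFrameStack:
    """Hypothetical single-env stand-in for a frame stacker over an auto-resetting env."""

    def __init__(self, n_stack, blank="blank"):
        self.n_stack = n_stack
        self.blank = blank
        self.stack = deque([blank] * n_stack, maxlen=n_stack)

    def reset(self, first_frame):
        # Start from an empty (blank) stack and push the first frame
        self.stack = deque([self.blank] * self.n_stack, maxlen=self.n_stack)
        self.stack.append(first_frame)
        return list(self.stack)

    def step(self, frame, done, terminal_frame=None):
        # `frame` is what the auto-resetting env returns for this step:
        # on done=True it is the first frame of the NEXT episode.
        info = {}
        if done:
            # Keep the full history for the finished episode in info:
            # the n_stack - 1 most recent frames plus the true last frame.
            info["terminal_observation"] = list(self.stack)[1:] + [terminal_frame]
            # Clear the stack so the new episode does not see old frames
            self.stack = deque([self.blank] * self.n_stack, maxlen=self.n_stack)
        self.stack.append(frame)
        return list(self.stack), info


stacker = ToyVecFrameStack(n_stack=4)
stacker.reset("f0")
for name in ("f1", "f2", "f3"):
    obs, info = stacker.step(name, done=False)

# The kill happens here: the env reports done and hands back the reset frame
obs, info = stacker.step("reset_frame", done=True, terminal_frame="kill_frame")
print(obs)                           # three blanks, then the first frame after reset
print(info["terminal_observation"])  # the three prior frames, then the kill frame
```

If that assumption holds for your setup, the stack you want to save on a kill should be infos[0]["terminal_observation"] rather than the observation returned by step_wait().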

Checklist

wkwan commented 3 months ago

@araffin sorry, what's missing in the checklist? Did you want to see my custom environment as well? It's a Fortnite custom game that requires some manual navigation within the game to set things up properly, but here's my env code: https://github.com/wkwan/ScrimBrain/blob/master/fortnite_env.py

araffin commented 3 months ago

sorry what's missing in the checklist, did you want to see my custom environment as well?

"If code there is, it is minimal and working", please have a look at the linked issue for what "minimal and working" exactly means: https://github.com/DLR-RM/stable-baselines3/issues/982#issuecomment-1197044014