GoingMyWay / ViZDoomAgents

😈 Train ViZDoom agents by Reinforcement Learning 👻

performance on health_gather_superme #5

Open RozenAstrayChen opened 5 years ago

RozenAstrayChen commented 5 years ago

Hello, I trained your code on health_gather_superme (master branch) for 7400 episodes, but it still doesn't work. Should I train for more episodes? I also found the rnn_dev branch, where the network in the dead_corridor dir is A3C + LSTM. I think A3C + LSTM is better than the current network because the agent can remember where the health packs are.

GoingMyWay commented 5 years ago

How long did it take to train the model? Did you track the results on TensorBoard? Maybe you should train for more episodes.
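
If you are not logging yet, here is a minimal sketch of per-episode reward logging (assuming the TF1-style summary API of that era; `episode` and `total_reward` stand in for your own training-loop variables):

    import tensorflow as tf

    # Hypothetical log directory; view it with `tensorboard --logdir logs`.
    writer = tf.summary.FileWriter("logs/health_gather_superme")

    episode, total_reward = 0, 0.0  # placeholders for the training loop's values
    summary = tf.Summary(value=[tf.Summary.Value(tag="episode_reward",
                                                 simple_value=total_reward)])
    writer.add_summary(summary, global_step=episode)
    writer.flush()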

As for the dead_corridor environment, you should do more reward engineering to train a good agent.

RozenAstrayChen commented 5 years ago

I spent about a day training health_gather_superme, but then I switched to training Battle.

I changed your code because the action combinations were complex; I simplified them to 8 actions. Here is the result after training 5000 episodes (1 day on a 1080):

After 5000 episodes I think it is great, but the agent still doesn't shoot enemies accurately, so I'm still trying to improve it.

https://www.youtube.com/watch?v=tR6wDjicGYU&t=9s

GoingMyWay commented 5 years ago

Good, the agent can learn to explore the env. Did you change the scenario? The health packs shown in the video are different from the ones in my code.

If health packs, ammo and enemies all appear in the frame, I think the agent gets quite confused about what to do: shoot an enemy, pick up health packs, or pick up ammo.

RozenAstrayChen commented 5 years ago

Yes, I removed the speed button and changed the scenario:

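        # Requires the ViZDoom bindings, e.g.:
        # from vizdoom import DoomGame, Button, GameVariable, Mode, ScreenFormat, ScreenResolution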
        game = DoomGame()
        game.load_config(cfg.SCENARIO_PATH)
        game.set_doom_map("map01")
        game.set_screen_resolution(ScreenResolution.RES_640X480)
        game.set_screen_format(ScreenFormat.RGB24)
        game.set_render_hud(False)
        game.set_render_crosshair(False)
        game.set_render_weapon(True)
        game.set_render_decals(False)
        game.set_render_particles(True)
        # Enables labeling of the in game objects.
        game.set_labels_buffer_enabled(True)
        game.add_available_button(Button.MOVE_FORWARD)
        game.add_available_button(Button.MOVE_RIGHT)
        game.add_available_button(Button.MOVE_LEFT)
        game.add_available_button(Button.TURN_LEFT)
        game.add_available_button(Button.TURN_RIGHT)
        game.add_available_button(Button.ATTACK)
        #game.add_available_button(Button.SPEED)
        game.add_available_game_variable(GameVariable.AMMO2)
        game.add_available_game_variable(GameVariable.HEALTH)
        game.add_available_game_variable(GameVariable.USER2)
        game.set_episode_timeout(2100)
        game.set_episode_start_time(5)
        game.set_window_visible(self.play)
        game.set_sound_enabled(False)
        game.set_living_reward(0)
        game.set_mode(Mode.PLAYER)
        if self.play:
            game.add_game_args("+viz_render_all 1")
            game.set_render_hud(False)
            game.set_ticrate(35)
        game.init()
        self.env = game
        self.actions = cfg.button_combinations()
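
For reference, with 8 one-hot actions the step loop just indexes into `self.actions`; roughly like this (the `frame_repeat` value and the action index are placeholders, not code from the repo):

        # Hypothetical step: play one of the 8 one-hot button combinations for a few tics.
        frame_repeat = 4                  # placeholder frame-skip value
        action_idx = 0                    # would come from the policy network
        reward = self.env.make_action(self.actions[action_idx], frame_repeat)
        done = self.env.is_episode_finished()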

I think that when health packs, ammo and enemies are all present, the agent will definitely choose to shoot the enemy, because of the reward shaping:

    def reward_function(self):
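        # Shaped reward: kills are worth the most (x20), ammo and health pickups give
        # a small bonus (x0.5), losing health gives a small penalty (x0.1), and there is
        # a constant -1 whenever health or ammo run low.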
        kills_delta = self.env.get_game_variable(GameVariable.USER2) - self.last_total_kills
        self.last_total_kills = self.env.get_game_variable(GameVariable.USER2)

        ammo_delta = self.env.get_game_variable(GameVariable.AMMO2) - self.last_total_ammos
        self.last_total_ammos = self.env.get_game_variable(GameVariable.AMMO2)

        health_delta = self.env.get_game_variable(GameVariable.HEALTH) - self.last_total_health
        self.last_total_health = self.env.get_game_variable(GameVariable.HEALTH)

        reward = 0
        reward += kills_delta * 20.

        if ammo_delta >= 0:
            reward += ammo_delta * 0.5
        # else:
        #     reward += ammo_delta * 0.1

        if health_delta >= 0:
            reward += health_delta * 0.5
        else:
            reward += health_delta * 0.1

        if self.env.get_game_variable(GameVariable.HEALTH) <= 30:
            reward += -1

        if self.env.get_game_variable(GameVariable.AMMO2) <= 5:
            reward += -1

        return reward

It looks like killing is the best choice: killing one enemy is worth 20 points, health is second, and ammo is last.

At 1:10 in the video the map shows an enemy and a health pack, and the agent chooses to shoot the enemy (I'm not sure whether distance has an influence).

RozenAstrayChen commented 5 years ago

I changed the action combinations because in your code button_combinations has 72 combinations:

    import numpy as np

    def button_combinations():
        # One-hot action set: each of the 8 available buttons is pressed on its own.
        actions = np.identity(8, dtype=int).tolist()
        return actions

GoingMyWay commented 5 years ago

The performance is good now. Maybe you can try to make it act like humans.

RozenAstrayChen commented 5 years ago

My professor and I want to solve my way home, where the target is placed randomly on the map and the agent always gets negative rewards.

I noticed your image input stacks 4 frames, so I changed it to a recurrent neural network, but it doesn't work on my way home: after 1000 episodes I can't see any learning (e.g. learning not to hit the wall).
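
For reference, the switch I mean is roughly the following (an illustrative PyTorch sketch, not the repo's network; the frame size and layer sizes are assumptions): feed a single frame per step and carry memory in the LSTM state instead of stacking 4 frames.

    import torch
    import torch.nn as nn

    class RecurrentPolicy(nn.Module):
        """Single-frame conv encoder + LSTM core instead of a 4-frame stack."""
        def __init__(self, n_actions=8, hidden=256):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv2d(3, 16, 8, stride=4), nn.ReLU(),
                nn.Conv2d(16, 32, 4, stride=2), nn.ReLU(), nn.Flatten())
            # 32 * 9 * 9 assumes 84x84 RGB input frames
            self.lstm = nn.LSTMCell(32 * 9 * 9, hidden)
            self.policy = nn.Linear(hidden, n_actions)
            self.value = nn.Linear(hidden, 1)

        def forward(self, frame, state):
            h, c = self.lstm(self.conv(frame), state)
            return self.policy(h), self.value(h), (h, c)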

Do you have any solution for this scenario?

Here is my source code

GoingMyWay commented 5 years ago

> My professor and I want to solve my way home, where the target is placed randomly on the map and the agent always gets negative rewards.
>
> I noticed your image input stacks 4 frames, so I changed it to a recurrent neural network, but it doesn't work on my way home: after 1000 episodes I can't see any learning (e.g. learning not to hit the wall).
>
> Do you have any solution for this scenario?
>
> Here is my source code

Sorry for the late reply. I do not know the reward for each step. Does the agent get a -1 reward for each step and a large positive reward for finding its home?

Since the my way home scenario is a maze, you may want to encourage the agent to explore more. Here are some videos you can refer to.

There is also a new idea based on curiosity. I have not tested it, but since it is fairly new and you plan to publish a paper at a conference, maybe you can try it: https://arxiv.org/pdf/1705.05363.pdf
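
The core of that paper is an Intrinsic Curiosity Module (ICM); here is a rough sketch of the idea (illustrative PyTorch, not this repo's code, with made-up feature sizes): a forward model predicts the next state's features, and its prediction error is added to the environment reward as an exploration bonus.

    import torch
    import torch.nn as nn

    class ICM(nn.Module):
        """Minimal Intrinsic Curiosity Module sketch (Pathak et al., 2017)."""
        def __init__(self, obs_dim=64 * 64, feat_dim=256, n_actions=8):
            super().__init__()
            # Feature encoder phi(s); a real agent would use a conv net over frames.
            self.encoder = nn.Sequential(nn.Linear(obs_dim, feat_dim), nn.ReLU())
            # Inverse model: predict the taken action from phi(s_t) and phi(s_{t+1}).
            self.inverse = nn.Linear(2 * feat_dim, n_actions)
            # Forward model: predict phi(s_{t+1}) from phi(s_t) and the action.
            self.forward_model = nn.Sequential(
                nn.Linear(feat_dim + n_actions, feat_dim), nn.ReLU(),
                nn.Linear(feat_dim, feat_dim))

        def forward(self, s, s_next, action_onehot):
            phi, phi_next = self.encoder(s), self.encoder(s_next)
            pred_action_logits = self.inverse(torch.cat([phi, phi_next], dim=1))
            pred_phi_next = self.forward_model(torch.cat([phi, action_onehot], dim=1))
            # Intrinsic reward = forward-model prediction error in feature space.
            intrinsic_reward = 0.5 * (pred_phi_next - phi_next.detach()).pow(2).sum(dim=1)
            return intrinsic_reward, pred_action_logits

The agent would then be trained on the extrinsic reward plus a scaled intrinsic bonus, with the inverse-model cross-entropy and the forward-model error minimized as auxiliary losses.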

Note that exploration and reward shaping are vital for tackling this scenario.

RozenAstrayChen commented 5 years ago

Thanks for your help!

In my way home every step gets -0.001, and finding the target gets +1. Should I give a larger reward than the current one?

GoingMyWay commented 5 years ago

> Thanks for your help!
>
> In my way home every step gets -0.001, and finding the target gets +1. Should I give a larger reward than the current one?

Since the agent's task is to find its home and the scenario is a maze, I think simply giving a larger terminal reward is not helpful, due to the back-propagation nature of the Bellman update. You can try the curiosity paper; it includes experiments on my way home.
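
To illustrate the point (a plain-Python back-of-the-envelope check; the episode length and gamma are illustrative, not from the repo): with a sparse reward at the end of a long episode, the discounted value that reaches the start state is small, and the signal still has to propagate backwards step by step, so scaling the terminal reward does little for the exploration problem.

    # Discounted value of a terminal reward seen only after `steps_to_goal` steps.
    gamma, steps_to_goal = 0.99, 500
    for terminal_reward in (1.0, 10.0, 100.0):
        value_at_start = (gamma ** steps_to_goal) * terminal_reward
        print(terminal_reward, round(value_at_start, 4))
    # 1.0 -> ~0.0066, 10.0 -> ~0.0657, 100.0 -> ~0.657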

RozenAstrayChen commented 5 years ago

OK, I will try.

Thank you!!