RozenAstrayChen opened 5 years ago
How long did you take to train the model? Did you track the results on TensorBoard? Maybe you should train for more episodes.
As for the dead_corridor environment, you need to do more reward engineering to train a good agent.
I probably spent a day training health_gather_superme, but I have switched to training Battle.
I changed your code: the action combinations were complex, so I simplified them to 8 actions. Here is the result after training 5000 episodes (1 day on a 1080).
After 5000 episodes I think it is great, but the agent still cannot shoot enemies accurately. I am still trying to improve it.
Good, the agent can learn to explore the environment. Did you change the scenario? The health packs shown in the video are different from those in my code.
If health packs, ammo, and enemies all appear on screen at once, I think the agent gets confused about what to do: shoot an enemy, pick up a health pack, or pick up ammo.
Yes, I removed the speed button and changed the scenario:
```python
game = DoomGame()
game.load_config(cfg.SCENARIO_PATH)
game.set_doom_map("map01")
game.set_screen_resolution(ScreenResolution.RES_640X480)
game.set_screen_format(ScreenFormat.RGB24)
game.set_render_hud(False)
game.set_render_crosshair(False)
game.set_render_weapon(True)
game.set_render_decals(False)
game.set_render_particles(True)
# Enables labeling of the in-game objects.
game.set_labels_buffer_enabled(True)
game.add_available_button(Button.MOVE_FORWARD)
game.add_available_button(Button.MOVE_RIGHT)
game.add_available_button(Button.MOVE_LEFT)
game.add_available_button(Button.TURN_LEFT)
game.add_available_button(Button.TURN_RIGHT)
game.add_available_button(Button.ATTACK)
# game.add_available_button(Button.SPEED)
game.add_available_game_variable(GameVariable.AMMO2)
game.add_available_game_variable(GameVariable.HEALTH)
game.add_available_game_variable(GameVariable.USER2)
game.set_episode_timeout(2100)
game.set_episode_start_time(5)
game.set_window_visible(self.play)
game.set_sound_enabled(False)
game.set_living_reward(0)
game.set_mode(Mode.PLAYER)
if self.play:
    game.add_game_args("+viz_render_all 1")
    game.set_render_hud(False)
    game.set_ticrate(35)
game.init()
self.env = game
self.actions = cfg.button_combinations()
```
I think that when the agent has to choose among health packs, ammo, and enemies, it is sure to choose shooting the enemy, because of the reward shaping:
```python
def reward_function(self):
    kills_delta = self.env.get_game_variable(GameVariable.USER2) - self.last_total_kills
    self.last_total_kills = self.env.get_game_variable(GameVariable.USER2)
    ammo_delta = self.env.get_game_variable(GameVariable.AMMO2) - self.last_total_ammos
    self.last_total_ammos = self.env.get_game_variable(GameVariable.AMMO2)
    health_delta = self.env.get_game_variable(GameVariable.HEALTH) - self.last_total_health
    self.last_total_health = self.env.get_game_variable(GameVariable.HEALTH)
    reward = 0
    reward += kills_delta * 20.
    if ammo_delta >= 0:
        reward += ammo_delta * 0.5
    # else:
    #     reward += ammo_delta * 0.1
    if health_delta >= 0:
        reward += health_delta * 0.5
    else:
        reward += health_delta * 0.1
    if self.env.get_game_variable(GameVariable.HEALTH) <= 30:
        reward += -1
    if self.env.get_game_variable(GameVariable.AMMO2) <= 5:
        reward += -1
    return reward
```
It looks like killing is the best choice: one kill gives 20 points, health is second, and ammo is last.
In the video at 1:10, the map shows an enemy and a health pack, and the agent chooses to shoot the enemy (I'm not sure whether distance has an influence).
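As a sanity check, the shaping above can be reproduced as a pure function of the deltas (a minimal sketch; `shaped_reward` and the sample numbers are illustrative, not from the repo):

```python
def shaped_reward(kills_delta, ammo_delta, health_delta, health, ammo):
    """Mirror of the reward shaping above, written as a pure function."""
    reward = 0.0
    reward += kills_delta * 20.0      # +20 per kill dominates everything else
    if ammo_delta >= 0:
        reward += ammo_delta * 0.5    # picking up ammo is mildly rewarded
    if health_delta >= 0:
        reward += health_delta * 0.5  # picking up health is mildly rewarded
    else:
        reward += health_delta * 0.1  # taking damage is mildly punished
    if health <= 30:
        reward += -1                  # low-health penalty
    if ammo <= 5:
        reward += -1                  # low-ammo penalty
    return reward

# One kill while taking 10 damage: 20 - 1 = 19, so killing still dominates.
print(shaped_reward(kills_delta=1, ammo_delta=-2, health_delta=-10, health=60, ammo=20))  # 19.0
```

So even a kill that costs 10 health nets +19, far more than the +12.5 for a 25-point health pack, which supports the claim that the agent prefers shooting.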
I changed the action combinations because in your code `button_combinations` had 72 combinations:
```python
def button_combinations():
    actions = np.identity(8, dtype=int).tolist()
    return actions
```
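For comparison, the identity matrix gives 8 one-hot actions, where each action presses exactly one button, while enumerating every on/off combination of the same buttons blows up quickly (a hypothetical comparison; the repo's original scheme pruned the full set down to 72, not shown here):

```python
import numpy as np
from itertools import product

# 8 one-hot actions: each row presses exactly one of the 8 buttons.
simple_actions = np.identity(8, dtype=int).tolist()
print(len(simple_actions))   # 8
print(simple_actions[0])     # [1, 0, 0, 0, 0, 0, 0, 0]

# Every on/off combination of 8 buttons would be 2**8 = 256 joint actions,
# which is why some pruning (or one-hot simplification) is needed.
all_combinations = list(product([0, 1], repeat=8))
print(len(all_combinations))  # 256
```

A smaller action space generally makes exploration and credit assignment easier, at the cost of disallowing simultaneous button presses such as moving while shooting.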
The performance is good now. Maybe you can try to make it act more like a human.
My professor and I want to solve my_way_home, in which the target is placed randomly on the map and the agent almost always gets negative rewards.
I noticed your image input stacks 4 frames, so I changed it to a recurrent neural network, but it doesn't work on my_way_home: after 1000 episodes I can't see any learning (the agent doesn't even avoid hitting the wall).
Do you have any solution for this scenario?
Sorry for the late reply. I do not know the reward for each step. Does the agent get a -1 reward per step and a large positive reward for finding its home?
Since the my_way_home scenario is a maze, you may need to encourage the agent to explore more. Here are some videos you can refer to.
There is also a newer idea based on curiosity. I have not tested it, but since it is fairly new and you plan to publish a conference paper, maybe you can try it: https://arxiv.org/pdf/1705.05363.pdf
Note that exploration and reward shaping are vital for tackling this scenario.
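One cheap way to encourage exploration in a maze, short of the full curiosity model from the paper, is a count-based novelty bonus added to the environment reward (a sketch; the state key and the `scale / sqrt(N)` schedule are illustrative choices, not from either repo):

```python
import math
from collections import defaultdict

class CountBonus:
    """Intrinsic reward that decays as a state is revisited."""
    def __init__(self, scale=0.1):
        self.counts = defaultdict(int)
        self.scale = scale

    def bonus(self, state_key):
        self.counts[state_key] += 1
        # Novel states give a large bonus; familiar ones give almost none.
        return self.scale / math.sqrt(self.counts[state_key])

explorer = CountBonus(scale=0.1)
print(explorer.bonus("cell_3_7"))  # first visit:  0.1
print(explorer.bonus("cell_3_7"))  # second visit: ~0.0707
```

In practice the state key would come from discretizing the agent's position or hashing the observation; the bonus is added to the environment reward at each step.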
Thanks for your help!
In my_way_home every step gives -0.001, and finding the target gives +1. Should I make the reward larger than it is now?
Since the agent's task is to find the home and the scenario is a maze, I think simply increasing the reward is not helpful, due to the backpropagation nature of the Bellman update. You can try the curiosity paper; it contains experiments on my_way_home.
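The point about Bellman updates can be seen in a toy corridor: with one-step backups, the terminal reward propagates only one state closer to the start per sweep, so scaling the reward up does not make it arrive any sooner (a sketch with made-up sizes, not the my_way_home dynamics):

```python
import numpy as np

def sweeps_until_start_learns(reward, n_states=20, gamma=0.99):
    """Count full left-to-right sweeps of one-step Bellman backups
    until the start state's value becomes nonzero. Only reaching the
    last state gives any reward."""
    V = np.zeros(n_states + 1)  # V[n_states] is the terminal state
    sweeps = 0
    while V[0] == 0:
        sweeps += 1
        for s in range(n_states):
            r = reward if s == n_states - 1 else 0.0
            V[s] = r + gamma * V[s + 1]
    return sweeps

# The reward magnitude does not change how many sweeps are needed.
print(sweeps_until_start_learns(reward=1.0))    # 20
print(sweeps_until_start_learns(reward=100.0))  # 20
```

A 100x larger goal reward leaves the start state's value exactly zero for just as long; only more visits (i.e. better exploration) or different credit-assignment machinery speeds up the propagation.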
OK, I will try.
Thank you!!
Hello, I trained your code on health_gather_superme (master branch) for 7400 episodes, but it still doesn't work. Should I train for more episodes? I also found the rnn_dev branch, where the network in the dead_corridor dir is A3C + LSTM. I think A3C + LSTM is better than the current network, because the agent can remember where the health packs are.
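For reference, the 4-frame stacking that the master branch uses (as I understand it) can be done with a fixed-length deque; an LSTM replaces this fixed window with a learned, unbounded memory (a minimal sketch; the `FrameStack` class and the 84x84 frame size are illustrative, not the repo's code):

```python
from collections import deque
import numpy as np

class FrameStack:
    """Keeps the last k frames and stacks them along a new channel axis."""
    def __init__(self, k=4):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, frame):
        for _ in range(self.k):        # pad the stack with the first frame
            self.frames.append(frame)
        return np.stack(self.frames)

    def step(self, frame):
        self.frames.append(frame)      # the oldest frame falls out automatically
        return np.stack(self.frames)

stack = FrameStack(k=4)
obs = stack.reset(np.zeros((84, 84)))
print(obs.shape)           # (4, 84, 84)
obs = stack.step(np.ones((84, 84)))
print(obs[-1].mean())      # 1.0 -- the newest slot holds the new frame
```

The stack only remembers the last 4 frames, so a health pack seen 5+ steps ago is gone, whereas an LSTM can in principle carry that information forward, which is the intuition behind preferring A3C + LSTM here.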