IntelLabs / coach

Reinforcement Learning Coach by Intel AI Lab enables easy experimentation with state of the art Reinforcement Learning algorithms
https://intellabs.github.io/coach/
Apache License 2.0

Odd DQN agent behavior with simple custom env #433

Open LukasMadzik opened 4 years ago

LukasMadzik commented 4 years ago

Good day, I am experiencing an odd issue while testing a simple custom environment with the default DQN agent. I have a state space PlanarMapsObservationSpace(shape=np.array([50, 50, 1]), low=0, high=1) and a discrete action space with 4 actions (move up/down/left/right). A "target" (value 1) is spawned at one random point on the map, and everywhere else is "nothing" (value 0). The goal of the agent is to move the target to the middle of the map. The reward is the negative Manhattan distance between the target and the middle of the map. The issue is that from the first evaluation phase onward, the agent picks one direction and blindly moves the target in that direction.

My code is below. Environment:

self.state_space = StateSpace({"observation": PlanarMapsObservationSpace(shape=np.array([50, 50, 1]), low=0, high=1)})
self.action_space = DiscreteActionSpace(num_actions=4, descriptions={"0": "up", "1": "down", "2": "left", "3": "right"})

Take_action:

# clear the target's old cell, move it, then mark its new cell
self.env.observation[self.env.target['x']][self.env.target['y']][0] = 0
if action_idx == 0:    # up
  self.env.target['y'] += 1
elif action_idx == 1:  # down
  self.env.target['y'] -= 1
elif action_idx == 2:  # left
  self.env.target['x'] -= 1
elif action_idx == 3:  # right
  self.env.target['x'] += 1
self.env.observation[self.env.target['x']][self.env.target['y']][0] = 1
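
As an aside, chained indexing like observation[x][y][0] works here because each integer index returns a view into the base array, but NumPy's tuple indexing is the idiomatic single-lookup equivalent. A standalone demonstration (plain NumPy, hypothetical coordinates):

import numpy as np

obs = np.zeros((50, 50, 1), dtype=int)
obs[3][7][0] = 1   # chained indexing: three successive lookups into views
obs[3, 7, 0] = 1   # tuple indexing: one lookup, same element
assert obs[3, 7, 0] == 1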

Update_state:

# negative Manhattan distance to the center cell (24, 24)
self.reward = -(np.abs(24 - self.env.target['x']) + np.abs(24 - self.env.target['y']))
self.done = ((self.env.target['x'] == 24 and self.env.target['y'] == 24)  # goal reached
             or self.env.target['x'] <= 0 or self.env.target['y'] <= 0    # hit the border
             or self.env.target['x'] >= 49 or self.env.target['y'] >= 49
             or self.current_episode_steps_counter > 100)                 # step limit
self.state = {"observation": self.env.observation}
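
For intuition, this reward is 0 exactly on the center cell and grows more negative with Manhattan distance; a quick standalone check (plain NumPy, hypothetical target positions):

import numpy as np

def reward_for(x, y):
    # negative Manhattan distance to the center cell (24, 24)
    return -(np.abs(24 - x) + np.abs(24 - y))

print(reward_for(24, 24))  # 0, goal reached
print(reward_for(24, 30))  # -6, six cells from the center
print(reward_for(0, 0))    # -48, a far corner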

Restart_environment:

# randint's high bound is exclusive, so (1, 49) samples 1..48 and the target
# never spawns on a border cell that would end the episode immediately
self.env.target = {'x': np.random.randint(1, 49), 'y': np.random.randint(1, 49)}
self.env.observation = np.zeros((50, 50, 1), dtype=int)
self.env.observation[self.env.target['x']][self.env.target['y']][0] = 1
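
A minimal sanity check for this reset logic, runnable standalone with plain NumPy (it mirrors the snippet above; not part of the original code):

import numpy as np

target = {'x': np.random.randint(1, 49), 'y': np.random.randint(1, 49)}
observation = np.zeros((50, 50, 1), dtype=int)
observation[target['x'], target['y'], 0] = 1

assert observation.sum() == 1                              # exactly one target cell is set
assert 1 <= target['x'] <= 48 and 1 <= target['y'] <= 48   # never on the terminal border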

Preset:

agent_params = DQNAgentParameters()
schedule_params = SimpleSchedule()
env_params = MyCustomEnvironmentParameters()
vis_params = VisualizationParameters()  # default visualization settings
graph_manager = BasicRLGraphManager(agent_params=agent_params, env_params=env_params,
                                    schedule_params=schedule_params, vis_params=vis_params,
                                    preset_validation_params=PresetValidationParameters())
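
For completeness, these classes come from the following Coach modules, and a preset is launched with graph_manager.improve(); a minimal sketch, assuming the preset definition above:

from rl_coach.agents.dqn_agent import DQNAgentParameters
from rl_coach.base_parameters import VisualizationParameters, PresetValidationParameters
from rl_coach.graph_managers.basic_rl_graph_manager import BasicRLGraphManager
from rl_coach.graph_managers.graph_manager import SimpleSchedule

# ... preset definition as above ...

graph_manager.improve()  # runs Coach's heatup / train / evaluate schedule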

If anyone has any idea how to solve this, please reply. Thanks in advance.