IntelLabs / coach

Reinforcement Learning Coach by Intel AI Lab enables easy experimentation with state of the art Reinforcement Learning algorithms
https://intellabs.github.io/coach/
Apache License 2.0

Odd DQN agent behavior with simple custom env #433

Open LukasMadzik opened 4 years ago

LukasMadzik commented 4 years ago

Good day, I am experiencing an odd issue while testing a simple custom environment with the default DQN agent. I have a state space PlanarMapsObservationSpace(shape=np.array([50, 50, 1]), low=0, high=1) and a discrete action space with 4 actions (move up/down/left/right). A "target" (value 1) is spawned at one random point on the map, and everywhere else is "nothing" (value 0). The goal of the agent is to move the target to the middle of the map. The reward is the negative Manhattan distance between the target and the middle of the map. The issue is that from the first evaluation phase onward, the agent picks one direction and blindly moves the target in that direction.

My code is below. Environment:

self.state_space = StateSpace({"observation": PlanarMapsObservationSpace(shape=np.array([50, 50, 1]), low=0, high=1)})
self.action_space = DiscreteActionSpace(num_actions=4, descriptions={"0": "up", "1": "down", "2": "left", "3": "right"})

Take_action:

# clear the target's old cell, move it, then mark its new cell
self.env.observation[self.env.target['x']][self.env.target['y']][0] = 0
if action_idx == 0:    # up
  self.env.target['y'] += 1
elif action_idx == 1:  # down
  self.env.target['y'] -= 1
elif action_idx == 2:  # left
  self.env.target['x'] -= 1
elif action_idx == 3:  # right
  self.env.target['x'] += 1
self.env.observation[self.env.target['x']][self.env.target['y']][0] = 1
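
As an aside, chained indexing like observation[x][y][0] works here because each integer index returns a view into the base array, but NumPy's tuple indexing is the idiomatic single-lookup equivalent. A standalone demonstration (plain NumPy, hypothetical coordinates):

import numpy as np

obs = np.zeros((50, 50, 1), dtype=int)
obs[3][7][0] = 1   # chained indexing: three successive lookups into views
obs[3, 7, 0] = 1   # tuple indexing: one lookup, same element
assert obs[3, 7, 0] == 1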

Update_state:

# negative Manhattan distance to the center cell (24, 24)
self.reward = -(np.abs(24 - self.env.target['x']) + np.abs(24 - self.env.target['y']))
self.done = ((self.env.target['x'] == 24 and self.env.target['y'] == 24)  # goal reached
             or self.env.target['x'] <= 0 or self.env.target['y'] <= 0    # hit the border
             or self.env.target['x'] >= 49 or self.env.target['y'] >= 49
             or self.current_episode_steps_counter > 100)                 # step limit
self.state = {"observation": self.env.observation}
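
For intuition, this reward is 0 exactly on the center cell and grows more negative with Manhattan distance; a quick standalone check (plain NumPy, hypothetical target positions):

import numpy as np

def reward_for(x, y):
    # negative Manhattan distance to the center cell (24, 24)
    return -(np.abs(24 - x) + np.abs(24 - y))

print(reward_for(24, 24))  # 0, goal reached
print(reward_for(24, 30))  # -6, six cells from the center
print(reward_for(0, 0))    # -48, a far corner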

Restart_environment:

# randint's high bound is exclusive, so (1, 49) samples 1..48 and the target
# never spawns on a border cell that would end the episode immediately
self.env.target = {'x': np.random.randint(1, 49), 'y': np.random.randint(1, 49)}
self.env.observation = np.zeros((50, 50, 1), dtype=int)
self.env.observation[self.env.target['x']][self.env.target['y']][0] = 1
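
A minimal sanity check for this reset logic, runnable standalone with plain NumPy (it mirrors the snippet above; not part of the original code):

import numpy as np

target = {'x': np.random.randint(1, 49), 'y': np.random.randint(1, 49)}
observation = np.zeros((50, 50, 1), dtype=int)
observation[target['x'], target['y'], 0] = 1

assert observation.sum() == 1                              # exactly one target cell is set
assert 1 <= target['x'] <= 48 and 1 <= target['y'] <= 48   # never on the terminal border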

Preset:

agent_params = DQNAgentParameters()
schedule_params = SimpleSchedule()
env_params = MyCustomEnvironmentParameters()
vis_params = VisualizationParameters()  # default visualization settings
graph_manager = BasicRLGraphManager(agent_params=agent_params, env_params=env_params,
                                    schedule_params=schedule_params, vis_params=vis_params,
                                    preset_validation_params=PresetValidationParameters())
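
For completeness, these classes come from the following Coach modules, and a preset is launched with graph_manager.improve(); a minimal sketch, assuming the preset definition above:

from rl_coach.agents.dqn_agent import DQNAgentParameters
from rl_coach.base_parameters import VisualizationParameters, PresetValidationParameters
from rl_coach.graph_managers.basic_rl_graph_manager import BasicRLGraphManager
from rl_coach.graph_managers.graph_manager import SimpleSchedule

# ... preset definition as above ...

graph_manager.improve()  # runs Coach's heatup / train / evaluate schedule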

If anyone has any idea how to solve this, please reply. Thanks in advance.