Unity-Technologies / ml-agents

The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning and imitation learning.
https://unity.com/products/machine-learning-agents

Help on hyperparameters and setup #1271

Closed SestoAle closed 5 years ago

SestoAle commented 6 years ago

Hi,

I need a little help understanding my problem. I'm making a small turn-based roguelike game, but I want to proceed in small steps in order to understand everything about DRL.

My game setup is this:

The game is made with Unity, but I use the PPO algorithm from the Tensorforce framework; I don't use the Unity implementation because I need to use a CNN and try various types of algorithms for my experiments.

At the moment I use these hyperparameters:

With this configuration the agent doesn't learn anything; it behaves like a random agent. If I make the goal position static (it spawns at the same position every episode), the agent learns very quickly.

I also tried other network configurations, such as only 2 dense layers with different numbers of hidden units per layer (32, 64, 128), and a lot of different hyperparameters, but the agent never learns.

I also tried Unity's PPO algorithm, but with the same results.

I think the agent should be able to learn this simple objective, and I am really frustrated by now, so I need some help: in your experience, is there anything really wrong with my setup?

Thank you

xiaomaogy commented 6 years ago

Hi @SestoAle, maybe try limiting the goal position space (for example, to only four possible positions instead of keeping it static), and see if it can still learn.
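As an illustration of what I mean (a minimal sketch with hypothetical names; your game lives in a Unity scene, so this is just the idea expressed in Python):

```python
import random

# Hypothetical episode-reset logic: instead of a fully static goal or a
# fully random one, sample the goal from a small fixed set of candidate cells.
GOAL_CANDIDATES = [(1, 1), (1, 6), (6, 1), (6, 6)]  # four cells of the 8x8 grid

def reset_goal():
    # Curriculum idea: start from a few candidates, then grow this set
    # (4 positions -> 8 -> fully random) once the agent starts to learn.
    return random.choice(GOAL_CANDIDATES)
```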

SestoAle commented 5 years ago

So, I tried limiting the goal positions and the agent still can't learn. I thought the observation (the 8x8 int matrix) wasn't well defined, so I added an embedding layer of size 3.

The new network configuration is:

- Embedding layer, size 3, 4 indices
- Conv 5x5, stride 1, size 32, ReLU
- Conv 5x5, stride 1, size 32, ReLU
- Conv 3x3, stride 2, size 32, ReLU
- Conv 3x3, stride 2, size 32, ReLU
- FC, size 256
- FC, size 256
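Written out, the architecture looks roughly like this (a minimal Keras sketch, not my exact Tensorforce network spec; the input shape, padding, and number of actions are assumptions):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_ACTIONS = 4  # assumption: one discrete action per movement direction

model = models.Sequential([
    # 8x8 grid of integer tile indices, 4 distinct tile types.
    layers.Input(shape=(8, 8), dtype="int32"),
    layers.Embedding(input_dim=4, output_dim=3),          # embedding size 3, 4 indices
    layers.Conv2D(32, 5, strides=1, padding="same", activation="relu"),
    layers.Conv2D(32, 5, strides=1, padding="same", activation="relu"),
    layers.Conv2D(32, 3, strides=2, padding="same", activation="relu"),
    layers.Conv2D(32, 3, strides=2, padding="same", activation="relu"),
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(256, activation="relu"),
    layers.Dense(NUM_ACTIONS),                             # policy logits (head is assumed)
])
```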

With this net the agent learns to reach the goal in about 1M steps. Now I'm trying to add complexity to the game: at every step the goal moves randomly by one tile, and the blocking tiles of the map are generated randomly at the start of each episode. With this setup the agent still learns, but very slowly.
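To make the new dynamics concrete, they work roughly like this (a hypothetical Python sketch, not the actual Unity code):

```python
import numpy as np

rng = np.random.default_rng()
GRID = 8

def reset_blocks(n_blocks=10):
    # Blocking tiles are placed at random cells at the start of each episode.
    cells = rng.choice(GRID * GRID, size=n_blocks, replace=False)
    return {(int(c) // GRID, int(c) % GRID) for c in cells}

def move_goal(goal, blocks):
    # Each step the goal moves randomly by one tile, staying on the grid
    # and off the blocking tiles; if no move is possible it stays put.
    x, y = goal
    moves = [(x + dx, y + dy) for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))]
    moves = [m for m in moves if 0 <= m[0] < GRID and 0 <= m[1] < GRID and m not in blocks]
    return moves[rng.integers(len(moves))] if moves else goal
```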

So my questions are:

1) Do you think this network configuration can work well, or can it be improved?
2) Is the observation representation (the 8x8 int matrix) the best choice for representing the environment?
3) Since my game is completely discrete (with discrete actions and on-demand decisions), do you think a DQN algorithm could work better than PPO?

Thanks!

xiaomaogy commented 5 years ago

@SestoAle I guess you will need to figure out these answers yourself, since this is something that depends on a lot of other factors.

xiaomaogy commented 5 years ago

Thanks for reaching out to us. Hopefully you were able to resolve your issue. We are closing this due to inactivity, but if you need additional assistance, feel free to reopen the issue.

lock[bot] commented 4 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.