Unity-Technologies / ml-agents

The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning and imitation learning.
https://unity.com/products/machine-learning-agents

Help on hyperparameters and setup #1271

Closed SestoAle closed 5 years ago

SestoAle commented 6 years ago

Hi,

I need a little help understanding my problem. I'm making a small turn-based roguelike game, but I want to proceed in small steps in order to understand everything about DRL.

My game setup is this:

The game is made with Unity, but I use the PPO algorithm from the Tensorforce framework; I don't use the Unity implementation because I need to use a CNN and try various types of algorithms for my experiments.

At the moment I use these hyperparameters:

With this configuration the agent doesn't learn anything; it behaves like a random agent. If I make the goal position static (it spawns at the same position every episode), the agent learns very quickly.

I also tried other network configurations, such as only 2 dense layers with different numbers of hidden units per layer (32, 64, 128), and a lot of different hyperparameters, but the agent never learns.

I also tried Unity's PPO algorithm, but with the same results.

I think the agent should be able to learn this simple objective, and I am really frustrated by now, so I need some help: in your experience, is there anything really wrong with my setup?

Thank you

xiaomaogy commented 6 years ago

Hi @SestoAle, maybe try limiting the goal position space (for example, to only four possible positions instead of keeping it static), and see if it can still learn.
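As an illustration of what I mean (a minimal sketch with hypothetical names; your game lives in a Unity scene, so this is just the idea expressed in Python):

```python
import random

# Hypothetical episode-reset logic: instead of a fully static goal or a
# fully random one, sample the goal from a small fixed set of candidate cells.
GOAL_CANDIDATES = [(1, 1), (1, 6), (6, 1), (6, 6)]  # four cells of the 8x8 grid

def reset_goal():
    # Curriculum idea: start from a few candidates, then grow this set
    # (4 positions -> 8 -> fully random) once the agent starts to learn.
    return random.choice(GOAL_CANDIDATES)
```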

SestoAle commented 5 years ago

So, I tried limiting the goal positions and the agent still can't learn. I thought the observation (the 8x8 int matrix) wasn't well defined, so I added an embedding layer of size 3.

The new network configuration is:

- Embedding layer, size 3, 4 indices
- Conv 5x5, stride 1, size 32, ReLU
- Conv 5x5, stride 1, size 32, ReLU
- Conv 3x3, stride 2, size 32, ReLU
- Conv 3x3, stride 2, size 32, ReLU
- FC, size 256
- FC, size 256
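Written out, the architecture looks roughly like this (a minimal Keras sketch, not my exact Tensorforce network spec; the input shape, padding, and number of actions are assumptions):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_ACTIONS = 4  # assumption: one discrete action per movement direction

model = models.Sequential([
    # 8x8 grid of integer tile indices, 4 distinct tile types.
    layers.Input(shape=(8, 8), dtype="int32"),
    layers.Embedding(input_dim=4, output_dim=3),          # embedding size 3, 4 indices
    layers.Conv2D(32, 5, strides=1, padding="same", activation="relu"),
    layers.Conv2D(32, 5, strides=1, padding="same", activation="relu"),
    layers.Conv2D(32, 3, strides=2, padding="same", activation="relu"),
    layers.Conv2D(32, 3, strides=2, padding="same", activation="relu"),
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(256, activation="relu"),
    layers.Dense(NUM_ACTIONS),                             # policy logits (head is assumed)
])
```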

With this net the agent learns to reach the goal in about 1M steps. Now I'm trying to add complexity to the game: at every step the goal moves randomly by one tile, and the blocking tiles of the map are generated randomly at the start of each episode. With this setup the agent still learns, but very slowly.
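To make the new dynamics concrete, they work roughly like this (a hypothetical Python sketch, not the actual Unity code):

```python
import numpy as np

rng = np.random.default_rng()
GRID = 8

def reset_blocks(n_blocks=10):
    # Blocking tiles are placed at random cells at the start of each episode.
    cells = rng.choice(GRID * GRID, size=n_blocks, replace=False)
    return {(int(c) // GRID, int(c) % GRID) for c in cells}

def move_goal(goal, blocks):
    # Each step the goal moves randomly by one tile, staying on the grid
    # and off the blocking tiles; if no move is possible it stays put.
    x, y = goal
    moves = [(x + dx, y + dy) for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))]
    moves = [m for m in moves if 0 <= m[0] < GRID and 0 <= m[1] < GRID and m not in blocks]
    return moves[rng.integers(len(moves))] if moves else goal
```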

So my questions are:

1) Do you think this network configuration can work well, or can it be improved?
2) Is the observation representation (the 8x8 int matrix) the best choice for representing the environment?
3) Since my game is completely discrete (with discrete actions and on-demand decisions), do you think a DQN algorithm could work better than PPO?

Thanks!

xiaomaogy commented 5 years ago

@SestoAle I guess you will need to figure out these answers yourself, since this is something that depends on a lot of other factors.

xiaomaogy commented 5 years ago

Thanks for reaching out to us. Hopefully you were able to resolve your issue. We are closing this due to inactivity, but if you need additional assistance, feel free to reopen the issue.

lock[bot] commented 4 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.