Unity-Technologies / ml-agents

The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning and imitation learning.
https://unity.com/products/machine-learning-agents

Mean reward is not increasing #2061

Closed MrGitGo closed 5 years ago

MrGitGo commented 5 years ago

[Image: issueBild]

- Brain: one Brain with the following action space:

Vector Observation:

    if (useVectorObs)
    {
        // Two sets of rays with different lengths and angles.
        var rayDistance = 8f;
        var rayDistance2 = 30f;
        float[] rayAngles = { 0f, 90f, 135f, 180f, 70f };
        float[] rayAngles2 = { 45f, 90f, 110f };
        var detectableObjects = new[] { "agent", "wall", "counterAgent" };

        // Agent heading as a yaw angle, projected onto the horizontal plane.
        Vector3 forward = transform.forward;
        forward.y = 0;
        AddVectorObs(Quaternion.LookRotation(forward).eulerAngles.y);

        // RayPerception encodes each ray as (detectable tags + 2) floats
        // (one-hot hit tag, miss flag, normalized hit fraction), i.e. 5 floats
        // per ray here: 3 rays * 5 + 5 rays * 5 = 40 floats.
        AddVectorObs(rayPer.Perceive(rayDistance2, rayAngles2, detectableObjects, 2.8f, 0f));
        AddVectorObs(rayPer.Perceive(rayDistance, rayAngles, detectableObjects, 1.5f, 0f));

        // The Rigidbody velocity adds 3 floats, so the total Vector Observation size is 1 + 40 + 3 = 44.
        AddVectorObs(gameObject.GetComponent<Rigidbody>().velocity);
    }

Tensorboard:

[Image: tensorboard4mio]

And I also changed the arena after every million steps:

[Image: progressTo4Mio]

The Academy Max Steps is 2000, and my trainer hyperparameters are:

    num_layers: 1
    normalize: true
    batch_size: 64
    buffer_size: 10240
    hidden_units: 128
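
For reference, a sketch of how these overrides would be written in trainer_config.yaml under the Learning Brain's name (the brain name below is a placeholder; anything not listed falls back to the default section):

    ShooterLearningBrain:    # placeholder; must match the name of your Learning Brain
        num_layers: 1
        normalize: true
        batch_size: 64
        buffer_size: 10240
        hidden_units: 128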

Could you give me some hints on solving the problem with the rewards? I have read somewhere in the issues that I should be very sparing with negative rewards, but I still want to ask you as well. I hope you can help me.

mattinjersey commented 5 years ago

There are a lot of hyperparameters, including gamma, lambda, learning rate, number of layers, ... Maybe by adjusting these it will work. You could also try training for longer, for days...

Reinforcement learning can be a frustrating business, I guess!

xiaomaogy commented 5 years ago

I think you should 1) make the scene simpler by changing it to 1 vs 1 instead of 5 vs 5, then iterate from there, and 2) use the default hyperparameters first before trying your own.

@MrGitGo
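
For orientation, the stock default section of trainer_config.yaml looked roughly like the block below around this release; treat the exact values as approximate, since they differ between ML-Agents versions:

    default:
        trainer: ppo
        batch_size: 1024
        beta: 5.0e-3
        buffer_size: 10240
        epsilon: 0.2
        gamma: 0.99
        hidden_units: 128
        lambd: 0.95
        learning_rate: 3.0e-4
        max_steps: 5.0e4
        normalize: false
        num_epoch: 3
        num_layers: 2
        time_horizon: 64
        use_recurrent: false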

MrGitGo commented 5 years ago

OK, I will try this out. But what is the point of trying it with 1 vs 1 first? How does it influence the training? I thought it would be better if I had more agents. @xiaomaogy

xiaomaogy commented 5 years ago

No, more agents will make the training more complicated and harder to debug. You should start with the simplest possible scene that should absolutely work, then iterate from there. @MrGitGo

MrGitGo commented 5 years ago

So I added the agents to the game incrementally (1st phase: 1 vs 1, 2nd: 2 vs 2, ...). I did this up to 3 vs 3 and reached step 9 million. After a while I noticed that I was getting mean reward 0.0 and std reward 0.0, and that the whole time. When I changed the spawn points, I again sometimes got positive rewards and sometimes negative ones. I realised that when the agents avoid each other and don't kill anyone, they get the 0.0 reward. Maybe I am doing it wrong with the 3 vs 3 agents.

My main goal is, after I have successfully trained the agents, to play with one character vs. 3 other agents, so 1 human vs. 3 agents. My question is: is it correct to train like that? Should I put in 3 agents and one character that moves and shoots randomly? I could also add the agents one after another.

The reason I have 3 vs 3 agents is that I thought it would be better for the training: the more agents I have, the better and faster they would train. But I guess I failed at this. So how and what should I do next to reach my goal? The last thing my agents were doing was avoiding each other, until I changed the spawn points. But I guess after a while they will avoid each other again. I think it is because of the rewards: when one agent kills another, the one gets a positive reward and the other a negative one, and both are connected to one brain. So in order to avoid the negative reward, they try to avoid each other.
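
For illustration, the symmetric scheme described above might look roughly like the sketch below (hypothetical class and method names, assuming the 0.x Agent API; this is not the actual project code):

    using MLAgents;
    using UnityEngine;

    // Hypothetical sketch of a symmetric, zero-sum kill reward with one shared brain.
    public class ShooterAgent : Agent
    {
        // Imagined hook called by the game's hit detection when this agent kills "victim".
        public void RegisterKill(ShooterAgent victim)
        {
            AddReward(0.2f);          // the killer gains...
            victim.AddReward(-0.2f);  // ...exactly what the victim loses
            victim.Done();            // the victim's episode ends

            // With a single shared brain the expected return of an engagement is
            // roughly zero, while engaging still risks the -0.2, so the shared
            // policy can settle on simply avoiding the opposing agents.
        }
    }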

Edit: Every time I increase the number of agents, the training seems to look OK, but see for yourself here. I have a one-minute video of the game and some pics:

Video: https://www.dropbox.com/s/fofsxe7lj34kk4w/issues.mp4?dl=0

Tensorboard:

[Image: issues2]

I am also using the RayPerception script, but it seems it is not fast enough for my training. Look here:

[Image: issues]

The lines are normally attached to the player at (0, 0, 0), but when the agents move wildly, this happens.

One more thing to ask: I see on my TensorBoard that the entropy is not decreasing fast enough. Can I decrease the value of beta and continue with my training, or do I have to train from the beginning when I change a parameter? Should I change a parameter, and if yes, which one and to which value?

This is the link to my .yaml: https://www.dropbox.com/s/4src7kt2ykwfqu2/trainer_config.yaml?dl=0

I am using the default configuration.

These are my CollectObservations and AgentAction methods:

https://www.dropbox.com/s/whodh1v1l1eb05e/issues.txt?dl=0

@xiaomaogy

mattinjersey commented 5 years ago

I don't think you should give -0.3 if there is an opponent still alive. It looks like one agent will get -0.2 and the other agent will get +0.2, so the rewards cancel out and the system will find a solution where they avoid each other. So I think the reward structure should be modified so that there is more reward for the hunter.
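
A sketch of that asymmetric variant, reusing the hypothetical ShooterAgent from the earlier sketch (the values are illustrative, not taken from the ML-Agents docs):

    // The hunter gains clearly more than the victim loses, so an engagement has a
    // positive expected return even with a single shared brain.
    public void RegisterKill(ShooterAgent victim)
    {
        AddReward(1.0f);          // large reward for the hunter
        victim.AddReward(-0.1f);  // small (or zero) penalty for dying
        victim.Done();
    }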

xiaomaogy commented 5 years ago

@MrGitGo A few things you can try.

  1. Remove the negative reward for dying; the positive reward should already be enough. Since dying prevents the agent from collecting more reward, the agent will learn not to die. (You can revive the agent as soon as it dies; see the sketch after this list.)
  2. Make your training scene much smaller (like a really small room). After your agents learn to kill each other, you can iterate on that model (manually use curriculum learning).
  3. Start with 1 vs 1 agents.
  4. Have as many 1 vs 1 training scenes as possible to speed up the training.
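
A minimal sketch of point 1 ("revive the agent as soon as it dies"), assuming the 0.x Agent API with "Reset On Done" enabled; spawnPoint, health, and maxHealth are hypothetical fields on the agent class:

    public override void AgentReset()
    {
        // Runs after this agent calls Done(), so a dead agent respawns
        // immediately and keeps generating experience instead of idling.
        transform.position = spawnPoint.position;
        GetComponent<Rigidbody>().velocity = Vector3.zero;
        health = maxHealth;
    }
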
MrGitGo commented 5 years ago

@xiaomaogy

  1. I guess I have to restart my model and begin from step 0 again.

  2. OK, I will start with as many 1 vs 1 scenes as possible, but how can I iterate the agents then, when they are already active in the scene?

EDIT: I have removed all negative rewards. The new training environment looks like this: https://www.dropbox.com/s/qf9a51wkdaovuyo/Unity%202018.2.15f1%20Personal%20%2864bit%29%20-%20Shoot.unity%20-%20Bachelor%20-%20PC%2C%20Mac%20%26%20Linux%20Standalone%20_DX11_%2030.05.2019%2013_00_34.mp4?dl=0

xiaomaogy commented 5 years ago

What do you mean by "how can I iterate the agents when they are already active in the scene"? I guess you mean how to reset the agents so that they are not stuck after winning? You just implement the corresponding AgentReset method, or AcademyReset to reset all of the agents periodically. @MrGitGo
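
For the AcademyReset route, a small sketch (0.x API; the body is a hypothetical placeholder):

    using MLAgents;

    // Sketch only: the Academy resets shared scene state when its Max Steps is
    // reached, while each agent's own AgentReset() covers per-agent state and
    // also runs whenever that agent calls Done().
    public class ShooterAcademy : Academy
    {
        public override void AcademyReset()
        {
            // e.g. shuffle spawn points or rebuild the arena between episodes
        }
    }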

MrGitGo commented 5 years ago

OK, I am just calling the agent's Done() method and it works.

shihzy commented 5 years ago

Thank you for the discussion. We are closing this issue due to inactivity. Feel free to reopen it if you’d like to continue the discussion though.

github-actions[bot] commented 3 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.