Closed MrGitGo closed 5 years ago
There are a lot of hyperparameters, including gamma, lambda, learning rate, number of layers, ... Maybe by adjusting these it will work. You could also try training longer, for days...
Reinforcement learning can be a frustrating business, I guess!
I think you should 1) make the scene simpler by changing it to 1 vs 1 instead of 5 vs 5, then iterate from there, and 2) use the default hyperparameters first before trying your own.
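For reference, the `default` section of the stock trainer_config.yaml looked roughly like this at the time (double-check against your own copy, since the defaults change between releases):

```yaml
default:
    trainer: ppo
    batch_size: 1024
    beta: 5.0e-3
    buffer_size: 10240
    epsilon: 0.2
    gamma: 0.99
    hidden_units: 128
    lambd: 0.95
    learning_rate: 3.0e-4
    num_epoch: 3
    num_layers: 2
    time_horizon: 64
    normalize: false
```

Any brain-specific section in the same file only needs to list the values that differ from these defaults.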
@MrGitGo
Ok, I will try this out. But what is the point of trying it with 1 vs 1 first? How does it influence the training? I thought it would be better if I had more agents. @xiaomaogy
No, more agents will make the training more complicated and harder to debug. You should start with the simplest possible scene that should absolutely work, then iterate from there. @MrGitGo
So I added the agents to the game incrementally (1st phase: 1 vs 1, 2nd phase: 2 vs 2, ...). I did this until 3 vs 3, at which point I was at step 9 million. After a while I observed that I was getting a mean reward of 0.0 and a std of 0.0, the whole time. When I changed the spawn points, I again got sometimes positive and sometimes negative rewards. I realised that when the agents avoid each other and don't kill anyone, they get the 0.0 reward. Maybe I am doing something wrong with the 3 vs 3 agents.
My main goal, after successfully training the agents, is to play with one character vs. 3 other agents, i.e. 1 human vs. 3 agents. My question is: is it correct to train like that? Should I put in 3 agents and one character that moves and shoots randomly? I could also add the agents one after another.
The reason I have 3 vs 3 agents is that I thought it would be better for the training: the more agents, the better and faster they would train. But I guess I failed at this. So what should I do next to reach my goal? The last thing my agents were doing was avoiding each other, until I changed the spawn points; but I guess after a while they will avoid each other again. I think it is because of the rewards: when one agent kills another, one gets a positive reward and the other a negative one, and both are connected to one brain. So in order to get no negative reward, they try to avoid each other.
Edit: Every time I increase the number of agents, the training seems to look OK, but see for yourself. I have a one-minute video of the game and some pictures:
Video: https://www.dropbox.com/s/fofsxe7lj34kk4w/issues.mp4?dl=0
Tensorboard:
And I am using the RayPerception script, but it seems that it is not fast enough for my training. Look here:
The lines are normally attached to the player (at local (0,0,0)), but if the agents move wildly, this happens.
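For context, here is a minimal sketch of how RayPerception was typically wired into `CollectObservations()` in that era of ML-Agents (the class name, ray angles, and detectable tags below are made up for illustration). Note that the rays are only cast when the agent requests a decision, so if the agent moves between decision steps, any debug lines drawn from the last cast will appear to lag behind the transform:

```csharp
// Hypothetical agent; assumes the old (0.x) ML-Agents C# API.
public class ShooterAgent : Agent
{
    public RayPerception rayPer;                    // assigned in the Inspector
    const float rayDistance = 20f;                  // assumed value
    readonly float[] rayAngles = { 0f, 45f, 90f, 135f, 180f };
    readonly string[] detectableObjects = { "enemy", "wall" };  // hypothetical tags

    public override void CollectObservations()
    {
        // Perceive() casts the rays from the agent's current position
        // at this decision step and returns one flat list of floats.
        AddVectorObs(rayPer.Perceive(rayDistance, rayAngles,
                                     detectableObjects, 0f, 0f));
    }
}
```

If the visual lag is only in the drawn debug lines and the observations themselves are cast at decision time, it may be cosmetic rather than a training problem.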
And one more thing: I see on my Tensorboard that the entropy is not decreasing fast enough. Can I decrease the value of beta and continue with my training, or do I have to train from the beginning when I change a parameter? If I should change a parameter, which one, and to which value?
This is the link to my .yaml: https://www.dropbox.com/s/4src7kt2ykwfqu2/trainer_config.yaml?dl=0
I am using the default
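Lowering beta per-brain in trainer_config.yaml would look something like this (the brain name here is just a placeholder; 5.0e-3 is the shipped default):

```yaml
ShooterBrain:        # placeholder brain name
    beta: 1.0e-3     # smaller beta = weaker entropy bonus, entropy drops faster
```

Any key not listed under the brain's section falls back to the `default` section of the same file.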
These are my CollectObservations and AgentAction methods:
https://www.dropbox.com/s/whodh1v1l1eb05e/issues.txt?dl=0
@xiaomaogy
I don't think you should give -0.3 if there is an opponent still alive. Because it looks like one agent will get -0.2 and the other agent will get +0.2, so the rewards cancel out and the system will find a solution where they avoid each other. So I think the reward structure should be modified so that there is more reward for the hunter.
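A sketch of an asymmetric reward along those lines, using the old ML-Agents `Agent` API (the method names and exact values below are illustrative, not from this thread):

```csharp
// Called by the game logic on the agent that landed the kill.
void OnKilledOpponent()
{
    AddReward(1.0f);      // large reward for the hunter
}

// Called on the agent that died.
void OnDied()
{
    AddReward(-0.2f);     // smaller penalty, so an exchange is net-positive
    Done();               // mark this agent for reset
}

// Optional: a tiny per-step penalty pushes agents to seek fights
// instead of idling. Use sparingly, since large negative rewards
// encourage avoidance, as discussed above.
public override void AgentAction(float[] vectorAction, string textAction)
{
    AddReward(-0.001f);
    // ... apply movement / shooting from vectorAction here ...
}
```

The key point is that killing plus dying should sum to a positive number, so the single shared brain prefers engaging over mutual avoidance.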
@MrGitGo A few things you can try.
@xiaomaogy
I guess I have to restart my model and begin from step 0 again.
Ok, I will start with as many 1 vs 1 scenes as possible, but how can I iterate the agents then, when they are already active in the scene?
EDIT: I have removed all negative rewards, and the new training environment looks like this: https://www.dropbox.com/s/qf9a51wkdaovuyo/Unity%202018.2.15f1%20Personal%20%2864bit%29%20-%20Shoot.unity%20-%20Bachelor%20-%20PC%2C%20Mac%20%26%20Linux%20Standalone%20_DX11_%2030.05.2019%2013_00_34.mp4?dl=0
What do you mean by "how can I iterate the agents when they are already active in the scene"? I guess you mean: how do you reset the agents so that they are not stuck after winning? You just implement the corresponding AgentReset method, or AcademyReset to reset all of the agents periodically. @MrGitGo
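A minimal sketch of the AgentReset approach (old ML-Agents API; the spawn-point field and health value are hypothetical):

```csharp
public class ShooterAgent : Agent   // hypothetical class name
{
    public Transform spawnPoint;    // assigned in the Inspector
    int health;

    public override void AgentReset()
    {
        // Called automatically after Done() is set on this agent,
        // or when the Academy resets (e.g. after its Max Steps).
        transform.position = spawnPoint.position;
        transform.rotation = spawnPoint.rotation;
        health = 100;
    }
}
```

Calling `Done()` on an agent (e.g. when it wins or dies) is what triggers this reset for that agent.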
Ok, I am just calling the agent's Done() method and it works.
Thank you for the discussion. We are closing this issue due to inactivity. Feel free to reopen it if you’d like to continue the discussion though.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
-Brain: One Brain with the following action space:
Vector Observation:
Tensorboard:
And I also changed the arena after every million steps:
And the max steps (in the Academy) is 2000.
num_layers: 1
normalize: true
batch_size: 64
buffer_size: 10240
hidden_units: 128
Could you give me hints on solving the problem with the rewards? I have read somewhere here in the issues that I should be very sparing with negative rewards, but I still want to ask you too. I hope you can help me.