Unity-Technologies / ml-agents

The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning and imitation learning.
https://unity.com/products/machine-learning-agents

[NOT SOLVED] Problems with non-moving agents #4080

Closed mateolopezareal closed 2 years ago

mateolopezareal commented 4 years ago

Sorry, I did not mean to file this as a bug. I am making a project that consists of a matrix in which each cube is an agent. The cubes do not move; they only change their state (Material): if a cube is dead it is white, and if it is alive it belongs to one of three races, black, green or red. The goal is to obtain a matrix with all three races alive and no dead cubes, so each agent has to learn how to survive in the environment. That's it. But I do not want just a +1 if alive and -1 if dead; I want to see interactions between the races. The main goal is to see, by setting some parameters for each race, how each race interacts with the environment. So right now the objective is closer to "learn how to survive in the environment by grouping together, but without letting only one race stay alive" (that is what I am implementing with the rewards, I think).

To do this I give a reward of -10 if the agent is dead, and several other rewards if it is alive (these depend on whether the cubes around a cube are of the same colour, so if a cube has turned red but has 8 greens around it, it gets a punishment, for example). I am doing this to try to group as many cubes of the same race together as possible. I would like to have three types of rewards: the agent's, which pushes it to stay alive and to gather cubes of the same race; the environment's, which aims for as even a distribution of races as possible (I mean NOT only one race alive); and the race's, which has its own policy (this one is not implemented yet, and I am unsure about it).
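Roughly, the per-agent part looks like this (a simplified sketch of what I described, not my exact code; `CubeState` and `GetNeighborStates()` stand in for my real grid lookup):

```csharp
using Unity.MLAgents;

public enum CubeState { Dead = 0, Black = 1, Green = 2, Red = 3 }

public partial class CubeAgent : Agent
{
    public CubeState state;

    // Stand-in helper: the real version reads the 8 surrounding cubes
    // from the matrix.
    CubeState[] GetNeighborStates()
    {
        return new CubeState[8];
    }

    // Flat -10 for being dead, otherwise a clustering term that grows
    // with the number of same-race neighbours.
    public void AssignReward()
    {
        if (state == CubeState.Dead)
        {
            AddReward(-10f);
            return;
        }

        int sameRace = 0;
        foreach (var n in GetNeighborStates())
            if (n == state) sameRace++;

        // Maps 0..8 matching neighbours to [-1, +1], so a red cube
        // surrounded by 8 greens is punished.
        AddReward((sameRace - 4) / 4f);
    }
}
```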

The vector action is discrete; I think the discrete Space Type is simpler for this project, right? For the observations I give the agent the colours of its neighbours as numbers (dead = 0, black = 1, green = 2, red = 3). The neighbours are the cubes above, below, to the left, to the right, and on the diagonals, so that is 8 observations, plus the agent's own colour, making 9. With this setup the agents learn to all take the same colour to maximise the reward. To change this I am going to add the entropy of all the cubes as a reward, so the agents will also try to maximise the entropy and keep as many races alive as possible. But the problem is: what do I give as an observation for this?
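The observation and action code is essentially this (again a sketch; I am using the `ActionBuffers` API here, while older releases pass a `float[]` to `OnActionReceived`):

```csharp
using Unity.MLAgents.Actuators;
using Unity.MLAgents.Sensors;

public partial class CubeAgent
{
    // 9 integer observations: the 8 neighbour states plus this cube's own
    // state, matching a Vector Observation Space Size of 9.
    public override void CollectObservations(VectorSensor sensor)
    {
        foreach (var n in GetNeighborStates())
            sensor.AddObservation((int)n);
        sensor.AddObservation((int)state);
    }

    // A single discrete branch of size 4: the action directly selects
    // the cube's next state.
    public override void OnActionReceived(ActionBuffers actions)
    {
        state = (CubeState)actions.DiscreteActions[0];
        AssignReward();
    }
}
```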

I am training with the histogram entropy added as a reward, and something bothers me. At some point during training it finds a maximum reward, but as training continues the reward decreases and then stays at that lower value. The maximum is reached when the entropy is almost 1, i.e. all the races are alive in almost equal amounts. But then, even though it earns less reward, it learns to make everything the same colour and gets stuck there. [Screenshot: TensorBoard reward curve, 2020-06-05]
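The entropy reward is computed roughly like this (sketch; the division by ln 3 normalises the maximum to 1):

```csharp
using System;

public static class RaceStats
{
    // Normalised Shannon entropy of the race histogram: 1.0 when the three
    // races are equally represented, approaching 0 when one race dominates.
    public static float RaceEntropy(int black, int green, int red)
    {
        int total = black + green + red;
        if (total == 0) return 0f;

        float entropy = 0f;
        foreach (int c in new[] { black, green, red })
        {
            if (c == 0) continue;
            float p = (float)c / total;
            entropy -= p * (float)Math.Log(p);
        }
        return entropy / (float)Math.Log(3);
    }
}
```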

Also, I have some questions about ml-agents:

  1. Is it possible to do a project of this type? All the projects I have seen involve movement.
  2. I calculate the entropy in FixedUpdate so that all the cubes get the same entropy value; if I calculated it during each agent's action, the entropy would change between agents. Is this a good idea? (See the sketch after this list.)
  3. I am also trying to add a reward that represents a race; any ideas?
  4. If I give as observations the colours (the numbers) of all the cubes in the matrix, the vector observation size is 153. Is this a problem? Do I have to change anything because of this large observation?
  5. The entropy graph in TensorBoard drops quite fast, but the value (the beta hyperparameter) is already 0.01. Is there any problem with setting it to something like 0.1 or 0.5?
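For question 2, this is roughly what I mean (simplified sketch; `MatrixManager` stands in for my real manager script):

```csharp
using UnityEngine;

public class MatrixManager : MonoBehaviour
{
    public CubeAgent[] cubes; // every agent in the matrix

    void FixedUpdate()
    {
        // Count the live races once per physics step...
        int black = 0, green = 0, red = 0;
        foreach (var cube in cubes)
        {
            switch (cube.state)
            {
                case CubeState.Black: black++; break;
                case CubeState.Green: green++; break;
                case CubeState.Red:   red++;   break;
            }
        }

        // ...then hand every cube the same shared environment reward, so the
        // entropy cannot change between one agent's action and the next.
        float sharedEntropy = RaceStats.RaceEntropy(black, green, red);
        foreach (var cube in cubes)
            cube.AddReward(sharedEntropy);
    }
}
```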

I know this is a lot of info, so I will be waiting for your response. If you need the code I will upload anything you want; it is just too large to upload now.

andrewcoh commented 4 years ago

Hi @mateolopezareal

I believe this is more appropriate as a forum post, since this is not a bug in the repo. I encourage you to post it there (https://forum.unity.com/forums/ml-agents.453/) with a short description of the agent's goal as well as how you built the observation space, action space, and reward function.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had activity in the last 28 days. It will be closed in the next 14 days if no further activity occurs. Thank you for your contributions.

stale[bot] commented 2 years ago

This issue has been automatically closed because it has not had activity in the last 42 days. If this issue is still valid, please ping a maintainer. Thank you for your contributions.

github-actions[bot] commented 2 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.