Closed jpax1996 closed 6 years ago
Maybe you should not reward the AI for going forward, but reward it for passing certain checkpoints. With this, your AI should not try to constantly move forward. You can activate curiosity if you feel your reward signal is too sparse. This will force your agent to explore more. According to this, it is even possible to train a game like Super Mario using only curiosity (no reward at all).
@vincentpierre Thank you for the response. I have made some checkpoints but the result is similar. How would I go about activating curiosity?
In the training_config.yaml you can add use_curiosity: true. Refer to the Pyramids environment that uses curiosity.
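For reference, the relevant block in the config might look like the following sketch (the brain name and hyperparameter values here are illustrative examples, not tuned recommendations):

```yaml
# Hypothetical brain entry in training_config.yaml
# (the brain name "PlatformerLearning" and the values are examples only)
PlatformerLearning:
    use_curiosity: true
    curiosity_strength: 0.01   # scales the intrinsic (curiosity) reward
    curiosity_enc_size: 128    # size of the curiosity module's encoding layer
```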
I hope this helps.
That helps a lot, thank you! If I use curiosity, how does that affect the reward system that I have put in place? Will the agent still use the rewards?
Thanks man, I will try to implement it and let you know how it works out
@vincentpierre So I have set up the curiosity and everything seems to be running, but I am still not sure when to reward the agent. Right now I am rewarding +1 when the agent reaches one of the checkpoints in the level (red squares in the editor). When the agent falls in the spikes, at the moment, I don't penalize it. The curiosity is set to true, curiosity_strength: 0.01 and curiosity_enc_size: 128. Am I doing this right? Should I play around with the curiosity values until the agent solves the level?
It's hard to tell if it will learn, but I think it should. Does the cumulative reward increase? How long do you train it for?
Here's a video showing a bit of the training: https://drive.google.com/open?id=1-SaAU7sUWIrk5M9XgmpMisAJJFmLMiAp
I train it with a max step of 100e5, but they quickly get stuck in a loop (some sort of local maximum), where they kill themselves on the first set of spikes after receiving the first reward: https://drive.google.com/open?id=1B74mNa-qiyF4UfD6uFtId19bt2dERi5T
The cumulative reward increases for the first steps, while the agents are trying to find the first checkpoint, but then it plateaus at around 0.7 and slowly increases towards 1 (since they start to constantly fall on the first spikes).
@jpax1996 Maybe your agent needs a motivation to stay alive ;) Try setting a small constant reward (e.g. 0.01) at every step and a negative reward of -1 when it falls on the spikes.
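As a rough sketch of that scheme inside an ML-Agents Agent subclass (assuming the 0.x API with AgentAction/AddReward/Done; MoveAgent and touchedSpikes are hypothetical helpers standing in for the real movement and collision logic):

```csharp
// Hypothetical reward scheme sketch; not the poster's actual code.
public override void AgentAction(float[] vectorAction, string textAction)
{
    MoveAgent(vectorAction);   // hypothetical movement helper

    AddReward(0.01f);          // small constant "stay alive" reward each step

    if (touchedSpikes)         // hypothetical flag set in OnCollisionEnter2D
    {
        AddReward(-1f);        // penalty for dying on the spikes
        Done();                // end the episode
    }
}
```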
Can the agent actually perceive the checkpoints? Or does it come across them by chance?
(Sorry, that constant reward was a bad idea - it would probably just cause the agent to keep moving back and forth without doing much else.)
So the agent comes across the checkpoints by chance. I tried adding a penalty for falling on the spikes but it doesn't seem to change much. I have tried giving a small constant reward every frame the agent gets closer to the next checkpoint, but the issue is that the agent will not be able to reach the third checkpoint, where it needs to go to the left. I also think curiosity doesn't work well with constant rewarding.
Giving your agent curiosity won't help in all situations. In particular, if your environment already contains a dense reward function, such as our Crawler and Walker environments, where a non-zero reward is received after most actions, you may not see much improvement. If your environment contains only sparse rewards, then adding intrinsic rewards has the potential to turn these tasks from unsolvable to easily solvable with Reinforcement Learning. This applies particularly to tasks where simple rewards such as win/lose or completed/failed make the most sense. blog post
If the agent cannot perceive the checkpoints, then I do not think it will work well beyond just one checkpoint. I think the checkpoints should be like vertical bars: if the agent goes beyond a certain distance in the game, then it is rewarded. The agent is penalized when dying, of course. The observations of the agent must be relevant to the problem. I would try using raycasts if that is not already the case.
The observations i'm currently giving to the agent are:
Do you normalize the position in the level? I think this might be hard for the agent to represent. The agent would need to remember the position of each trap and platform to solve the problem, which is impossible. It is as if you tried to explore a dark room knowing only where you are (but with no information about the objects in the room). I would try sending raycasts around the agent (maybe in all 8 or 16 directions) and reporting, for each ray, the distance to the first hit and the type of object hit.
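A sketch of what such raycast observations could look like with the 0.x Agent API (8 directions; the ray length and the detectable tags are assumptions, not values from this thread):

```csharp
// Illustrative raycast observations for a 2D agent; names and values are assumptions.
public override void CollectObservations()
{
    float rayLength = 10f;   // assumed maximum ray distance
    string[] detectableTags = { "spike", "platform", "checkpoint" }; // hypothetical tags

    for (int i = 0; i < 8; i++)   // 8 evenly spaced directions
    {
        float angle = i * 45f * Mathf.Deg2Rad;
        Vector2 dir = new Vector2(Mathf.Cos(angle), Mathf.Sin(angle));

        // Physics2D, not Physics, since the game is 2D
        RaycastHit2D hit = Physics2D.Raycast(transform.position, dir, rayLength);

        // One-hot encode which tag (if any) was hit, plus the normalized hit distance
        foreach (string tag in detectableTags)
            AddVectorObs(hit.collider != null && hit.collider.CompareTag(tag) ? 1f : 0f);
        AddVectorObs(hit.collider != null ? hit.distance / rayLength : 1f);
    }
}
```

This gives the agent local information about nearby traps and platforms instead of forcing it to memorize the whole level from its position alone.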
That will probably help a lot. I will implement it and let you know how everything works out.
So I'm trying to make the spherecast for my agent in order to see if it helps, but for some reason the spherecast is not hitting the colliders in the level. Code for the sphere cast
Editor view of the sphere casts
I decided to use the "RayPerception" code that was provided in the assets and made some small changes.
I fixed the issue: since my game is in 2D, I should be using Physics2D instead of just Physics for the raycasts.
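The fix boils down to swapping the 3D physics query for its 2D counterpart, e.g. (a minimal illustration; origin, radius, dir and maxDist stand in for the real arguments):

```csharp
// 3D physics queries ignore Collider2D components entirely:
// RaycastHit hit3D;
// Physics.SphereCast(origin, radius, dir, out hit3D, maxDist);

// The 2D equivalent detects Collider2D components:
RaycastHit2D hit = Physics2D.CircleCast(origin, radius, dir, maxDist);
```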
@vincentpierre Implementing the raycasts was a massive game changer, the agent was finally able to finish the level somewhat consistently. https://drive.google.com/open?id=16bXKy8ZbJcO1guL4Y85TJM1BfMtkr7gE
Thank you so much @vincentpierre for the help
I'm having issues teaching my agents to go back in order to progress. Since they start off by learning that moving forward gets a reward, they never try to go back, as you can see in this video: https://drive.google.com/open?id=1WQbIaHZ6_FtATPlfLALCnk3AQjKbt4L7
I put a tile to block the passage under the platform so that the AI doesn't fall on the spikes; I felt that could help it learn better. Do you guys have any suggestions on what I should try?
Thanks for the help
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.