Closed jpax1996 closed 6 years ago
Maybe you should not reward the AI for going forward, but reward it for passing certain checkpoints. With this, your AI should not try to constantly move forward. You can activate curiosity if you feel your reward signal is too sparse. This will force your agent to explore more. According to this, it is even possible to train a game like Super Mario using only curiosity (no reward at all).
@vincentpierre Thank you for the response. I have made some checkpoints but the result is similar. How would I go about activating curiosity?
In the training_config.yaml you can add use_curiosity: true. Refer to the Pyramids environment that uses curiosity.
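For reference, the relevant block in the config might look like the following sketch (the brain name and hyperparameter values here are illustrative examples, not tuned recommendations):

```yaml
# Hypothetical brain entry in training_config.yaml
# (the brain name "PlatformerLearning" and the values are examples only)
PlatformerLearning:
    use_curiosity: true
    curiosity_strength: 0.01   # scales the intrinsic (curiosity) reward
    curiosity_enc_size: 128    # size of the curiosity module's encoding layer
```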
I hope this helps.
That helps a lot, thank you! If I use curiosity, how does that affect the reward system that I have put in place? Will the agent still use the rewards?
Thanks man, I will try to implement it and let you know how it works out
@vincentpierre So I have set up the curiosity and everything seems to be running, but I am still not sure when to reward the agent. Right now I am rewarding +1 when the agent reaches one of the checkpoints in the level (red squares in the editor). When the agent falls in the spikes, at the moment, I don't penalize it. The curiosity is set to true, curiosity_strength: 0.01 and curiosity_enc_size: 128. Am I doing this right? Should I play around with the curiosity values until the agent solves the level?
It's hard to tell if it will learn, but I think it should. Does the cumulative reward increase? How long do you train it for?
Here's a video showing a bit of the training: https://drive.google.com/open?id=1-SaAU7sUWIrk5M9XgmpMisAJJFmLMiAp
I train it with a max step of 100e5, but they quickly get stuck in a loop (some sort of local maximum), where they kill themselves on the first set of spikes after receiving the first reward: https://drive.google.com/open?id=1B74mNa-qiyF4UfD6uFtId19bt2dERi5T
The cumulative reward increases for the first steps, while the agents are trying to find the first checkpoint, but then it plateaus at around 0.7 and slowly increases towards 1 (since they start to constantly fall on the first spikes).
@jpax1996 Maybe your agent needs a motivation to stay alive ;) Try setting a small constant reward (e.g. 0.01) at every step and a negative reward of -1 when it falls on the spikes.
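As a rough sketch of that scheme inside an ML-Agents Agent subclass (assuming the 0.x API with AgentAction/AddReward/Done; MoveAgent and touchedSpikes are hypothetical helpers standing in for the real movement and collision logic):

```csharp
// Hypothetical reward scheme sketch; not the poster's actual code.
public override void AgentAction(float[] vectorAction, string textAction)
{
    MoveAgent(vectorAction);   // hypothetical movement helper

    AddReward(0.01f);          // small constant "stay alive" reward each step

    if (touchedSpikes)         // hypothetical flag set in OnCollisionEnter2D
    {
        AddReward(-1f);        // penalty for dying on the spikes
        Done();                // end the episode
    }
}
```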
Can the agent actually perceive the checkpoints? Or does it come across them by chance?
(Sorry, that constant reward was a bad idea - it would probably just cause the agent to keep moving back and forth without doing much else.)
So the agent comes across the checkpoints by chance. I tried adding a penalty for falling on the spikes but it doesn't seem to change much. I have tried giving a small constant reward every frame the agent gets closer to the next checkpoint, but the issue is that the agent will not be able to reach the third checkpoint, where it needs to go to the left. I also think curiosity doesn't work well with constant rewarding.
Giving your agent curiosity won't help in all situations. In particular, if your environment already contains a dense reward function, such as our Crawler and Walker environments, where a non-zero reward is received after most actions, you may not see much improvement. If your environment contains only sparse rewards, then adding intrinsic rewards has the potential to turn these tasks from unsolvable to easily solvable with Reinforcement Learning. This applies particularly to tasks where simple rewards such as win/lose or completed/failed make the most sense. blog post
If the agent cannot perceive the checkpoints, then I do not think it will work well beyond just one checkpoint. I think the checkpoints should be like vertical bars: if the agent goes beyond a certain distance in the game, then it is rewarded. The agent is penalized when dying, of course. The observations of the agent must be relevant to the problem. I would try using raycasts if that is not already the case.
The observations i'm currently giving to the agent are:
Do you normalize the position in the level? I think this might be hard for the agent to represent. The agent would need to remember the position of each trap and platform to solve the problem, which is impossible. It is as if you tried to explore a dark room knowing only where you are (but with no information about the objects in the room). I would try sending raycasts around the agent (maybe in all 8 or 16 directions) and reporting, for each ray, the distance to the first hit and the type of object hit.
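A sketch of what such raycast observations could look like with the 0.x Agent API (8 directions; the ray length and the detectable tags are assumptions, not values from this thread):

```csharp
// Illustrative raycast observations for a 2D agent; names and values are assumptions.
public override void CollectObservations()
{
    float rayLength = 10f;   // assumed maximum ray distance
    string[] detectableTags = { "spike", "platform", "checkpoint" }; // hypothetical tags

    for (int i = 0; i < 8; i++)   // 8 evenly spaced directions
    {
        float angle = i * 45f * Mathf.Deg2Rad;
        Vector2 dir = new Vector2(Mathf.Cos(angle), Mathf.Sin(angle));

        // Physics2D, not Physics, since the game is 2D
        RaycastHit2D hit = Physics2D.Raycast(transform.position, dir, rayLength);

        // One-hot encode which tag (if any) was hit, plus the normalized hit distance
        foreach (string tag in detectableTags)
            AddVectorObs(hit.collider != null && hit.collider.CompareTag(tag) ? 1f : 0f);
        AddVectorObs(hit.collider != null ? hit.distance / rayLength : 1f);
    }
}
```

This gives the agent local information about nearby traps and platforms instead of forcing it to memorize the whole level from its position alone.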
That will probably help a lot. I will implement it and let you know how everything works out.
So I'm trying to make the spherecast for my agent in order to see if it helps, but for some reason the spherecast is not hitting the colliders in the level. Code for the sphere cast
Editor view of the sphere casts
I decided to use the "RayPerception" code that was provided in the assets and made some small changes.
I fixed the issue: since my game is in 2D, I should be using Physics2D instead of just Physics for the raycasts.
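The fix boils down to swapping the 3D physics query for its 2D counterpart, e.g. (a minimal illustration; origin, radius, dir and maxDist stand in for the real arguments):

```csharp
// 3D physics queries ignore Collider2D components entirely:
// RaycastHit hit3D;
// Physics.SphereCast(origin, radius, dir, out hit3D, maxDist);

// The 2D equivalent detects Collider2D components:
RaycastHit2D hit = Physics2D.CircleCast(origin, radius, dir, maxDist);
```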
@vincentpierre Implementing the raycasts was a massive game changer, the agent was finally able to finish the level somewhat consistently. https://drive.google.com/open?id=16bXKy8ZbJcO1guL4Y85TJM1BfMtkr7gE
Thank you so much @vincentpierre for the help
I'm having issues teaching my agents to go back in order to progress. Since they start off by learning that moving forward gets a reward, they never try to go back, as you can see in this video: https://drive.google.com/open?id=1WQbIaHZ6_FtATPlfLALCnk3AQjKbt4L7
I put a tile to block the passage under the platform so that the AI doesn't fall on the spikes; I felt that could help it learn better. Do you guys have any suggestions on what I should try?
Thanks for the help
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.