This isn't really an issue, just a thank you for this awesome lib.
I have started a blog post series where I eventually want to be able to train an agent to play one of my previous games (Mr Nibbles Forever).
https://mikecann.co.uk/machine-learning/a-game-developer-learns-machine-learning-a-little-deeper/
For now I have only managed to get a very simple 1D agent going, but I plan on adding more complexity as I go along.
Anyways, feel free to close this issue, I just wanted to say thanks.
Mr Nibbles Forever looks like a pretty cool use-case. Coming up with a suitable state space sounds like a major challenge to me. Curriculum Learning should be a pretty helpful addition once you face the entire complexity of your game.
@MarcoMeter My original plan was Mr Nibbles Forever, here is a quick video of the gameplay: https://www.youtube.com/watch?v=vO6mjWDz5RM
But I have thought about it more and I think it would be simpler (to begin with at least) to try training an agent on its predecessor game, Mr Nibbles:
https://youtu.be/lyAf7VVLdKg?t=25
It's a simpler puzzle game rather than an endless runner, so it may be easier to do Curriculum Learning on it.
But yes, I was wondering about state space. I was thinking that it might have to be a camera feed from the game... thoughts?
@mikecann
I agree on using a camera image as input. I'd set up a different camera for the agent to exclude the background image. In the end, the image fed to the brain should be pretty abstract (grayscale and maybe less than 64x64 pixels). As this repository's PPO implementation does not feature frame stacking yet, the state space could be extended with the agent's current velocity.
@MarcoMeter oh cool, thanks for those tips. Not sure what frame stacking is yet so I will have a look into it.
The original Mr Nibbles is actually a grid-based game (apart from Mr Nibbles himself), so there is a chance I might not need to use the camera.
@mikecann Frame stacking means feeding the current frame and n past frames to the neural net. Given the current and the previous frame, the agent could derive its velocity from that input, for example.
Thank you @mikecann for this blog post. I think I will read it and use it to make my first ML agent
Very cool blog @mikecann!
As for state-space, there are three main possible approaches:
One of them is to have CollectState() keep track of the last 3 states itself and pass them in as one big vector. Hope that helps! Definitely looking forward to seeing how things progress.
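Something like this rough sketch is what I have in mind for that option, assuming the current CollectState() API that returns a List<float> (BuildCurrentState() is just a placeholder for however you already collect a single step's state):

```csharp
using System.Collections.Generic;

// Rough sketch: the agent keeps the last few states itself and returns them
// concatenated into one big vector from CollectState().
public class StackedStateSketch
{
    const int stackSize = 3;   // current state plus two previous ones
    const int stateSize = 5;   // length of a single frame's state vector
    readonly Queue<List<float>> history = new Queue<List<float>>();

    public List<float> CollectState()
    {
        history.Enqueue(BuildCurrentState());
        while (history.Count > stackSize)
            history.Dequeue();

        // Pad with zeros until the queue is full so the returned vector always
        // has length stackSize * stateSize, then append oldest-to-newest.
        var stacked = new List<float>(new float[(stackSize - history.Count) * stateSize]);
        foreach (var frame in history)
            stacked.AddRange(frame);
        return stacked;
    }

    List<float> BuildCurrentState()
    {
        // Placeholder: return whatever single-frame state you already gather.
        return new List<float>(new float[stateSize]);
    }
}
```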
@awjuliani thank you so much for that excellent advice! Solution 3 sounds very interesting.
I was wondering: why can't I just pass the velocity vector in as another state property?
Another solution is that because the original Mr Nibbles' world is laid out on a grid, I could just encode that 2D world in states. That is exactly how I built the maps for the original game.
I could represent the state of the level in a 100x100 grid where each cell could be one of a number of different "types" (black is solid floor, yellow is a nibble, blue is a spider, etc.). Then I would also pass in the position of Mr Nibbles, his velocity and a few other things. I'm thinking that might do it.
I don't think I would even have to supply collision information, as the NN should be able to learn the mapping between the relative states.
So I'm thinking something like 100x100x(number of cell types) + position + velocity + is-in-air, so something like 80,003 states. I'm guessing this would take a long time to train?
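Something like this rough sketch is what I'm picturing (the TileType values are just examples of the cell types, and none of this is tested yet):

```csharp
using System.Collections.Generic;
using UnityEngine;

// Rough sketch of the grid-encoding idea: one-hot encode each cell's type,
// then append player position, velocity and an is-in-air flag.
// TileType and the level array are stand-ins for however the real level data is stored.
public enum TileType { Empty = 0, Floor = 1, Nibble = 2, Spider = 3 }

public static class GridStateEncoder
{
    public static List<float> Encode(TileType[,] level, Vector2 playerPos,
                                     Vector2 playerVel, bool isInAir)
    {
        int width = level.GetLength(0);
        int height = level.GetLength(1);
        int numTypes = System.Enum.GetValues(typeof(TileType)).Length;

        // width * height * numTypes one-hot values...
        var state = new List<float>(width * height * numTypes + 5);
        for (int x = 0; x < width; x++)
            for (int y = 0; y < height; y++)
                for (int t = 0; t < numTypes; t++)
                    state.Add((int)level[x, y] == t ? 1f : 0f);

        // ...plus position, velocity and jump state.
        state.Add(playerPos.x);
        state.Add(playerPos.y);
        state.Add(playerVel.x);
        state.Add(playerVel.y);
        state.Add(isInAir ? 1f : 0f);
        return state;
    }
}
```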
DeepTraffic is a great example of the cell-like approach.
There are two things to keep in mind:
@MarcoMeter thanks for sharing DeepTraffic, that's really cool.
As to the points you mentioned, I totally agree with starting off slow. As I'm probably going to have to pretty much rebuild the game anyway (because the old one was written in haXe), it's another reason to start simple, then train, then add more complexity, then train some more.
@mikecann sounds like a great approach! You can also definitely just feed in velocity information into the state to augment it.
Feel free to post here on the results of your experimentation, as I'd be interested in learning what approaches do and don't work for problems like this.
@awjuliani Awesome, I will do. Planning on doing more work on this over the holidays.
Hey @awjuliani just thought you would like to know I have finally found time to do a little more work on this and write it up: https://mikecann.co.uk/machine-learning/a-game-developer-learns-machine-learning-mr-nibbles-basics/
I ran into a few issues that I think means I will have to use curriculum learning in the future.
Do you think I need to provide velocity or should the network be able to infer velocity from the previous position state?
Hi @mikecann
I skimmed your blog post and code. It looks like you don't add temporal information to your state space, so I suggest adding the velocity. Another potential problem is that you are not normalizing your state space. Maybe change normalize to true in your PPO hyperparameters, or do it manually in CollectState() in your implementation. I usually do it manually. You can find more information about it in the best practices doc.
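As a rough illustration of the manual route (the level size and max speed bounds here are made up; use whatever the real ranges are so each input lands roughly in [-1, 1] or [0, 1]):

```csharp
using System.Collections.Generic;
using UnityEngine;

// Rough sketch of normalizing the state manually before returning it from CollectState().
public static class StateNormalization
{
    const float levelWidth = 100f;   // assumed bounds, replace with real values
    const float levelHeight = 100f;
    const float maxSpeed = 10f;

    public static List<float> BuildState(Vector2 playerPos, Vector2 playerVel, bool isInAir)
    {
        return new List<float>
        {
            playerPos.x / levelWidth,                        // roughly [0, 1]
            playerPos.y / levelHeight,
            Mathf.Clamp(playerVel.x / maxSpeed, -1f, 1f),    // roughly [-1, 1]
            Mathf.Clamp(playerVel.y / maxSpeed, -1f, 1f),
            isInAir ? 1f : 0f
        };
    }
}
```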
@MarcoMeter I was hoping that the velocity could be inferred from the current position and last position, but I think you are right, I need to manually supply the velocity. I will read up on normalization, as I'm not sure exactly what the reasons for doing it are just yet.
Having multiple positions in the state space should work as well. But if there is a possibility to reduce dimensions (e.g. by aggregation), that is beneficial to your model as it reduces the input size. Two position vectors take more inputs than one velocity vector, but both contain roughly the same information.
@MarcoMeter sorry, I should clarify. What I meant by "inferred from the current position and last position" wasn't that I actually supply the current pos and last pos per CollectState(), as that would essentially just be the velocity, like you mentioned.
Instead, what I meant was that I thought all state gathered by CollectState() is temporal by default, i.e. the network automatically uses the previous state gathered by CollectState() as an input, so the velocity should be inferable from the current position and the last.
Hope that makes sense :)
the network automatically uses the previous state gathered by CollectState() as an input
That's not the case (right now). The development branches contain an implementation to stack the state space up to 9 steps in the past (plus the current one).
@MarcoMeter oh really? Ahh okay, in which case I will definitely need to provide the velocity then.
Hey guys, having some issues trying to get my agent to learn how to jump:
He has learnt that he needs to head towards the exit just fine, but he seems to be struggling to jump over a little hurdle.
In the example above I have attempted to limit episode length by setting done = true when the cumulative reward drops below -80.
I have also tried giving a reward each time the agent jumps, but that doesn't seem to have helped, and it is technically wrong anyway because the agent shouldn't be jumping all the time.
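For reference, the termination logic is roughly like this (the reward and done fields here are stand-ins for the corresponding fields on the Agent base class, and I track the episode total manually):

```csharp
// Rough sketch of the early-termination check.
public class EpisodeLimitSketch
{
    float reward;          // per-step reward (stand-in for the Agent field)
    bool done;             // episode-finished flag (stand-in for the Agent field)
    float episodeReward;   // manually tracked cumulative reward for this episode

    public void ApplyReward(float stepReward)
    {
        reward = stepReward;
        episodeReward += stepReward;

        // End the episode early once the agent has accumulated too much
        // negative reward, instead of letting it wander for the full episode.
        if (episodeReward < -80f)
            done = true;
    }

    public void OnEpisodeReset()
    {
        episodeReward = 0f;
    }
}
```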
Here are my hyperparams: https://github.com/mikecann/MrNibblesML/blob/master/python/mrnibbles.ipynb
And this is the agent: https://github.com/mikecann/MrNibblesML/blob/master/unity/Assets/MrNibbles/Scripts/MrNibblesAgent.cs
Anyone got any clues? I would try curriculum learning, but I'm not sure how that would help with teaching him how to jump over this small obstacle.
Any help would be greatly appreciated!
For your discrete action space, you've chosen a huge batch size. I'd go with a batch size of 128 at most, or even smaller, like 64.
Besides the hyperparameters, I think that your state space does not contain enough information. I'd add some information about obstacles. Maybe cast a ray to let the agent sense any obstacles in front of it. Even if you are using a camera observation as input, I'd still try that.
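Roughly something like this; the "Level" layer name and the ray length are just placeholders:

```csharp
using UnityEngine;

// Rough sketch of the ray idea: cast forward from the agent and add the
// normalized hit distance (or 1 if nothing is hit) to the state.
public class ObstacleSensor : MonoBehaviour
{
    const float rayLength = 3f;

    public float SenseObstacleAhead(float facingDirection)
    {
        Vector2 origin = transform.position;
        Vector2 direction = new Vector2(Mathf.Sign(facingDirection), 0f);

        RaycastHit2D hit = Physics2D.Raycast(origin, direction, rayLength,
                                             LayerMask.GetMask("Level"));

        // 0 = obstacle right in front of us, 1 = nothing within rayLength.
        return hit.collider != null ? hit.distance / rayLength : 1f;
    }
}
```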
Hi @MarcoMeter, thanks for those tips.
I went for the batch size I did because my action space is actually continuous: I want the agent to be able to perform multiple actions at once (e.g. jump and move right), and I think I read somewhere that for that your action space needs to be continuous. But I will try lowering my batch size and see where I get to 👍
Well, I was hoping that I wouldn't need to provide collision information, as the world is laid out on an evenly spaced grid, very much like the pixels in a camera. So if an agent can learn from the pixels of a camera, it should be able to learn from the cells in my grid-world, no?
I overlooked the fact that you are adding each tile's position. I'm wondering if a regular dense layer is capable of sensing the spatial information.
You can try to make the tile's position relative to the player.
In your case I'd go with discrete actions. If you want two actions to be done at once, then create an action which does both. That should only be two more actions in your case, I guess, which is not too high-dimensional for a discrete action space. Concerning continuous control, I'd try clamping each value to (-1, 1) first. For moving horizontally, one action is enough, and jumping would be triggered if the value is positive.
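As a rough sketch of what I mean, where the combined moves get their own action index (Move and Jump are placeholders for your existing controller code):

```csharp
using UnityEngine;

// Rough sketch of a discrete action handler where "move + jump" combinations
// are separate actions. Call HandleAction() with the index chosen by the brain.
public class DiscreteActionSketch : MonoBehaviour
{
    public void HandleAction(int action)
    {
        switch (action)
        {
            case 0: break;                        // do nothing
            case 1: Move(-1f); break;             // move left
            case 2: Move(1f); break;              // move right
            case 3: Jump(); break;                // jump in place
            case 4: Move(-1f); Jump(); break;     // move left + jump
            case 5: Move(1f); Jump(); break;      // move right + jump
        }
    }

    void Move(float direction) { /* apply horizontal movement here */ }
    void Jump() { /* trigger the jump here */ }
}
```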
Maybe this gives you some more ideas.
You can try to make the tile's position relative to the player.
This is interesting. I'm not sure why making the coordinates relative to the player would help? I'm feeding in the position of both the player and each tile in world coords, so why does it matter if it's relative to the player or not?
I coded up something last night (but haven't tested it yet) where I only provide the tiles that surround the agent, which means I can have levels of arbitrary size.
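It's roughly like this (reusing the TileType stand-in from my grid sketch above; completely untested):

```csharp
using System.Collections.Generic;

// Rough sketch of the "tiles around the agent" idea: sample a fixed-size
// window of cells centred on the player's grid cell, so the state size no
// longer depends on the level size.
public static class LocalTileWindow
{
    public static List<float> Encode(TileType[,] level, int playerX, int playerY, int radius)
    {
        var state = new List<float>();
        int width = level.GetLength(0);
        int height = level.GetLength(1);

        for (int dx = -radius; dx <= radius; dx++)
            for (int dy = -radius; dy <= radius; dy++)
            {
                int x = playerX + dx;
                int y = playerY + dy;

                // Cells outside the level are treated as solid floor.
                TileType tile = (x < 0 || y < 0 || x >= width || y >= height)
                    ? TileType.Floor
                    : level[x, y];

                state.Add((float)(int)tile);   // or one-hot encode as before
            }
        return state;
    }
}
```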
In your case I'd go with discrete actions. If you want two actions to be done at once, then create an action which does both.
Good call, I'll try that. Just for my understanding, could you explain why a discrete action space is preferred over a continuous one?
Concerning continuous control, I'd try clamping each value to (-1, 1) first. For moving horizontally, one action is enough, and jumping would be triggered if the value is positive.
Not sure I fully follow your meaning here. If I am using a discrete action space, is this still relevant?
The player can press jump to jump, but if he holds the jump key down longer he can jump higher. I want the agent to learn this. I was hoping the agent would just be able to learn this without any more state, but perhaps I should feed in another state variable so the agent can learn that it can keep holding jump to get higher?
I'm not sure why making the coordinates relative to the player would help?
It might add some spatial value: everything that's close to the agent would have small values.
Just for my understanding, could you explain why a discrete action space is preferred over a continuous one?
Continuous is much more complex. As long as your environment can be played using keyboard keys, I'd go with discrete actions.
Not sure I fully follow your meaning here. If I am using a discrete action space, is this still relevant?
This doesn't apply to discrete action spaces. In continuous space, the output for each action can be any real number. It's pretty common practice to clamp the output before taking the action.
The player can press jump to jump, but if he holds the jump key down longer he can jump higher.
That's an interesting scenario; it's one where the timing of events matters. I'd try not to add any further inputs or rewards to enforce such a jumping behavior, though. The reinforcement learning algorithm should be able to solve it.
Folks, closing the issue as it has been inactive for 30 days. Feel free to reopen to continue the discussion if anything else comes up.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.