Unity-Technologies / ml-agents

The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning and imitation learning.
https://unity.com/products/machine-learning-agents

Great work! Been having a bunch of fun. #183

Closed · mikecann closed this issue 6 years ago

mikecann commented 6 years ago

This isn't really an issue, just a thank you for this awesome lib.

I have started a blog post series where I eventually want to be able to train an agent to play one of my previous games (Mr Nibbles Forever).

https://mikecann.co.uk/machine-learning/a-game-developer-learns-machine-learning-a-little-deeper/

For now I have only managed to get a very simple 1D agent going, but I plan on adding more complexity as I go along.

Anyway, feel free to close this issue, I just wanted to say thanks.

MarcoMeter commented 6 years ago

Mr Nibbles Forever looks like a pretty cool use case. Coming up with a suitable state space sounds like a major challenge to me. Curriculum Learning should be a pretty helpful addition once you face the entire complexity of your game.

mikecann commented 6 years ago

@MarcoMeter My original plan was Mr Nibbles Forever, here is a quick video of the gameplay: https://www.youtube.com/watch?v=vO6mjWDz5RM

But I have thought on it more and I think it would be simpler (to begin with at least) to try to train an agent on its predecessor game Mr Nibbles:

https://youtu.be/lyAf7VVLdKg?t=25

It's a simpler puzzle game rather than an endless runner, so it may be easier to do Curriculum Learning on it instead.

But yes, I was wondering about the state space. I was thinking that it might have to be a camera feed from the game... thoughts?

MarcoMeter commented 6 years ago

@mikecann

I agree on using a camera image as input. I'd set up a separate camera for the agent to exclude the background image. In the end, the image fed to the brain should be pretty abstract (grayscale and maybe less than 64x64 pixels). As this repository's PPO implementation does not feature frame stacking yet, the state space could be extended with the agent's current velocity.
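
To make the "abstract camera" idea concrete, here is a rough, hypothetical sketch (not the toolkit's built-in observation pipeline) of rendering a dedicated, background-free agent camera into a small grayscale vector using standard Unity APIs; the AgentVision and CaptureGrayscale names are made up for illustration.

```csharp
using UnityEngine;

// Hypothetical helper: renders a dedicated agent camera into a small
// grayscale float array that could be appended to the state vector.
public static class AgentVision
{
    public static float[] CaptureGrayscale(Camera agentCamera, int size = 64)
    {
        var rt = RenderTexture.GetTemporary(size, size, 16);
        var previousTarget = agentCamera.targetTexture;
        var previousActive = RenderTexture.active;

        agentCamera.targetTexture = rt;
        agentCamera.Render();

        RenderTexture.active = rt;
        var tex = new Texture2D(size, size, TextureFormat.RGB24, false);
        tex.ReadPixels(new Rect(0, 0, size, size), 0, 0);
        tex.Apply();

        // Convert to grayscale values in [0, 1].
        var pixels = tex.GetPixels();
        var result = new float[pixels.Length];
        for (int i = 0; i < pixels.Length; i++)
            result[i] = pixels[i].grayscale;

        // Restore camera/render state and clean up temporaries.
        agentCamera.targetTexture = previousTarget;
        RenderTexture.active = previousActive;
        RenderTexture.ReleaseTemporary(rt);
        Object.Destroy(tex);

        return result;
    }
}
```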

mikecann commented 6 years ago

@MarcoMeter oh cool, thanks for those tips. Not sure what frame stacking is yet, so I will have a look into it.

The original Mr Nibbles is actually a grid-based game (apart from Mr Nibbles himself), so there is potential I might not need to use the camera.

MarcoMeter commented 6 years ago

@mikecann Frame stacking means feeding the current frame plus the n previous frames to the neural net. So, having the current and the previous frame, the agent could derive its velocity from that input, for example.
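
As a rough illustration of the idea (this is not the toolkit's implementation), state stacking can be done by keeping the last N state vectors and concatenating them before handing them to the brain; the StateStacker name and the padding scheme below are assumptions.

```csharp
using System.Collections.Generic;

// Minimal sketch of state/frame stacking: keep the last N observations and
// concatenate them so the network can infer temporal quantities like velocity.
public class StateStacker
{
    private readonly int stackSize;
    private readonly Queue<float[]> history = new Queue<float[]>();

    public StateStacker(int stackSize = 2)
    {
        this.stackSize = stackSize;
    }

    public List<float> Stack(float[] currentState)
    {
        history.Enqueue(currentState);
        while (history.Count > stackSize)
            history.Dequeue();

        var stacked = new List<float>();

        // Pad with copies of the oldest state until the stack is full
        // (e.g. on the first step of an episode).
        var oldest = history.Peek();
        for (int i = history.Count; i < stackSize; i++)
            stacked.AddRange(oldest);

        foreach (var state in history)
            stacked.AddRange(state);

        return stacked;
    }
}
```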

Fangh commented 6 years ago

Thank you @mikecann for this blog post. I think I will read it and use it to make my first ML agent

awjuliani commented 6 years ago

Very cool blog @mikecann!

As for state-space, there are three main possible approaches:

  1. Encode everything relevant into the state vector. For simple environments this works well, but it doesn't scale to dynamic numbers of objects within a scene.
  2. Use the camera as an observation. This captures everything relevant, but it is harder to learn from, and since we don't yet have frame stacking, the agent doesn't learn the important temporal information (like velocity).
  3. Use ray-casting or similar methods to capture all relevant objects close to the agent (a rough sketch of this follows the list). This combines the "directness" of 1 with the "perception" of 2. Of course, it currently suffers from the same lack of temporal information as 2, but we are working on an automated way to ask for the "past 3 states" as the input to the network. You could also code something like this yourself, where CollectState() keeps track of the last 3 states itself and passes them in as one big vector.
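
To illustrate option 3, here is a rough, hypothetical sketch of ray-based sensing for a 2D game like Mr Nibbles; the ray directions, the ray length, and the "Spider" tag are assumptions for illustration, not anything provided by the toolkit.

```csharp
using System.Collections.Generic;
using UnityEngine;

// Rough sketch: cast a few rays around the agent and encode the normalized
// hit distance (plus a simple "is it a hazard?" flag) into the state vector.
public static class RayPerception
{
    private static readonly Vector2[] Directions =
    {
        Vector2.right, Vector2.left, Vector2.up, Vector2.down,
        new Vector2(1f, 1f).normalized, new Vector2(-1f, 1f).normalized
    };

    public static void AddRayObservations(Vector2 origin, float rayLength, List<float> state)
    {
        foreach (var dir in Directions)
        {
            RaycastHit2D hit = Physics2D.Raycast(origin, dir, rayLength);
            if (hit.collider != null)
            {
                state.Add(hit.distance / rayLength);                    // how far away the obstacle is
                state.Add(hit.collider.CompareTag("Spider") ? 1f : 0f); // hypothetical hazard tag
            }
            else
            {
                state.Add(1f); // nothing within range
                state.Add(0f);
            }
        }
    }
}
```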

Hope that helps! Definitely looking forward to seeing how things progress.

mikecann commented 6 years ago

@awjuliani thank you so much for that excellent advice! Solution 3 sounds very interesting.

I was wondering: why can't I just pass the velocity vector in as another state property?

Another option, since the original Mr Nibbles' world is laid out on a grid, is that I could just encode that 2D world in the state. This is exactly how I built the maps for the original game:

[image: grid-based level layout from the original Mr Nibbles]

I could represent the state of the level as a 100x100 grid where each cell could be one of a number of different "types" (black is solid floor, yellow is a nibble, blue is a spider, etc.). Then I would also pass in Mr Nibbles' position, his velocity, and a few other things. I'm thinking that might do it.

I don't think I would even have to supply collision information, as the NN should just be able to learn the mapping between the relative states.

So I'm thinking something like 100x100x(number of cell types) + position + velocity + is-in-air... so something like 80,003 states. I'm thinking this would take a long time to train?
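
For what it's worth, a rough sketch of that kind of encoding might look like the following; the CellType values, the grid size, and the EncodeLevel helper are hypothetical, chosen just to show the one-hot-per-cell idea.

```csharp
using System.Collections.Generic;
using UnityEngine;

// Hypothetical cell types for the grid world; the real game would have more.
public enum CellType { Empty, Floor, Nibble, Spider }

public static class LevelEncoder
{
    // Flattens the grid into a one-hot vector per cell, then appends the
    // agent's position, velocity and an "in air" flag.
    public static List<float> EncodeLevel(CellType[,] grid, Vector2 pos, Vector2 vel, bool inAir)
    {
        int typeCount = System.Enum.GetValues(typeof(CellType)).Length;
        var state = new List<float>(grid.Length * typeCount + 5);

        foreach (CellType cell in grid)
            for (int t = 0; t < typeCount; t++)
                state.Add(t == (int)cell ? 1f : 0f); // one-hot cell type

        state.Add(pos.x);
        state.Add(pos.y);
        state.Add(vel.x);
        state.Add(vel.y);
        state.Add(inAir ? 1f : 0f);
        return state;
    }
}
```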

MarcoMeter commented 6 years ago

DeepTraffic is a great example of the cell-like approach.

There are two things to keep in mind:

mikecann commented 6 years ago

@MarcoMeter thanks for sharing DeepTraffic, that's really cool.

As to the points you mentioned, I totally agree with starting off slow. Since I'm probably going to have to pretty much rebuild the game anyway (because the old one was written in haXe), that's another reason to start simple, then train, then add more complexity, then train some more.

awjuliani commented 6 years ago

@mikecann sounds like a great approach! You can also definitely just feed velocity information into the state to augment it.

Feel free to post here on the results of your experimentation, as I'd be interested in learning what approaches do and don't work for problems like this.

mikecann commented 6 years ago

@awjuliani Awesome, I will do. Planning on doing more work on this over the holidays.

mikecann commented 6 years ago

Hey @awjuliani just thought you would like to know I have finally found time to do a little more work on this and write it up: https://mikecann.co.uk/machine-learning/a-game-developer-learns-machine-learning-mr-nibbles-basics/

I ran into a few issues that I think means I will have to use curriculum learning in the future.

Do you think I need to provide velocity or should the network be able to infer velocity from the previous position state?

MarcoMeter commented 6 years ago

Hi @mikecann, I skimmed your blog post and code. It looks like you don't add temporal information to your state space, so I suggest adding the velocity. Another potential problem is that you are not normalizing your state space. Maybe change normalize to true in your PPO hyperparameters, or do it manually in CollectState() in your implementation. I usually do it manually. You can find more information about it in the best practices doc.
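
A minimal sketch of what manual normalization could look like, assuming the older Agent API with a CollectState() override as referenced in this thread; levelWidth, levelHeight, and maxSpeed are hypothetical tuning fields, not part of the toolkit.

```csharp
using System.Collections.Generic;
using UnityEngine;

// Hedged sketch, not the actual MrNibblesAgent: divide each value by a known
// maximum so the inputs stay roughly within [-1, 1].
public class NormalizedNibblesAgent : Agent
{
    public float levelWidth = 100f;   // hypothetical level extents
    public float levelHeight = 100f;
    public float maxSpeed = 10f;      // hypothetical maximum speed of the agent

    private Rigidbody2D body;

    private void Start()
    {
        body = GetComponent<Rigidbody2D>();
    }

    public override List<float> CollectState()
    {
        var state = new List<float>();
        state.Add(transform.position.x / levelWidth);
        state.Add(transform.position.y / levelHeight);
        state.Add(body.velocity.x / maxSpeed);
        state.Add(body.velocity.y / maxSpeed);
        return state;
    }
}
```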

mikecann commented 6 years ago

@MarcoMeter I was hoping that the velocity could be inferred from the current position and last position, but I think you are right, I need to supply the velocity manually. I will read up on normalization; I'm not sure exactly what the reasons for doing that are just yet.

MarcoMeter commented 6 years ago

Having multiple positions in the state space should work as well. But if there is a possibility to reduce dimensions (e.g. by aggregation), that is beneficial to your model as it shrinks the input. Two position vectors take more inputs than one velocity vector, but both contain roughly the same information.

mikecann commented 6 years ago

@MarcoMeter sorry, I apologize. What I meant by "inferred from the current position and last position" wasn't that I actually supply the current pos and last pos per "CollectState()", as that would essentially just be the velocity, like you mentioned.

Instead, what I meant was that I thought that, by default, all state gathered by "CollectState()" is temporal. So the network automatically uses the previous state gathered by "CollectState()" as an input, and thus the temporal nature of that should mean the velocity could be inferred from the current position and the last.

Hope that makes sense :)

MarcoMeter commented 6 years ago

the network automatically uses the previous state gathered by "CollectState()" as an input

That's not the case (right now). The development branches contain an implementation to stack the state space up to 9 steps in the past (plus the current one).

mikecann commented 6 years ago

@MarcoMeter oh really? Ah okay, in that case I will definitely need to provide the velocity then.

mikecann commented 6 years ago

Hey guys, I'm having some issues trying to get my agent to learn how to jump:

https://youtu.be/2lIXYjx1RBw

He has learnt that he needs to head towards the exit just fine, but he seems to be struggling to jump over a little hurdle.

In the example above I have attempted to limit the episode length by setting done = true when the cumulative reward drops below -80.
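
For reference, a hedged sketch of that early-termination idea under the older Agent API: track the cumulative reward manually and set the done flag (as mentioned above) once it falls below the threshold. Apart from done, which the thread itself references, the member names here are assumptions.

```csharp
// Hedged sketch inside a hypothetical Agent subclass; GiveReward is a made-up helper.
private float episodeReward;

private void GiveReward(float r)
{
    reward += r;          // accumulate this step's reward (older Agent API)
    episodeReward += r;   // running total for this episode
    if (episodeReward < -80f)
        done = true;      // cut the episode short so it doesn't drag on forever
}

public override void AgentReset()
{
    episodeReward = 0f;   // start the next episode with a clean slate
}
```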

I have also tried giving a reward each time the agent jumps, but that doesn't seem to have helped, and it is technically wrong because the agent shouldn't be jumping all the time.

Here are my hyperparams: https://github.com/mikecann/MrNibblesML/blob/master/python/mrnibbles.ipynb

And this is the agent: https://github.com/mikecann/MrNibblesML/blob/master/unity/Assets/MrNibbles/Scripts/MrNibblesAgent.cs

Does anyone have any clues? I would try curriculum learning, but I'm not sure how that would help with teaching him how to jump over this small obstacle.

Any help would be greatly appreciated!

MarcoMeter commented 6 years ago

For your discrete action space, you've chosen a huge batch size. I'd go with a batch size of at most 128, or rather smaller, like 64.

Besides the hyperparameters, I think your state space does not contain enough information. I'd add some information about obstacles. Maybe cast a ray to let the agent sense any obstacles in front of it. Even if you were using a camera observation as input, I'd still try that.

mikecann commented 6 years ago

Hi @MarcoMeter, thanks for those tips.

I went for the batch size I did because my action space is actually continuous, since I want the agent to be able to perform multiple actions at once (e.g. jump and move right), and I think I read somewhere that in that case your action space needs to be continuous. But I will try lowering my batch size and see where I get 👍

Well, I was hoping that I wouldn't need to provide collision information, as the world is on an evenly spaced grid, very much like the pixels in a camera... so if an agent can learn from the pixels in a camera, it should be able to learn from the cells in my grid world, no?

MarcoMeter commented 6 years ago

I overlooked the fact that you are adding each tile's position. I'm wondering whether a regular dense layer is capable of picking up on that spatial information.

You can try to make the tile's position relative to the player.

In your case I'd go with discrete actions. If you want two actions to be performed at once, then create an action which does both. That should be two more actions in your case, I guess. That's not too high-dimensional for a discrete action space (a small sketch of such a mapping follows below). Concerning continuous control, I'd try clamping each value to (-1, 1) first. For moving horizontally, one action is enough; jumping would be triggered if the value is positive.
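
A quick hypothetical sketch of such a combined-action mapping, assuming the older AgentStep(float[]) callback; Move and Jump stand in for whatever the game's controller actually exposes.

```csharp
// Hedged sketch inside a hypothetical Agent subclass; Move() and Jump()
// are placeholders for the game's own movement code.
public override void AgentStep(float[] act)
{
    int action = Mathf.FloorToInt(act[0]);
    switch (action)
    {
        case 0: break;                    // do nothing
        case 1: Move(-1f); break;         // move left
        case 2: Move(+1f); break;         // move right
        case 3: Jump(); break;            // jump in place
        case 4: Move(-1f); Jump(); break; // move left and jump
        case 5: Move(+1f); Jump(); break; // move right and jump
    }
}
```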

Maybe this gives you some more ideas.

mikecann commented 6 years ago

You can try to make the tile's position relative to the player.

This is interesting. I'm not sure why making the coordinates relative to the player would help? I'm feeding in the positions of both the player and each tile in world coordinates, so why does it matter whether they are relative to the player or not?

I coded up something last night (but haven't tested it yet) where I only provide the tiles that surround the agent, which means I can have levels of arbitrary size.

In your case I'd go with discrete actions. If you want two actions to be performed at once, then create an action which does both.

Good call, I'll try that. Just for my understanding, could you explain why a discrete action space is preferred over a continuous one?

Concerning continuous control, I'd try clamping each value to (-1, 1) first. For moving horizontally, one action is enough; jumping would be triggered if the value is positive.

Not sure I fully follow your meaning here. If I am using a discrete action space, is this still relevant?

The player can press jump to jump, but if he holds the jump key down longer he can jump higher. I want the agent to learn this. I was hoping the agent would just be able to learn this without any extra state, but perhaps I should feed in another state variable so the agent can learn that it can keep holding jump to get higher?

MarcoMeter commented 6 years ago

I'm not sure why making the coordinates relative to the player would help?

It might have the potential to add some spatial value, e.g. everything that is close to the agent has small values.

Just for my understanding, could you explain why a discrete action space is preferred over a continuous one?

Continuous control is much more complex. As long as your environment can be played using keyboard keys, I'd go with discrete actions.

Not sure I fully follow your meaning here. If I am using a discrete action space, is this still relevant?

This doesn't apply to discrete action spaces. In a continuous space, the output for each action can be any real number. It's pretty common practice to clamp the output before taking the action.
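
In code, that clamping might look something like the fragment below (a hypothetical piece of a continuous-control AgentStep, where act is the raw network output).

```csharp
// Hypothetical continuous-control fragment: clamp raw outputs to [-1, 1];
// one value drives horizontal movement, a positive second value means "jump".
float move = Mathf.Clamp(act[0], -1f, 1f);
bool jump = Mathf.Clamp(act[1], -1f, 1f) > 0f;
```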

The player can press jump to jump, but if he holds the jump key down longer he can jump higher.

That's an interesting scenario; that's where the timing of events matters. Well, I'd try not to add any further inputs or rewards to enforce such a jumping behavior. The reinforcement learning algorithm should be able to solve it.

mmattar commented 6 years ago

Folks, closing the issue as it has been inactive for 30 days. Feel free to reopen to continue the discussion if anything else comes up.

lock[bot] commented 4 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.