google-deepmind / pysc2

StarCraft II Learning Environment
Apache License 2.0

Properly Implementing the A3C Algorithm #34

Closed Eric-Wallace closed 6 years ago

Eric-Wallace commented 6 years ago

This may be a naive question, but in the process of implementing my RL bot using A3C, I had to place many of my changes into the environment files such as run_loop.py. Using this method, I was able to control when calls to .step() occurred. Is this the proper way to implement non-scripted bots?

Eric-Wallace commented 6 years ago

Closing this, as I found the comment indicating that this is the file we should be modifying.

tewalds commented 6 years ago

It'd be interesting to see what you're doing. If your agent is pretty simple or self-contained, you shouldn't need to change much, if anything, outside the agent class. If you're doing something fairly fancy (A3C might count?), you may want to write your own run loop and launcher (i.e. replace bin/agent.py). Our fancier agents don't change anything under sc2_env, but they do use their own run loop.
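For context, a minimal custom run loop in the spirit of what tewalds describes might look like the sketch below. It mirrors the structure of pysc2's own run_loop.py; the env and agent are assumed to be constructed elsewhere (in your own launcher), max_frames is just an illustrative cap, and the spec-handling details can differ slightly between pysc2 versions.

```python
# Minimal sketch of a custom run loop, assuming `env` is an sc2_env.SC2Env
# and `agent` follows the pysc2 base_agent.BaseAgent interface (single agent).
def custom_run_loop(agent, env, max_frames=0):
    total_frames = 0
    # Give the agent the observation/action specs before the first episode.
    # (Exact spec handling differs slightly between pysc2 versions.)
    agent.setup(env.observation_spec(), env.action_spec())
    try:
        while True:
            timesteps = env.reset()
            agent.reset()
            while True:
                total_frames += 1
                # This is the natural place to run your network, store
                # transitions, and compute gradients for A3C.
                actions = [agent.step(timesteps[0])]
                if max_frames and total_frames >= max_frames:
                    return
                if timesteps[0].last():
                    break
                timesteps = env.step(actions)
    except KeyboardInterrupt:
        pass
```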

Eric-Wallace commented 6 years ago

@tewalds My current edits look like:
run_loop.py -> most of the dirty work: takes in the observations, runs them through the neural network, gets the policy back, and takes the action. This is run in parallel by multiple workers.
agent.py -> modified main() to create my TensorFlow graphs, initialize the global neural network parameters, and create worker objects that execute the same agent in parallel.
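For anyone else reading along, a stripped-down sketch of that launcher structure is below. It is only an illustration: the learned policy is replaced by pysc2's built-in RandomAgent as a stand-in, custom_run_loop refers to the run-loop sketch earlier in this thread, and the SC2Env constructor arguments shown are the 1.x-era ones (screen_size_px/minimap_size_px), so they may need adjusting for other pysc2 versions. In a real A3C setup each worker would also hold a local copy of the network and push gradients to the shared global parameters.

```python
import sys
import threading

from absl import flags

from pysc2.agents import random_agent
from pysc2.env import sc2_env

FLAGS = flags.FLAGS


def worker(worker_id):
    """One worker: its own SC2Env plus an agent, run in its own thread."""
    print("starting worker", worker_id)
    with sc2_env.SC2Env(
            map_name="MoveToBeacon",
            screen_size_px=(64, 64),   # pysc2 1.x-style args; newer versions
            minimap_size_px=(64, 64),  # take agent_interface_format instead.
            step_mul=8,
            visualize=False) as env:
        agent = random_agent.RandomAgent()  # stand-in for the learned policy
        custom_run_loop(agent, env, max_frames=10000)


def main():
    FLAGS(sys.argv)  # pysc2 uses absl flags even when not run via its binaries.
    threads = [threading.Thread(target=worker, args=(i,)) for i in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    main()
```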

yukang2017 commented 6 years ago

@Eric-Wallace I wrote an Actor-Critic model to train on the mini-games. I modified run_loop.py and scripted_agent.py, with no changes to agent.py.

Eric-Wallace commented 6 years ago

@yukang2017 Would you mind sharing your code with me? I can share my approach as well. If you aren't comfortable uploading it to GitHub, you could email me at ewallac2@umd.edu.

tylerwhipple commented 6 years ago

I would love to see it as well, but I understand if you want to keep it private. I don't know if there is a good place for people to discuss their work around this.

yukang2017 commented 6 years ago

Happy to share my code with you guys. Nothing secret, just a simple Actor-Critic model for the mini-game MoveToBeacon. I use Python 3.6 and TensorFlow 1.3.

To test my code, replace run_loop.py with mine and use this trained_agent file instead of scripted_agent.py.

It works, but the trained result of this model is really poor, so I am struggling with how to improve it. If you find any mistakes of mine, please tell me. @Eric-Wallace @tylerwhipple @tewalds

trained_agent_ByYukang.txt run_loop_ByYukang.txt

killerz99 commented 6 years ago

How do you call the new agent? I tried: python -m pysc2.bin.agent --map MoveToBeacon --agent pysc2.agents.scripted_agent.MoveToBeaconRL

but I get: AttributeError: 'module' object has no attribute 'MoveToBeaconRL'

tewalds commented 6 years ago

@yukang2017 Very cool that you've got an agent that trains at least a little. You may want to try a larger step_mul for a while so the initial random walk can get farther. How long are you letting it run? If you don't have any bugs and you've got the right hyperparameters, you can probably get a decent agent in a few tens of thousands of agent steps, but make sure you're letting it run for at least 1M steps before declaring it buggy, as it may just be the wrong hyperparameters. It also seems worth making sure it's giving valid actions. Having to clamp x,y to 0 suggests that maybe it's producing values that are out of range, and that you should enforce the bounds inside the network some other way, or at least verify that it can reach all x,y coords.
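On the out-of-range x,y point: one common way to enforce valid coordinates inside the network, rather than clamping afterwards, is to treat the spatial argument as a categorical distribution over screen pixels, so every sample is automatically on-screen. A rough NumPy sketch under that assumption (the logits here stand in for whatever your policy's spatial head outputs):

```python
import numpy as np
from pysc2.lib import actions

_SCREEN_SIZE = 64  # assumed screen resolution


def sample_move_screen(spatial_logits):
    """Sample a Move_screen action from per-pixel logits of shape (H*W,)."""
    # Softmax over all pixels, so the sampled index is always a legal coord.
    probs = np.exp(spatial_logits - spatial_logits.max())
    probs /= probs.sum()
    idx = np.random.choice(len(probs), p=probs)
    y, x = np.unravel_index(idx, (_SCREEN_SIZE, _SCREEN_SIZE))
    # pysc2 spatial args are [x, y]; [0] means "not queued". Only issue this
    # when Move_screen is in obs.observation["available_actions"].
    return actions.FunctionCall(actions.FUNCTIONS.Move_screen.id,
                                [[0], [int(x), int(y)]])
```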

@killerz99 Presumably you saved MoveToBeaconRL to a file named something other than scripted_agent.py. Make sure the --agent arg points to the actual module/class path of MoveToBeaconRL, e.g. pysc2.agents.trained_agent.MoveToBeaconRL.

yukang2017 commented 6 years ago

@tewalds Thank you, I will try it.

@killerz99 If you saved the txt file as trained_agent.py, you should try: python -m pysc2.bin.agent --map MoveToBeacon --agent pysc2.agents.trained_agent.MoveToBeaconRL

Eric-Wallace commented 6 years ago

@yukang2017 Hey, why don't you send me an email at the address I posted above, and we can work on making this perform well? I have some suggestions for improvements.

killerz99 commented 6 years ago

@yukang2017 Thanks, I have it running now. I have a question: does it know where the beacon actually is when it starts, or is it just randomly walking around the screen and getting some reinforcement if it happens to hit it? The marine seems to walk diagonally from the top left corner to the bottom right corner, hoping to hit the beacon. Also, is the state of the agent saved, such that subsequent runs pick up where previous runs left off?

Thanks for making your code available; it's really instructive.

killerz99 commented 6 years ago

Hmm, actually, in the first several epochs the marine goes right for the beacon and reaches it. But as training progresses, it seems not to make it there, even though you can see that the selected move position is right on top of the beacon.

killerz99 commented 6 years ago

If you increase the learning rate to 3 and put a floor of 1 on it, it usually scores in the mid 20s with a max of 32, and it seems to go straight for the beacon. But when the learning rate starts dropping below 0.75 the marine starts to hesitate, and once it's below 0.5 it starts getting stuck in the corner.

yukang2017 commented 6 years ago

@Eric-Wallace Sorry. I thought you would find it here...

@killerz99 I think the problem is caused by the structure of the network... I am still trying to solve it.

botdot commented 6 years ago

@yukang2017 By the way, why don't you create your own branch and put your code there?

botdot commented 6 years ago

By the way, your script does not work for me; it maxes out the processor immediately after starting :(