chauvinfish opened this issue 6 years ago
Yes, I am happy to have help on this project, and I have invited you as a collaborator. The code I have so far is mostly from https://github.com/skjb/pysc2-tutorial. I used the Zerg tutorial (which is the most up-to-date example of the API) and included the Q-learning model he provided in the sparse-reward agent. I would recommend checking it out, as he includes tutorials explaining what he is doing step by step. Also, I'm happy to answer any questions you might have.
This is my first experience with ML (and I am still fairly new to github and programming in general). So, if there is anything I do that you notice is wrong, feel free to correct me!
I am developing a Zerg agent based on skjb's refined_phrase_agent. I chose Zerg because Zerg depends less on building placement (Protoss and Terran need to block their ramps with buildings), and Zerg can produce a great many units at once.
I would appreciate it if you could give me your contact address. For privacy considerations, you can send your email address or Telegram ID to my email address 'chauvinfish@gmail.com'.
I have seen your script. It seems that you abandoned the 'hot square' and 'green square' states in current_state. I don't think the way the original author handles them is very clever, but we need to figure out a way to add enemy positions into current_state, or our agent will always be blind and deaf.
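To make the idea concrete, here is a minimal sketch of one way to encode enemy positions into the state: split the minimap into quadrants and flag each quadrant that contains hostile units. The function name and the `_PLAYER_HOSTILE` value follow the pysc2 `player_relative` layer convention; the grid size is an assumption.

```python
import numpy as np

_PLAYER_HOSTILE = 4  # value for enemy units in the player_relative minimap layer (pysc2 convention)

def enemy_hot_squares(player_relative, grid=2):
    """Divide the minimap into grid x grid squares and mark which contain enemies.

    player_relative: 2-D array, e.g. obs.observation.feature_minimap.player_relative.
    Returns a flat list of 0/1 flags, one per square, row-major.
    """
    h, w = player_relative.shape
    ys, xs = (player_relative == _PLAYER_HOSTILE).nonzero()
    squares = [0] * (grid * grid)
    for y, x in zip(ys, xs):
        row = int(y * grid / h)
        col = int(x * grid / w)
        squares[row * grid + col] = 1
    return squares

# Toy 64x64 minimap with one enemy blob in the bottom-right quadrant:
minimap = np.zeros((64, 64), dtype=int)
minimap[50:55, 50:55] = _PLAYER_HOSTILE
print(enemy_hot_squares(minimap))  # → [0, 0, 0, 1]
```

Because the flags are coarse (one bit per quadrant), the state space stays small, which matters for a tabular Q-learner.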
And our agent needs to find the correct position for its second base. It's hard to locate the center of a mineral field with .mean(), since that gives you the mean coordinates of all the mineral fields on the whole map.
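One way around the .mean() problem is to group the mineral pixel coordinates into separate patches first, and only then take the mean of each patch. A rough sketch (the `max_gap` threshold is an assumption, and a real agent would read the coordinates from the unit_type layer):

```python
def mineral_clusters(xs, ys, max_gap=8):
    """Group mineral pixel coordinates into patches by simple distance-based grouping.

    xs, ys: coordinates of mineral pixels (e.g. taken from a unit_type layer mask).
    max_gap: pixels within this distance of a cluster member join that cluster.
    Returns one (center_x, center_y) per patch.
    """
    points = list(zip(xs, ys))
    clusters = []
    while points:
        cluster = [points.pop()]
        changed = True
        while changed:  # keep absorbing nearby points until the patch stops growing
            changed = False
            for p in points[:]:
                if any(abs(p[0] - q[0]) <= max_gap and abs(p[1] - q[1]) <= max_gap
                       for q in cluster):
                    cluster.append(p)
                    points.remove(p)
                    changed = True
        cx = sum(p[0] for p in cluster) / len(cluster)
        cy = sum(p[1] for p in cluster) / len(cluster)
        clusters.append((cx, cy))
    return clusters

# Two well-separated mineral patches:
xs = [10, 11, 12, 50, 51, 52]
ys = [10, 10, 11, 50, 51, 50]
print(sorted(mineral_clusters(xs, ys)))  # two centers, one near (11, 10.3) and one near (51, 50.3)
```

This is quadratic in the number of pixels, but mineral patches are small enough that it should not matter.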
The current bot is still a scripted bot rather than a true AI. I am spending whole days writing "smart action" descriptions, translating each action into the underlying operations. A true AI should output the underlying operations directly, if it can understand how the game works.
It seems that my Zerg agent has only learnt to create as many Overlords as possible :)
I'm not sure I understand hot_squares and green_squares not being in current_state. I have this code for the hot squares (and another section for the green):
```python
for i in range(0, 4):
    current_state[i + 4] = hot_squares[i]
```
Does this not work? I do agree that this could be handled better so that the agent has fewer states.
I've been trying to work on the agent finding the second base... I think I can have it check the minimap and look for the closest minerals nearby, and then move the camera to those mineral locations.
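To sketch the "closest minerals" part: once you have candidate patch centers on the minimap, picking the nearest unoccupied one is a small distance comparison. The coordinates and the occupied-radius threshold below are made-up illustration values.

```python
def nearest_expansion(base_xy, candidates):
    """Return the candidate minimap coordinate closest to our base (squared distance)."""
    bx, by = base_xy
    return min(candidates, key=lambda c: (c[0] - bx) ** 2 + (c[1] - by) ** 2)

base = (12, 16)
patches = [(12, 16), (20, 24), (44, 48)]
# Drop the patch we already occupy (assumed: within 5 minimap pixels of our base):
free = [p for p in patches if (p[0] - base[0]) ** 2 + (p[1] - base[1]) ** 2 > 25]
target = nearest_expansion(base, free)
print(target)  # → (20, 24)

# With pysc2 this would then become something like (sketch, not verified):
# return actions.FunctionCall(_MOVE_CAMERA, [[target[0], target[1]]])
```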
Do you mean that the bot is scripted since it has the 'sparse_agent_data' file? I'm not sure I understand how the agent would be able to learn through reinforcement without keeping track of a Q-learning table to save all of the states, actions, and rewards.
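For reference, the table the sparse agent keeps boils down to something like this minimal tabular Q-learning sketch; the class name, hyperparameter values, and string states are illustrative assumptions, not the tutorial's exact code.

```python
import random
from collections import defaultdict

class QTable:
    """Minimal tabular Q-learning, in the spirit of the sparse agent's QLearningTable."""

    def __init__(self, actions, lr=0.01, gamma=0.9, epsilon=0.9):
        self.actions = actions          # list of action indices
        self.lr = lr                    # learning rate
        self.gamma = gamma              # discount factor
        self.epsilon = epsilon          # probability of acting greedily
        self.q = defaultdict(float)     # (state, action) -> estimated value

    def choose(self, state):
        if random.random() < self.epsilon:
            return max(self.actions, key=lambda a: self.q[(state, a)])
        return random.choice(self.actions)  # explore

    def learn(self, s, a, reward, s_next):
        best_next = max(self.q[(s_next, a2)] for a2 in self.actions)
        target = reward + self.gamma * best_next
        self.q[(s, a)] += self.lr * (target - self.q[(s, a)])

qt = QTable(actions=[0, 1, 2])
qt.learn('s0', 1, 1.0, 's1')
print(qt.q[('s0', 1)])  # → 0.01
```

The saved 'sparse_agent_data' file is just this table pickled between episodes, so the learning itself is still reinforcement learning; the "scripted" part is only the hand-written mapping from smart actions to raw operations.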
Also, are there any ML books (in English) that you would recommend I read for this project?
```
1/move_camera (1/minimap [64, 64])
2/select_point (6/select_point_act [4]; 0/screen [84, 84])
3/select_rect (7/select_add [2]; 0/screen [84, 84]; 2/screen2 [84, 84])
4/select_control_group (4/control_group_act [5]; 5/control_group_id [10])
5/select_unit (8/select_unit_act [4]; 9/select_unit_id [500])
6/select_idle_worker (10/select_worker [4])
7/select_army (7/select_add [2])
8/select_warp_gates (7/select_add [2])
9/select_larva ()
10/unload (12/unload_id [500])
11/build_queue (11/build_queue_id [10])
12/Attack_screen (3/queued [2]; 0/screen [84, 84])
13/Attack_minimap (3/queued [2]; 1/minimap [64, 64])
14/Attack_Attack_screen (3/queued [2]; 0/screen [84, 84])
15/Attack_Attack_minimap (3/queued [2]; 1/minimap [64, 64])
16/Attack_AttackBuilding_screen (3/queued [2]; 0/screen [84, 84])
17/Attack_AttackBuilding_minimap (3/queued [2]; 1/minimap [64, 64])
18/Attack_Redirect_screen (3/queued [2]; 0/screen [84, 84])
19/Scan_Move_screen (3/queued [2]; 0/screen [84, 84])
20/Scan_Move_minimap (3/queued [2]; 1/minimap [64, 64])
```
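Each entry in that dump is a function id/name followed by its typed argument slots and their sizes (e.g. 13/Attack_minimap takes a queued flag from [2] and a minimap coordinate from [64, 64]). If it helps, the format can be parsed mechanically; this small sketch assumes exactly the line format shown in the dump:

```python
import re

def parse_spec(line):
    """Parse one line of the action-spec dump into (id, name, [(arg_id, arg_name, sizes)])."""
    m = re.match(r"(\d+)/(\w+)\s*\((.*)\)", line)
    fn_id, name, argstr = int(m.group(1)), m.group(2), m.group(3)
    args = []
    for part in filter(None, (p.strip() for p in argstr.split(";"))):
        am = re.match(r"(\d+)/(\w+)\s*\[([\d,\s]*)\]", part)
        sizes = [int(s) for s in am.group(3).split(",")]
        args.append((int(am.group(1)), am.group(2), sizes))
    return fn_id, name, args

print(parse_spec("13/Attack_minimap (3/queued [2]; 1/minimap [64, 64])"))
# → (13, 'Attack_minimap', [(3, 'queued', [2]), (1, 'minimap', [64, 64])])
```

In pysc2 itself the same information is available programmatically via `actions.FUNCTIONS`, so parsing the printed dump is only useful for eyeballing.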
```python
smart_actions = [
    ACTION_DO_NOTHING,
    ACTION_SELECT_PROBE,
    ACTION_BUILD_PYLON,
    ACTION_BUILD_GATEWAY,
    ACTION_SELECT_GATEWAY,
    ACTION_BUILD_ZEALOT,
    ACTION_SELECT_ARMY,
    ACTION_ATTACK,
]
```
There is currently no RL book for SC2, because not many people are using pysc2 as a deep learning environment.
If there is an RL book in Chinese on such an unpopular subject, it must have been translated into Chinese from an English original.
Oh ok, I think I understand. So would something like this be better to replace all of the elif statements at the end?
```python
smart_action = random.choice(actions.FUNCTIONS)
if self.can_do(obs, smart_action.id):
    return actions.FunctionCall(smart_action.id, args)  # args still need to be filled in per function
```
This would seem to pass all the possible functions, so then we would just need to add the correct arguments to each choice.
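One naive way to "add the correct arguments to each choice" is to sample a valid value for every argument slot from its size spec, which is how exploration is often bootstrapped in pysc2 agents. A sketch (the example sizes are those of Attack_minimap from the spec dump):

```python
import random

def random_args(arg_sizes):
    """Sample one structurally valid value per argument slot.

    arg_sizes: one size list per argument, e.g. Attack_minimap -> [[2], [64, 64]]
    (a queued flag, then a minimap coordinate). Each value is drawn below its size,
    so spatial args become a full [x, y] point.
    """
    return [[random.randrange(s) for s in sizes] for sizes in arg_sizes]

args = random_args([[2], [64, 64]])
print(args)  # e.g. [[1], [48, 33]] -- a queued flag plus a minimap point
# actions.FunctionCall(fn_id, args) would then be a structurally valid call.
```

Structurally valid is not the same as sensible, though, which is the point about the elif statements below: random choices throw away the hand-written game logic.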
Yeah, I figured there was nothing for SC2. I was just wondering if there were any books in general that you would recommend. I do have a couple of books that go over RL in Python that I will probably start looking through.
I successfully implemented an RL agent using a fully convolutional network and the A3C algorithm: https://github.com/xhujoy/pysc2-agents. The author is Chinese too, so I can contact him. He stopped working on it after open-sourcing it, so the project can't run with the newest pysc2, but I modified it and made it run.
This agent can mimic a real human player and output direct actions. So far, such agents can only play minigames like MoveToBeacon and DefeatRoaches; they are incapable of playing full 1v1 games. However, this is a more promising future than scripted bots: people expect an agent to play the SC2 game rather than print out a '16 marine drop' timetable.
Still, it would be incredible if an AI learnt to print a '16 marine drop' timetable.
I suggest that SC2 has two kinds of game logic. Macro logic: e.g. choosing an Oracle harassment or a 4-Gateway all-in. Micro logic: how to control groups and single units, cast spells, etc. An agent should learn both.
Those elif statements represent the game logic; if you replace them with random.choice(), the agent will become much more stupid.
I forked the project and uploaded my modified code. You can download it and have fun: https://github.com/chauvinfish/pysc2-agents
I am a university student in China, and I am also interested in ML. I am not very experienced; it's my first time playing with pysc2, but I do have some basic concepts and debugging skills…