It seems dqn can't learn much

Seraphli commented 6 years ago

I ran the script last night. It started with a mean reward of ~11 and ended with ~15.5. I tried playing this mini-game myself and could score ~100 or more. DeepMind reached a score of ~100 in their video. [Screenshots: Begin, End]

vors commented 6 years ago

I had a similar experience, but the reward actually dropped from 10 to 5 :D Here is what the net learned on my laptop: the marines mostly hang out at the bottom of the screen. [Image: marines]

chris-chris commented 6 years ago

Yeah, guys. I'm trying to improve the score using the A3C algorithm, and I'm rewriting the example code.

If you have any improvements, please let me know! :)

chris-chris commented 6 years ago

I'm applying the A3C algorithm to it. This is the baseline agent in the paper: https://deepmind.com/documents/110/sc2le.pdf

vors commented 6 years ago

Awesome! Will try it soon

ShadowDancer commented 6 years ago

@chris-chris How's it going with A3C? I see you changed the approach; is it getting any better?

chris-chris commented 6 years ago

@Seraphli @ShadowDancer @vors @yilei

I applied the A2C algorithm, and I think it works better. You can train it with the command below.

python train_mineral_shards.py --algorithm=a2c --num_agents=2 --num_scripts=2 --timesteps=2000000

Seraphli commented 6 years ago

I tried to run the code, and at some point the program threw this error:

Traceback (most recent call last):
  File "train_mineral_shards.py", line 304, in <module>
    main()
  File "train_mineral_shards.py", line 183, in main
    callback=a2c_callback)
  File "/home/seraphli/Github/pysc2-examples/a2c/a2c.py", line 748, in learn
    obs, states, rewards, masks, actions, xy0, xy1, values = runner.run()
  File "/home/seraphli/Github/pysc2-examples/a2c/a2c.py", line 621, in run
    self.update_obs(obs)
  File "/home/seraphli/Github/pysc2-examples/a2c/a2c.py", line 297, in update_obs
    marine1 = self.xy_per_marine[env_num]["1"]
KeyError: '1'

davidkuhta commented 6 years ago

I'm having the same KeyError: '1' issue as @Seraphli (the output is nearly identical to the above). Any idea where to look?

chris-chris commented 6 years ago

@davidkuhta @Seraphli I'll fix it! thanks!!

chris-chris commented 6 years ago

@davidkuhta @Seraphli I fixed it. Can you guys check it out?

davidkuhta commented 6 years ago

Thanks @chris-chris! Running it now, will follow up.

davidkuhta commented 6 years ago

OK, I still ran into the same issue. I did see the initialization in the last commit: self.xy_per_marine = [{"0":[0,0], "1":[0,0]} for _ in range(nenv)]. I re-ran it after adding a print statement at line 296 to output the self.xy_per_marine[env_num] dict.

Here's how it ended:

...
self.total_reward : [90.0, 87.0, 129.0, 92.0, 0.0, 0.0, 0.0, 0.0]
{'1': [15, 15], '0': [9, 12]}
{'1': [18, 15], '0': [28, 3]}
{'1': [18, 9], '0': [1, 3]}
{'1': [6, 15], '0': [25, 5]}
{'1': [5, 15], '0': [2, 9]}
{'1': [13, 10], '0': [6, 1]}
{'1': [27, 19], '0': [6, 5]}
{'1': [20, 17], '0': [6, 16]}
rewards :  [0 0 0 0 0 0 0 0]
self.total_reward : [90.0, 87.0, 129.0, 92.0, 0.0, 0.0, 0.0, 0.0]
{'1': [15, 15], '0': [9, 12]}
{'1': [18, 15], '0': [28, 3]}
{'1': [18, 9], '0': [1, 3]}
{'1': [6, 15], '0': [25, 5]}
{'1': [5, 15], '0': [2, 9]}
{'1': [13, 10], '0': [6, 1]}
{'1': [27, 19], '0': [6, 5]}
{'1': [20, 17], '0': [6, 16]}
Game has started.
init group list
env 2 done! reward : 130.0 mean_100ep_reward : 84.7 
rewards :  [0 0 1 0 0 0 0 0]
self.total_reward : [90.0, 87.0, 0, 92.0, 0.0, 0.0, 0.0, 0.0]
{'1': [15, 15], '0': [9, 12]}
{'1': [18, 15], '0': [28, 3]}
{'0': [11, 11]}
Traceback (most recent call last):
  File "train_mineral_shards.py", line 302, in <module>
    main()
  File "train_mineral_shards.py", line 181, in main
    callback=a2c_callback)
  File "/home/AI/pysc2-examples/a2c/a2c.py", line 749, in learn
    obs, states, rewards, masks, actions, xy0, xy1, values = runner.run()
  File "/home/AI/pysc2-examples/a2c/a2c.py", line 622, in run
    self.update_obs(obs)
  File "/home/AI/pysc2-examples/a2c/a2c.py", line 298, in update_obs
    marine1 = self.xy_per_marine[env_num]["1"]
KeyError: '1'
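
Reading the log above: right after "init group list" for env 2, that env's dict holds only the key '0', so the lookup of "1" fails. Below is a minimal sketch of a defensive lookup, not the fix that went into the repo. The names nenv and xy_per_marine mirror the log, while marine_xy and the [0, 0] fallback are hypothetical and only for illustration. A fallback like this hides the symptom; why the second marine's entry is missing after a reset would still need to be tracked down in a2c.py.

```python
# Stand-in for the runner state; only the dict shape is taken from the log above.
nenv = 3
xy_per_marine = [{"0": [0, 0], "1": [0, 0]} for _ in range(nenv)]

# Simulate what the log shows for env 2 right after "init group list":
# the dict has been replaced by one that lacks the "1" key.
xy_per_marine[2] = {"0": [11, 11]}


def marine_xy(per_env, env_num, marine_id, default=(0, 0)):
    """Hypothetical helper: return the stored [x, y] for a marine,
    or a fresh copy of `default` when that marine has no entry yet."""
    return list(per_env[env_num].get(marine_id, default))


marine0 = marine_xy(xy_per_marine, 2, "0")
marine1 = marine_xy(xy_per_marine, 2, "1")  # no KeyError; falls back to [0, 0]
print(marine0, marine1)
```

Whether [0, 0] is a sensible placeholder position for the agent's observation is an assumption; re-initializing both keys whenever an env resets may be the cleaner option.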

simonmeister commented 6 years ago

Just in case anyone would like to look at an alternative work-in-progress implementation without the openai-baselines dependency and with the complete action space: https://github.com/simonmeister/pysc2-rl-agents.

mushroom1116 commented 6 years ago

Is there anyone who has encountered this error?

TypeError: Can't instantiate abstract class SubprocVecEnv with abstract methods step_async, step_wait

soneo1127 commented 6 years ago

@mushroom1116 I changed the import in pysc2-examples/common/vec_env/subproc_vec_env.py from

from baselines.common.vec_env import VecEnv

to

from . import VecEnv

and then it runs.
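
For anyone wondering why the import matters, here is a minimal sketch of how this kind of TypeError arises. It assumes the installed baselines VecEnv declares step_async and step_wait as abstract while the repo's SubprocVecEnv only implements step. The classes below are simplified stand-ins, not the real baselines or pysc2-examples code.

```python
from abc import ABC, abstractmethod


class NewerVecEnv(ABC):
    """Stand-in for a base class that requires step_async/step_wait."""

    @abstractmethod
    def step_async(self, actions):
        ...

    @abstractmethod
    def step_wait(self):
        ...


class LocalVecEnv(ABC):
    """Stand-in for a local base class that only requires step."""

    @abstractmethod
    def step(self, actions):
        ...


class BrokenSubprocVecEnv(NewerVecEnv):
    # Implements step only, so the two abstract methods stay unimplemented.
    def step(self, actions):
        return actions


class WorkingSubprocVecEnv(LocalVecEnv):
    # Same body, but this base class asks for nothing beyond step.
    def step(self, actions):
        return actions


try:
    BrokenSubprocVecEnv()
    message = ""
except TypeError as err:
    message = str(err)
    print(message)  # names step_async and step_wait as missing

env = WorkingSubprocVecEnv()  # instantiates without error
```

Switching the import to the local VecEnv sidesteps the mismatch; the alternative would be to implement step_async and step_wait in the subclass.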
