Closed nosound2 closed 2 years ago
Yet, without setting the config in stone, this change seems to interfere with multi-cpu training: it's now hanging.
Can you please give more insight, why does it hang now? I am not using multi-cpu yet. Thanks for testing!
I wish I knew. To run a quick test: in train.py change num_cpu = 1 to 2, wait until it outputs Logging to tensorboard [...], then kill it (when working properly, some PPO output shows up within minutes). Upon killing it, the stack trace refers to subproc_vec_env.py from stable_baselines3. From there I am not familiar at all... Given a choice, I would pick better training on a single cpu over a fast, buggy one.
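For anyone reproducing this, a minimal sketch of getting a stack dump from a stuck run without killing it, using only Python's standard faulthandler module. The helper name dump_all_stacks is hypothetical and is not part of train.py or stable_baselines3. It only sees the process it runs in, so the worker processes of a subprocess-based vectorized env would each need their own call.

```python
import faulthandler
import tempfile


def dump_all_stacks() -> str:
    """Return the current stack of every thread in this process as text.

    Hypothetical helper for debugging a run that looks hung; it does not
    inspect other processes such as vectorized-env workers.
    """
    # faulthandler writes straight to a file descriptor, so use a real file.
    with tempfile.TemporaryFile(mode="w+") as f:
        faulthandler.dump_traceback(file=f, all_threads=True)
        f.seek(0)
        return f.read()


if __name__ == "__main__":
    print(dump_all_stacks())
```

On Unix, faulthandler.register(signal.SIGUSR1, all_threads=True) near the top of the script would print the same kind of dump whenever the process receives that signal, which may be more convenient than interrupting the run.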
@royerk, I changed to num_cpu = 2 and it seems to be running fine with the fix. Are you sure this is the only change on your machine? It prints a table from time to time, something like this:
-----------------------------------------
| time/ | |
| fps | 477 |
| iterations | 3 |
| time_elapsed | 205 |
| total_timesteps | 98304 |
| train/ | |
| approx_kl | 0.010162503 |
| clip_fraction | 0.0429 |
| clip_range | 0.2 |
| entropy_loss | -2.06 |
| explained_variance | 0.336 |
| learning_rate | 0.001 |
| loss | 2.18 |
| n_updates | 20 |
| policy_gradient_loss | -0.00253 |
| value_loss | 7.71 |
-----------------------------------------
@nosound2 You are right. I cleaned up and reran setup.py, and it's fine. Thanks :+1:
Thanks for this change! Not editing the config is a good solution to this problem; ignore what I commented earlier. Games reset repeatedly, and we want to leave the config intact so that new map dimensions are randomly generated each episode.
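A minimal sketch of that idea, under stated assumptions: the Game class, the width/height keys, and the MAP_SIZES list below are hypothetical stand-ins for illustration, not the actual luxai2021 code. Only 12 and 32 are sizes mentioned in this thread. The point is that each reset works on a copy, so an unset size in the shared config stays unset and a fresh size is drawn every episode.

```python
import copy
import random

# Assumed set of map sizes, for illustration only.
MAP_SIZES = [12, 16, 24, 32]


class Game:
    """Hypothetical game wrapper that never mutates the config it was given."""

    def __init__(self, config: dict):
        self.base_config = config  # shared across episodes, left intact
        self.config = {}
        self.reset()

    def reset(self, seed=None) -> None:
        rng = random.Random(seed)
        # Work on a per-episode copy so the random choice is not "set in stone".
        episode_config = copy.deepcopy(self.base_config)
        if episode_config.get("width") is None:
            size = rng.choice(MAP_SIZES)
            episode_config["width"] = size
            episode_config["height"] = size
        self.config = episode_config


if __name__ == "__main__":
    shared = {"width": None, "height": None}
    game = Game(shared)
    for episode in range(3):
        game.reset(seed=episode)
        print(episode, game.config["width"], shared)
```

If the reset wrote the chosen size back into the shared config instead, the first episode's size would be reused for every later episode.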
Hey, just FYI. This is really cool and should help generalization a lot, but for some reason it killed the multi-cpu performance on my end. I thought it was hanging, but it was actually very slooooow. With 60 cpus I used to have 5k 'fps' (from the PPO metrics this looks like steps/second); now I am getting <200 fps. Training on a single cpu is now faster.
I thought that re-generating a map could cause a delay, but generate_map() has always been there. At this point I don't know why it is so slow. I will try to re-install everything and test again. If you have an idea of what's going on, I can test/investigate.
Cheers
@royerk, it might be because of the map size. Before, maybe it was running 12x12 maps for every training run. Now it will be using all the sizes, some of which range up to 32x32. These large maps might involve a lot more units and actions per game. Could that account for it?
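As a rough back-of-the-envelope check of the numbers in this thread (a sketch only: per-step cost is not necessarily proportional to cell count, and cell_count is a hypothetical helper):

```python
def cell_count(size: int) -> int:
    """Number of cells on a square map of the given side length."""
    return size * size


SMALL, LARGE = 12, 32

# Ratio of board area between the largest and smallest sizes mentioned above.
ratio = cell_count(LARGE) / cell_count(SMALL)

# Reported throughput before the change: about 5k steps/second on 60 cpus.
fps_per_cpu_before = 5000 / 60

if __name__ == "__main__":
    print(f"{SMALL}x{SMALL}: {cell_count(SMALL)} cells")
    print(f"{LARGE}x{LARGE}: {cell_count(LARGE)} cells")
    print(f"area ratio: {ratio:.1f}x")
    print(f"fps per cpu before: {fps_per_cpu_before:.0f}")
```

A 32x32 map has roughly 7x the cells of a 12x12 map, while the reported drop from 5k to under 200 fps is more than 25x, so map size alone may not account for the whole slowdown.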
@glmcdona haha right bots were dominating on those 12x12! It's worth investigating, yet I would still expect better fps with more cpus. Thanks :)
Did this fix map gen? Does the map gen test pass now?
@StoneT2000 not sure what the output should be but it looks ok:
➜ tests git:(main) ✗ python -m unittest test_map.py
Testing generating game...
w,..c,c,......................c,..........u,....................
....c,c,....................c,c,........................w,w,....
..............................w,w,......................w,w,....
..............................Waw,..............................
................w,w,............................................
............w,w,w,w,............................................
..................w,............................................
................................................................
..........w,w,w,......................................c,c,......
............w,w,w,....................................c,c,......
............w,w,................................................
......................................w,w,......................
......................................w,w,......................
c,..................................w,..w,w,....................
c,................................w,........w,w,..............u,
..............................................................u,
..............................................................u,
c,................................w,........w,w,..............u,
c,..................................w,..w,w,....................
......................................w,w,......................
......................................w,w,......................
............w,w,................................................
............w,w,w,....................................c,c,......
..........w,w,w,......................................c,c,......
................................................................
..................w,............................................
............w,w,w,w,............................................
................w,w,............................................
..............................Wbw,..............................
..............................w,w,......................w,w,....
....c,c,....................c,c,........................w,w,....
w,..c,c,......................c,..........u,....................
Map shape: 32,32
{'teamStats': {0: {'fuelGenerated': 0, 'resourcesCollected': {'wood': 0, 'coal': 0, 'uranium': 0}, 'cityTilesBuilt': 0, 'workersBuilt': 1, 'cartsBuilt': 0, 'roadsBuilt': 0, 'roadsPillaged': 0}, 1: {'fuelGenerated': 0, 'resourcesCollected': {'wood': 0, 'coal': 0, 'uranium': 0}, 'cityTilesBuilt': 0, 'workersBuilt': 1, 'cartsBuilt': 0, 'roadsBuilt': 0, 'roadsPillaged': 0}}}
{'turn': 0, 'teamStates': {0: {'researchPoints': 0, 'units': {'u_1': <luxai2021.game.unit.Worker object at 0x7f761da08b10>}, 'researched': {'wood': True, 'coal': False, 'uranium': False}}, 1: {'researchPoints': 0, 'units': {'u_2': <luxai2021.game.unit.Worker object at 0x7f761da18450>}, 'researched': {'wood': True, 'coal': False, 'uranium': False}}}}
Passed game creation test!
.Testing game simulation speed
Simple empty game: 0.130 seconds per full game.
.Testing game map validity against 100 seeds
.
----------------------------------------------------------------------
Ran 3 tests in 14.689s
OK
wow that map gen is perfect! (and the original code was soo messy to begin with)
Closes #74
@glmcdona, can you please take a look and see whether this is a satisfactory solution to the linked issue?