TimZaman / dotaclient

distributed RL spaghetti al arabiata
26 stars 7 forks source link

Speed dropouts correlated with strategy collapse #15

Closed TimZaman closed 5 years ago

TimZaman commented 5 years ago

The below particular example didnt fully shit itself, for the simple reason that its learning rate was very modest (1e-5). The rest is typically 1e-4 as shown in below comments.

Example: 15 Jan 23:09

image Related log entry:

2019-01-16 06:55:27,372 INFO     ::step episode=1427
2019-01-16 06:55:37,756 INFO     steps_per_s=559.17, avg_weight_age=4.88, mean_reward=3.36, loss=-0.0141
2019-01-16 06:55:38,705 INFO     ::step episode=1428
2019-01-16 06:55:42,167 INFO     steps_per_s=711.36, avg_weight_age=3.75, mean_reward=0.29, loss=-0.0138
2019-01-16 06:55:43,089 INFO     ::step episode=1429
2019-01-16 06:55:47,309 INFO     steps_per_s=805.36, avg_weight_age=4.94, mean_reward=0.43, loss=-0.0181
2019-01-16 06:56:12,566 INFO     ::step episode=1430
2019-01-16 06:56:20,674 INFO     steps_per_s=241.90, avg_weight_age=5.94, mean_reward=2.61, loss=-0.0661
2019-01-16 06:57:19,174 INFO     ::step episode=1431
2019-01-16 06:57:28,744 INFO     steps_per_s=140.80, avg_weight_age=2.25, mean_reward=3.39, loss=-0.1711
2019-01-16 06:58:26,423 INFO     ::step episode=1432

At the time, pods were being preempted:

NAME                                     READY   STATUS              RESTARTS   AGE
dotaservice-deployment-b8dc7b9dc-2kjlf   0/2     ContainerCreating   0          6h
dotaservice-deployment-b8dc7b9dc-5lm7s   0/2     ContainerCreating   0          6h
dotaservice-deployment-b8dc7b9dc-5z2tr   0/2     ContainerCreating   0          6h
dotaservice-deployment-b8dc7b9dc-6bwzc   0/2     ContainerCreating   0          6h
dotaservice-deployment-b8dc7b9dc-6k5p9   0/2     ContainerCreating   0          6h
dotaservice-deployment-b8dc7b9dc-7cdxh   2/2     Running             1          6h
dotaservice-deployment-b8dc7b9dc-87vzs   0/2     ContainerCreating   0          6h
dotaservice-deployment-b8dc7b9dc-9pnr9   0/2     ContainerCreating   0          6h
dotaservice-deployment-b8dc7b9dc-cxb8s   2/2     Running             0          2h
dotaservice-deployment-b8dc7b9dc-f7xls   0/2     ContainerCreating   0          12m
dotaservice-deployment-b8dc7b9dc-fmqm5   0/2     ContainerCreating   0          6h
dotaservice-deployment-b8dc7b9dc-h2h4h   2/2     Running             0          6h
dotaservice-deployment-b8dc7b9dc-h5j29   0/2     ContainerCreating   0          6h
dotaservice-deployment-b8dc7b9dc-hdcrs   2/2     Running             0          6h
dotaservice-deployment-b8dc7b9dc-hvr7d   0/2     ContainerCreating   0          6h
dotaservice-deployment-b8dc7b9dc-hwvsz   0/2     ContainerCreating   0          6h
dotaservice-deployment-b8dc7b9dc-hxbst   0/2     ContainerCreating   0          6h
dotaservice-deployment-b8dc7b9dc-j22n7   0/2     ContainerCreating   0          6h
dotaservice-deployment-b8dc7b9dc-lltwf   0/2     ContainerCreating   0          6h
dotaservice-deployment-b8dc7b9dc-m5j5d   2/2     Running             0          6h
dotaservice-deployment-b8dc7b9dc-m6md5   2/2     Running             1          6h
dotaservice-deployment-b8dc7b9dc-m85nk   0/2     ContainerCreating   0          6h
dotaservice-deployment-b8dc7b9dc-mzb9p   0/2     ContainerCreating   0          6h
dotaservice-deployment-b8dc7b9dc-nrdq4   0/2     ContainerCreating   0          6h
dotaservice-deployment-b8dc7b9dc-rcqhv   0/2     ContainerCreating   0          6h
dotaservice-deployment-b8dc7b9dc-scc7t   0/2     ContainerCreating   0          6h
dotaservice-deployment-b8dc7b9dc-smsrw   0/2     ContainerCreating   0          6h
dotaservice-deployment-b8dc7b9dc-svzc9   0/2     ContainerCreating   0          6h
dotaservice-deployment-b8dc7b9dc-wdcjw   0/2     ContainerCreating   0          6h
dotaservice-deployment-b8dc7b9dc-wjwql   0/2     ContainerCreating   0          12m
dotaservice-deployment-b8dc7b9dc-xtnx7   0/2     ContainerCreating   0          6h
dotaservice-deployment-b8dc7b9dc-z9t6r   2/2     Running             0          6h
optimizer-master-0                       1/1     Running             0          6h
pytorch-operator-6f87db67b7-hgsr5        1/1     Running             0          1d
rmq-7c86fd9767-bqjfk                     1/1     Running             0          6h
TimZaman commented 5 years ago

Same story, from 11 Jan

image

TimZaman commented 5 years ago

Also 11 Jan

image

TimZaman commented 5 years ago

21 Dec - seems to have been around long time

image

Nostrademous commented 5 years ago

So, a thought based on some observation - I don't think there is any strategy collapse.

I believe what is happening is that the game is just ending in victory for one side or another early on as agents become more aggressive.

In 1v1 MID mode - the game is over as soon as one hero dies twice. I recently observed in GUI mode a game where the radiant agent just died twice to creep while farming them and thus the game ended and went to the victory screen after only 1min 50sec of dota-time passed.

Protobufs stop arrive after game is ended and the perceived steps_per_sec go down since we are not able to do anything more.

Are you tracking message sent to the console.log? If so, we could determine when game 'end' from there. I do not currently know of a way to determine game win/loss from protobuf b/c AFAIK they stop arriving the frame the game ends.

TimZaman commented 5 years ago

What you're describing is correct, but I am currentl only training for max-dota-time=210 (seconds). So the game isn't over. I think it's a real bug. a big problem indeed is figuring out somehow who won! I don't know of any other way than scraping the logs.. And getting who won is SUPER essential obviously to get that last massive reward..

On Wed, Jan 16, 2019 at 10:36 AM Nostrademous notifications@github.com wrote:

So, a thought based on some observation - I don't think there is any strategy collapse.

I believe what is happening is that the game is just ending in victory for one side or another early on as agents become more aggressive.

In 1v1 MID mode - the game is over as soon as one hero dies twice. I recently observed in GUI mode a game where the radiant agent just died twice to creep while farming them and thus the game ended and went to the victory screen after only 1min 50sec of dota-time passed.

Protobufs stop arrive after game is ended and the perceived steps_per_sec go down since we are not able to do anything more.

Are you tracking message sent to the console.log? If so, we could determine when game 'end' from there. I do not currently know of a way to determine game win/loss from protobuf b/c AFAIK they stop arriving the frame the game ends.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/TimZaman/dotaclient/issues/15#issuecomment-454890502, or mute the thread https://github.com/notifications/unsubscribe-auth/AHXSRJWubNI20bL5ZRJShR3Q_jTsVK6pks5vD3EngaJpZM4aCcRj .

Nostrademous commented 5 years ago

I am training max-dota-time 200 and had games complete before that

TimZaman commented 5 years ago

Aaaaaah! Maybe because it dies 2x? hmmm interesting, i guess you're right.

On Wed, Jan 16, 2019 at 10:49 AM Nostrademous notifications@github.com wrote:

I am training max-dota-time 200 and had games complete before that

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/TimZaman/dotaclient/issues/15#issuecomment-454894934, or mute the thread https://github.com/notifications/unsubscribe-auth/AHXSRPryKzUIRCtXLI4m1doIKq0XUdDAks5vD3QkgaJpZM4aCcRj .

Nostrademous commented 5 years ago

Yes, it was b/c the Radiant agent died twice (from tower damage which it doesn't currently even know about).

TimZaman commented 5 years ago

Yep @Nostrademous was right, this is just proper strategy collapse, not a bug. Suicidal tendencies with too high lr.

TimZaman commented 5 years ago

Nope, this is actually a thing. Example: huge reward zero-out in weight version 3974 of 'job1', with an average age of 5, so that puts the problem around v3969. Then I find this:

kubectl logs job1-ppo-dotaservice-5844c6ff75-h5tm7 agent
2019-01-23 00:05:56,211 INFO     main(rmq_host=job1-ppo-rmq.default.svc.cluster.local, rmq_port=5672)
2019-01-23 00:05:56,398 INFO     setup_model_cb(host=job1-ppo-rmq.default.svc.cluster.local, port=5672)
2019-01-23 00:05:56,455 INFO     Received new model: version=3968, size=1207838b
2019-01-23 00:05:56,457 INFO     === Starting Episode 0.
2019-01-23 00:05:56,457 INFO     Starting game.
2019-01-23 00:05:56,462 INFO     Player 0 using weights version 3968
2019-01-23 00:05:56,468 INFO     Player 5 using weights version 3968
2019-01-23 00:06:20,245 INFO     Received new model: version=3969, size=1207838b
2019-01-23 00:06:53,885 INFO     Received new model: version=3970, size=1207838b
2019-01-23 00:07:40,671 INFO     Received new model: version=3971, size=1207838b
2019-01-23 00:07:40,953 INFO     Player 0 rollout.
2019-01-23 00:07:40,961 INFO     Player 0 reward sum: -1.24 subrewards:
{'death': -0.0,
 'denies': 0.0,
 'dist': -1.2368338644038648,
 'hp': 0.0,
 'kills': 0.0,
 'lh': 0.0,
 'win': 0.0,
 'xp': 0.0}
2019-01-23 00:07:41,035 INFO     Player 5 rollout.
2019-01-23 00:07:41,045 INFO     Player 5 reward sum: -1.22 subrewards:
{'death': -0.0,
 'denies': 0.0,
 'dist': -1.2227683480761358,
 'hp': 0.0,
 'kills': 0.0,
 'lh': 0.0,
 'win': 0.0,
 'xp': 0.0}
2019-01-23 00:07:41,082 INFO     Game finished.
2019-01-23 00:07:41,307 INFO     === Starting Episode 1.
2019-01-23 00:07:41,307 INFO     Starting game.
2019-01-23 00:07:41,317 INFO     Player 0 using weights version 3971
2019-01-23 00:07:41,325 INFO     Player 5 using weights version 3971
2019-01-23 00:07:58,798 INFO     Received new model: version=3972, size=1207838b
2019-01-23 00:08:15,989 INFO     Received new model: version=3973, size=1207838b
2019-01-23 00:08:48,953 INFO     Received new model: version=3974, size=1207838b
2019-01-23 00:09:10,295 INFO     Player 0 rollout.
2019-01-23 00:09:10,301 INFO     Player 0 reward sum: 7.56 subrewards:
{'death': -0.0,
 'denies': 3.0000000000000004,
 'dist': -0.37351986649889934,
 'hp': 0.0,
 'kills': 0.0,
 'lh': 0.0,
 'win': 0.0,
 'xp': 4.937999999999997}

Clearly, you can see the rewards are practically zero! I think initial weights are not loaded in correctly.

Different pod, same shit:

kubectl logs job1-ppo-dotaservice-5844c6ff75-wpgtx agent
2019-01-23 00:05:57,575 INFO     main(rmq_host=job1-ppo-rmq.default.svc.cluster.local, rmq_port=5672)
2019-01-23 00:05:57,697 INFO     setup_model_cb(host=job1-ppo-rmq.default.svc.cluster.local, port=5672)
2019-01-23 00:05:57,718 INFO     Received new model: version=3968, size=1207838b
2019-01-23 00:05:57,721 INFO     === Starting Episode 0.
2019-01-23 00:05:57,721 INFO     Starting game.
2019-01-23 00:05:57,728 INFO     Player 0 using weights version 3968
2019-01-23 00:05:57,734 INFO     Player 5 using weights version 3968
2019-01-23 00:06:20,293 INFO     Received new model: version=3969, size=1207838b
2019-01-23 00:06:53,952 INFO     Received new model: version=3970, size=1207838b
2019-01-23 00:07:39,035 INFO     Player 0 rollout.
2019-01-23 00:07:39,041 INFO     Player 0 reward sum: -1.17 subrewards:
{'death': -0.0,
 'denies': 0.0,
 'dist': -1.1730716805008694,
 'hp': 0.0,
 'kills': 0.0,
 'lh': 0.0,
 'win': 0.0,
 'xp': 0.0}
2019-01-23 00:07:39,118 INFO     Player 5 rollout.
2019-01-23 00:07:39,129 INFO     Player 5 reward sum: -1.20 subrewards:
{'death': -0.0,
 'denies': 0.0,
 'dist': -1.2018850539050752,
 'hp': 0.0,
 'kills': 0.0,
 'lh': 0.0,
 'win': 0.0,
 'xp': 0.0}
2019-01-23 00:07:39,167 INFO     Game finished.
2019-01-23 00:07:39,412 INFO     === Starting Episode 1.
2019-01-23 00:07:39,413 INFO     Starting game.
2019-01-23 00:07:39,422 INFO     Player 0 using weights version 3970
2019-01-23 00:07:39,431 INFO     Player 5 using weights version 3970
2019-01-23 00:07:40,707 INFO     Received new model: version=3971, size=1207838b
2019-01-23 00:07:58,845 INFO     Received new model: version=3972, size=1207838b
2019-01-23 00:08:16,045 INFO     Received new model: version=3973, size=1207838b
2019-01-23 00:08:49,017 INFO     Received new model: version=3974, size=1207838b
2019-01-23 00:09:07,130 INFO     Player 0 rollout.
2019-01-23 00:09:07,137 INFO     Player 0 reward sum: 5.93 subrewards:
{'death': -0.5,
 'denies': 4.200000000000001,
 'dist': -0.38197812335810066,
 'hp': -1.3292786512672032,
 'kills': 0.0,
 'lh': 0.0,
 'win': 0.0,
 'xp': 3.9439999999999995}
TimZaman commented 5 years ago

Seems to happen with any fresh game;

kubectl logs job2-ppo-dotaservice-66d6f846f6-5lfnt agent
2019-01-22 20:03:28,893 INFO     main(rmq_host=job2-ppo-rmq.default.svc.cluster.local, rmq_port=5672)
2019-01-22 20:03:29,036 INFO     setup_model_cb(host=job2-ppo-rmq.default.svc.cluster.local, port=5672)
2019-01-22 20:03:29,068 INFO     Received new model: version=0, size=1207838b
2019-01-22 20:03:29,071 INFO     === Starting Episode 0.
2019-01-22 20:03:29,071 INFO     Starting game.
2019-01-22 20:03:29,076 INFO     Player 0 using weights version 0
2019-01-22 20:03:29,081 INFO     Player 5 using weights version 0
2019-01-22 20:05:44,622 INFO     Received new model: version=0, size=1207838b
2019-01-22 20:05:53,097 INFO     Player 0 rollout.
2019-01-22 20:05:53,103 INFO     Player 0 reward sum: -1.32 subrewards:
{'death': -0.0,
 'denies': 0.0,
 'dist': -1.3185522764557216,
 'hp': 0.0,
 'kills': 0.0,
 'lh': 0.0,
 'win': 0.0,
 'xp': 0.0}
2019-01-22 20:05:53,176 INFO     Player 5 rollout.
2019-01-22 20:05:53,185 INFO     Player 5 reward sum: -1.24 subrewards:
{'death': -0.0,
 'denies': 0.0,
 'dist': -1.2378653959199162,
 'hp': 0.0,
 'kills': 0.0,
 'lh': 0.0,
 'win': 0.0,
 'xp': 0.0}
2019-01-22 20:05:53,225 INFO     Game finished.
2019-01-22 20:05:53,470 INFO     === Starting Episode 1.
2019-01-22 20:05:53,470 INFO     Starting game.
2019-01-22 20:05:53,477 INFO     Player 0 using weights version 0
2019-01-22 20:05:53,481 INFO     Player 5 using weights version 0
2019-01-22 20:06:05,488 INFO     Received new model: version=1, size=1207838b
2019-01-22 20:06:24,004 INFO     Received new model: version=2, size=1207838b
2019-01-22 20:06:43,119 INFO     Received new model: version=3, size=1207838b
2019-01-22 20:07:17,771 INFO     Received new model: version=4, size=1207838b
2019-01-22 20:07:40,604 INFO     Received new model: version=5, size=1207838b
2019-01-22 20:08:00,148 INFO     Received new model: version=6, size=1207838b
2019-01-22 20:08:02,163 INFO     Player 0 rollout.
2019-01-22 20:08:02,169 INFO     Player 0 reward sum: -0.88 subrewards:
{'death': -0.5,
 'denies': 0.0,
 'dist': -0.8198992464234711,
 'hp': -2.6623403442404907,
 'kills': 0.0,
 'lh': 0.0,
 'win': -1.0,
 'xp': 4.105999999999999}
TimZaman commented 5 years ago
TimZaman commented 5 years ago

Odd, doesn't always seem to happen:

kubectl logs job2-ppo-dotaservice-66d6f846f6-q4n46 agent
2019-01-22 20:08:44,061 INFO     main(rmq_host=job2-ppo-rmq.default.svc.cluster.local, rmq_port=5672)
2019-01-22 20:08:44,090 INFO     setup_model_cb(host=job2-ppo-rmq.default.svc.cluster.local, port=5672)
2019-01-22 20:08:44,131 INFO     Received new model: version=8, size=1207838b
2019-01-22 20:08:44,135 INFO     === Starting Episode 0.
2019-01-22 20:08:44,135 INFO     Starting game.
2019-01-22 20:08:44,143 INFO     Player 0 using weights version 8
2019-01-22 20:08:44,151 INFO     Player 5 using weights version 8
2019-01-22 20:09:18,177 INFO     Received new model: version=9, size=1207838b
2019-01-22 20:09:38,994 INFO     Received new model: version=10, size=1207838b
2019-01-22 20:09:58,277 INFO     Received new model: version=11, size=1207838b
2019-01-22 20:10:29,431 INFO     Received new model: version=12, size=1207838b
2019-01-22 20:10:51,922 INFO     Player 0 rollout.
2019-01-22 20:10:51,928 INFO     Player 0 reward sum: 2.75 subrewards:
{'death': -0.5,
 'denies': 0.2,
 'dist': -0.6888736726595356,
 'hp': -1.2982225214865093,
 'kills': 0.0,
 'lh': 0.0,
 'win': 0.0,
 'xp': 5.033999999999999}
2019-01-22 20:10:51,988 INFO     Player 5 rollout.
2019-01-22 20:10:51,996 INFO     Player 5 reward sum: 3.98 subrewards:
{'death': -0.5,
 'denies': 0.0,
 'dist': -0.609194125515568,
 'hp': -1.2427575685782264,
 'kills': 0.0,
 'lh': 0.0,
 'win': 0.0,
 'xp': 6.331999999999997}
TimZaman commented 5 years ago

Time of the task look good too. Actually the dist rewards are pretty divergent: seems anywhere between -1.2 and -1.6. And dist is consistently the only penalty that's being given. The game lasts around 2m20s, which seems typical.

Nostrademous commented 5 years ago

Is it possible that in some cases at start of game the bot moves into a "corner" and basically traps itself?

I mean, for movement we are using Action_MoveDirectly instead of Action_MoveToPosition. MoveDirectly (if I recall the low-level details correctly) will attempt to make the movement to the location you specified if it can but not diverge from a straight-line path. MoveToPosition will move to specified location by creating a waypoint navigation path (providing the destination location is reachable) and iterating over all the waypoints.

Now, MoveToPosition is "bad" for agents b/c t also paths around visible enemies if it believes they are too dangerous (calculated by the back-end server).

What I'm getting at though, is that I believe it is possible that the bots might make a move to a tight corner in the Fountain and get effectively trapped. I say effectively b/c there is a way out, but it is a very specific move directions that it might not be able to figure out.

Just musing at this point though.