TimZaman / dotaclient

distributed RL spaghetti al arabiata

Calculating Reward on Game End #38

Closed Nostrademous closed 5 years ago

Nostrademous commented 5 years ago

If you look at the stream of rewards below (all of Game #2), you will see that the game ends in victory for Dire, yet each agent records only one death. Judging by tower_hp, the tower was nowhere near dying, so the game must have ended because the Radiant agent died a second time. However, the -3.0 death reward for Player 0 does not appear a second time in the rewards.

This makes me believe that we don't capture the rewards between the last reward sync and the game end.

2019-02-05 10:59:50,789 INFO     === Starting Game 2.
2019-02-05 10:59:50,789 INFO     Starting game.
2019-02-05 10:59:50,797 INFO     Player 0 using weights version 0
2019-02-05 10:59:50,802 INFO     Player 5 using weights version 0
2019-02-05 11:00:16,411 INFO     Player 0 rollout.
2019-02-05 11:00:16,412 INFO     Player 0 reward sum: -0.11 subrewards:
{'death': -0.0,
 'denies': 0.0,
 'enemy': -0.114,
 'hp': 0.0,
 'kills': 0.0,
 'lh': 0.0,
 'tower_hp': 0.0,
 'win': 0.0,
 'xp': 0.0}
2019-02-05 11:00:16,429 INFO     Player 5 rollout.
2019-02-05 11:00:16,430 INFO     Player 5 reward sum: 0.11 subrewards:
{'death': -0.0,
 'denies': 0.0,
 'enemy': -0.0,
 'hp': 0.0,
 'kills': 0.0,
 'lh': 0.0,
 'tower_hp': 0.0,
 'win': 0.0,
 'xp': 0.114}
2019-02-05 11:00:33,551 INFO     Received new model: version=0, size=1472372b
2019-02-05 11:00:40,146 INFO     Player 0 rollout.
2019-02-05 11:00:40,147 INFO     Player 0 reward sum: -0.15 subrewards:
{'death': -3.0,
 'denies': 0.0,
 'enemy': 3.0988716954415696,
 'hp': -1.2002411301619431,
 'kills': 0.0,
 'lh': 0.0,
 'tower_hp': -0.015,
 'win': 0.0,
 'xp': 0.9700000000000001}
2019-02-05 11:00:40,158 INFO     Player 5 rollout.
2019-02-05 11:00:40,159 INFO     Player 5 reward sum: 0.15 subrewards:
{'death': -3.0,
 'denies': 0.2,
 'enemy': 3.245241130161943,
 'hp': -1.2005383621082364,
 'kills': 0.0,
 'lh': 0.0,
 'tower_hp': -0.058333333333333334,
 'win': 0.0,
 'xp': 0.96}
2019-02-05 11:00:56,220 INFO     Player 0 rollout.
2019-02-05 11:00:56,221 INFO     Player 0 reward sum: -6.98 subrewards:
{'death': -0.0,
 'denies': 0.0,
 'enemy': -0.61683011154303,
 'hp': -1.3583716176202625,
 'kills': 0.0,
 'lh': 0.0,
 'tower_hp': 0.0,
 'win': -5.0,
 'xp': 0.0}
2019-02-05 11:00:56,226 INFO     Player 5 rollout.
2019-02-05 11:00:56,227 INFO     Player 5 reward sum: 6.72 subrewards:
{'death': -0.0,
 'denies': 0.0,
 'enemy': 1.3602503028054476,
 'hp': -0.3734285740740741,
 'kills': 0.0,
 'lh': 0.0,
 'tower_hp': -0.018333333333333333,
 'win': 5.0,
 'xp': 0.756}
2019-02-05 11:00:56,232 INFO     Game finished.

TimZaman commented 5 years ago

Both players died here, during the same rollout. Then one player won during the last rollout. Rewards are aggregated only per rollout. It doesn't matter how the game was won: a win may come from a death or from the tower, but the cause is not scored separately, and there is no need to.
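
As a rough illustration of per-rollout aggregation, the sketch below sums per-step subreward dicts into the kind of single summary dict that the log prints once per rollout. The function name `aggregate_rollout` and the per-step values are hypothetical and are not taken from dotaclient or from the log above.

```python
from collections import defaultdict


def aggregate_rollout(step_rewards):
    """Sum a list of per-step subreward dicts into one per-rollout dict.

    Hypothetical helper for illustration; not the dotaclient implementation.
    """
    totals = defaultdict(float)
    for step in step_rewards:
        for name, value in step.items():
            totals[name] += value
    return dict(totals)


# Hypothetical per-step subrewards within a single rollout.
steps = [
    {'death': 0.0, 'xp': 0.5, 'win': 0.0},
    {'death': -3.0, 'xp': 0.0, 'win': 0.0},
    {'death': 0.0, 'xp': 0.25, 'win': -5.0},
]

summary = aggregate_rollout(steps)
print(summary)
```

With this kind of aggregation, individual events are visible in the log only as per-key totals for the whole rollout, not as separate entries.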

Nostrademous commented 5 years ago

But a single player needs to die twice for the game to be over. According to the record, each of them died once, so the game should not be over. I believe a player died a second time and that this was not captured in our reward aggregation.

TimZaman commented 5 years ago

It doesn't need to be captured, because it got rewarded for the win.


Nostrademous commented 5 years ago

But doesn't the bot that died a second time miss the negative reward for that second death? Sure, it "loses" and gets the -5, but it won't necessarily make the connection that the second death is the cause, since it doesn't see the second death in the rewards.

Or am I misunderstanding something about our algorithm?

TimZaman commented 5 years ago

The reward is a single scalar: the sum of all sub-rewards. The agent doesn't know how it is composed.
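
To make the scalar point concrete, here is a minimal sketch that sums the subrewards from Player 0's final rollout in the log above and reproduces the logged reward sum. The helper `scalar_reward` and the alternative split in `alt` are hypothetical and only serve to show that two different decompositions can yield the same scalar.

```python
import math

# Subrewards copied from Player 0's final rollout in the log above.
subrewards = {
    'death': -0.0,
    'denies': 0.0,
    'enemy': -0.61683011154303,
    'hp': -1.3583716176202625,
    'kills': 0.0,
    'lh': 0.0,
    'tower_hp': 0.0,
    'win': -5.0,
    'xp': 0.0,
}


def scalar_reward(parts):
    """Collapse a subreward dict into the single scalar used as the reward."""
    return sum(parts.values())


total = scalar_reward(subrewards)
print(f"{total:.2f}")  # -6.98, matching the logged reward sum

# Hypothetical alternative split of the same -5.0: a death plus a smaller
# loss term. The scalar is unchanged, so the split is invisible downstream.
alt = dict(subrewards, death=-3.0, win=-2.0)
print(math.isclose(total, scalar_reward(alt)))  # True
```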
