TimZaman / dotaclient

distributed RL spaghetti al arabiata

Removing Invalid Actions #14

Closed: Nostrademous closed this issue 5 years ago

Nostrademous commented 5 years ago

One suggestion that seems reasonable to me is: https://ai.stackexchange.com/a/2994

This is regarding: https://github.com/TimZaman/dotaclient/blob/master/policy.py#L127-L133

TimZaman commented 5 years ago

Yeah, I should pass the valid-action mask into the forward function itself, because the softmax needs to be recomputed with the invalid actions masked out.
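Here is a minimal sketch of the masking idea in plain Python, so it runs without dependencies. It assumes the policy emits one logit per action and that the mask is a list of booleans (True = valid). The name `masked_softmax` is hypothetical and does not come from policy.py, where the same operation would be done on tensors inside the forward pass. Masking this way is equivalent to setting the invalid logits to -inf before the softmax.

```python
import math
from typing import List, Sequence


def masked_softmax(logits: Sequence[float], valid_mask: Sequence[bool]) -> List[float]:
    """Softmax over `logits` that assigns zero probability to invalid actions.

    Invalid entries behave as if their logit were -inf, so the probability
    mass is renormalized over the valid actions only.
    """
    if len(logits) != len(valid_mask):
        raise ValueError("logits and valid_mask must have the same length")
    if not any(valid_mask):
        raise ValueError("at least one action must be valid")
    # Subtract the max over the *valid* logits for numerical stability.
    max_valid = max(x for x, ok in zip(logits, valid_mask) if ok)
    exps = [math.exp(x - max_valid) if ok else 0.0 for x, ok in zip(logits, valid_mask)]
    total = sum(exps)
    return [e / total for e in exps]
```

For example, `masked_softmax([2.0, 1.0, 0.5], [True, False, True])` gives the second action probability 0.0 and splits the remaining mass between the first and third. Both sampling and the log-probabilities used for training should come from this masked distribution, so the learner sees the same distribution the agent actually acted on.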

TimZaman commented 5 years ago

I did this now, thanks.

Nostrademous commented 5 years ago

I feel like the bot is learning much faster thanks to this. This is based purely on watching the bot and its decision-making.

What do you think?

TimZaman commented 5 years ago

Yep, very true.

(Email reply to Nostrademous's comment of Fri, Feb 1, 2019, 06:16.)

Nostrademous commented 5 years ago

I thought you were going to increase the reward values for Win/Loss? I still see kills/deaths as more impactful in the current version.

TimZaman commented 5 years ago

Doesn't matter at the moment; it doesn't explore or train well. It becomes a last-hit master, but there is no smart play going on.

(Email reply to Nostrademous's comment of Fri, Feb 1, 2019, 09:38.)

Nostrademous commented 5 years ago

So I think I'm going to work on fixing that by doing several things:

1. Teach it about lane fronts and reward it for being near the front (rather than near location (0,0,0), the center of the map).
2. Change the Tower HP reward to be a delta between the enemy tower HP and the friendly tower HP (as in "it's okay if our tower is taking damage, provided we are doing more damage to the enemy tower").

It should help.
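A rough sketch of the two reward terms proposed above. Everything here is an assumption for illustration: the function names are hypothetical, positions are plain coordinate tuples, the lane front is approximated as the centroid of the lane's creep positions, and `radius` and `scale` are made-up normalization constants rather than values from the repository.

```python
import math
from typing import Sequence, Tuple

Position = Tuple[float, ...]


def estimate_lane_front(creep_positions: Sequence[Position]) -> Position:
    """Hypothetical lane-front estimate: the centroid of the lane's creeps."""
    if not creep_positions:
        raise ValueError("need at least one creep position")
    n = len(creep_positions)
    return tuple(sum(coords) / n for coords in zip(*creep_positions))


def lane_front_reward(hero_pos: Position, front_pos: Position, radius: float) -> float:
    """1.0 when standing on the lane front, falling linearly to 0.0 at `radius`."""
    d = math.dist(hero_pos, front_pos)
    return max(0.0, 1.0 - d / radius)


def tower_hp_delta_reward(prev_enemy_hp: float, enemy_hp: float,
                          prev_friendly_hp: float, friendly_hp: float,
                          scale: float) -> float:
    """Positive when we deal more tower damage than we take during the step."""
    damage_dealt = prev_enemy_hp - enemy_hp        # HP lost by the enemy tower
    damage_taken = prev_friendly_hp - friendly_hp  # HP lost by our tower
    return (damage_dealt - damage_taken) / scale
```

With `scale=100`, an enemy tower dropping from 1000 to 900 HP while ours drops from 1000 to 950 yields +0.5, so taking some tower damage is fine as long as the enemy tower loses more.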

Separate from that, I'm working on formalizing a new ML approach for teaching agents strategic planning versus tactical actions. Hopefully I can share it with folks outside my company.

TimZaman commented 5 years ago

Actually, the distance reward can be removed. I just sent you an email; I'll be at OpenAI tomorrow. Any questions we have for them?

(Email reply to Nostrademous's comment of Fri, Feb 1, 2019, 10:19.)

Nostrademous commented 5 years ago

Might be easier to chat briefly. Jump into Google Hangouts if you can.

I'll be sitting there for the next 20-30 min.

LINK DOWN

Nostrademous commented 5 years ago

It was sad and lonely... Replied to your email.

Nostrademous commented 5 years ago

So, looking at the code: you don't disable all "invalid" actions.

The code only prevents you from attacking yourself, or from issuing an attack action altogether if there are no valid unit_handles. It does not prevent attacking your own units when they are at full health (which has no effect, although it is technically valid).

I'm not saying this is a problem that needs fixing, just pointing it out. Attacking your own units at full health is one way to drop tower aggro, for example.
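The distinction above can be made concrete with a small sketch of an attack-target mask. The `Unit` record and `attack_target_mask` function are hypothetical (loosely modeled on a unit's handle, team and health, not the actual dotaclient code). The optional `mask_full_hp_allies` flag shows the extra rule that the current code does not apply; it is off by default since attacking a full-health ally can be deliberate, e.g. to drop tower aggro.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Unit:
    """Hypothetical minimal unit record used only for this sketch."""
    handle: int
    team_id: int
    health: int
    health_max: int


def attack_target_mask(self_handle: int, self_team: int,
                       candidates: Sequence[Unit],
                       mask_full_hp_allies: bool = False) -> Tuple[bool, List[bool]]:
    """Return (attack_enabled, per-candidate validity mask)."""
    mask = []
    for unit in candidates:
        valid = unit.handle != self_handle  # never target ourselves
        if (valid and mask_full_hp_allies and unit.team_id == self_team
                and unit.health >= unit.health_max):
            valid = False  # attacking a full-HP ally has no effect
        mask.append(valid)
    # The attack action type is only available if some target is valid.
    return any(mask), mask
```

With the default flag, a full-health ally stays a legal target, which matches the behavior described in the comment above; flipping the flag masks it out as well.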

TimZaman commented 5 years ago

Sure, I knew that. In principle it's just good to prevent. I do need to check what is going on when the hero is dead.

(Email reply to Nostrademous's comment of Mon, Feb 4, 2019, 07:47.)