TimZaman / dotaclient

distributed RL spaghetti al arabiata

Attacking Towers #33

Closed Nostrademous closed 5 years ago

Nostrademous commented 5 years ago

This is a start to improve #32

We can now attack towers.

This also compares tower health between the friendly and enemy mid tier 1 tower as a reward.

Also adds more reward to a win/loss.

Minor fixes here and there.

Nostrademous commented 5 years ago

rebased on top of your latest code

Nostrademous commented 5 years ago

Added a static method that splits the full unit list provided in the protobuf into per-unit-type lists in a single pass, so the unit_matrix() static method no longer has to iterate the full list several times. For-loops in Python tend not to be the most efficient.
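A minimal sketch of the single-pass split described above. The `Unit` dataclass and the `group_units_by_type` name are hypothetical stand-ins for illustration, not the actual protobuf message or the method in this PR:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Unit:
    # Hypothetical stand-in for one entry of the world-state unit list.
    handle: int
    unit_type: str  # e.g. "HERO", "LANE_CREEP", "TOWER"


def group_units_by_type(units):
    """Walk the full unit list once and bucket units by their type."""
    groups = defaultdict(list)
    for unit in units:
        groups[unit.unit_type].append(unit)
    return groups


units = [
    Unit(1, "HERO"),
    Unit(2, "LANE_CREEP"),
    Unit(3, "TOWER"),
    Unit(4, "LANE_CREEP"),
]
groups = group_units_by_type(units)
creep_handles = [u.handle for u in groups["LANE_CREEP"]]
```

Each per-type feature builder can then read only its own bucket instead of rescanning and filtering the whole list.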

Nostrademous commented 5 years ago

Since we now use a zero-sum game and enemy rewards are reflected inversely in our rewards, I fixed the tower health reward so that it is no longer zero-sum in itself; otherwise we were doubling the effect. Also, I believe I had the reward value reversed by accident.
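The doubling can be seen with a little arithmetic. This is only a sketch: it assumes the zero-sum step subtracts the enemy's reward from ours, and that the per-team tower term is own tower health divided by 600 (the divisor mentioned later in this thread). Function names are hypothetical:

```python
TOWER_HEALTH_DIVISOR = 600.0  # tier 1 tower has 1800 health, so the term spans [0.0, 3.0]


def tower_term(tower_health):
    # Per-team reward term that depends on one tower's health only.
    return tower_health / TOWER_HEALTH_DIVISOR


def zero_sum(own_reward, enemy_reward):
    # Assumed form of the zero-sum mixing: the enemy's reward counts against us.
    return own_reward - enemy_reward


own_health, enemy_health = 1800.0, 1200.0

# Non-zero-sum term per team: the health difference is counted once.
single = zero_sum(tower_term(own_health), tower_term(enemy_health))

# Term that is already a difference per team: the mixing counts it twice.
diff_own = tower_term(own_health) - tower_term(enemy_health)
diff_enemy = tower_term(enemy_health) - tower_term(own_health)
doubled = zero_sum(diff_own, diff_enemy)
```

With these numbers `single` is 1.0 and `doubled` is 2.0, which is the doubled effect the comment refers to.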

Nostrademous commented 5 years ago

Added a parameter for tracking whether a unit "is attacking me" (normalized to [-0.5, 0.5]) and removed the facing_sin and facing_cos parameters, as they do not affect any of the actions we have enabled thus far (if we attack, it will just turn and attack...)
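One way such a normalized boolean could be computed, as a sketch. The function and argument names are hypothetical, and it assumes each unit exposes the handle of its current attack target (or `None` when it has none):

```python
def is_attacking_me_feature(attack_target_handle, my_handle):
    """Map 'this unit is attacking me' from {False, True} to {-0.5, 0.5}."""
    is_attacking_me = attack_target_handle == my_handle
    return float(is_attacking_me) - 0.5


attacked = is_attacking_me_feature(attack_target_handle=7, my_handle=7)
not_attacked = is_attacking_me_feature(attack_target_handle=3, my_handle=7)
idle = is_attacking_me_feature(attack_target_handle=None, my_handle=7)
```

Centering the flag around zero keeps it on the same scale as the other normalized unit features.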

If we bring them back in the future we should quantize them better as they add a lot of state-space search the way they were implemented.

Nostrademous commented 5 years ago

Apparently I always have access to all allied and enemy buildings/towers in the unit-list protobuf. The data might not be valid, but the entries are present, so I added a way to filter them out.

TimZaman commented 5 years ago

I got some trouble ingesting this. Let me know when you clean it up/RFR

Nostrademous commented 5 years ago

Sure, what is the ingest trouble? Sorry I don't understand if you mean "merging" or "understanding" or ???

TimZaman commented 5 years ago

Code is doing quite a lot, not simple to review

On Mon, Feb 4, 2019, 12:39 Nostrademous <notifications@github.com> wrote:

Sure, what is the ingest trouble? Sorry I don't understand if you mean "merging" or "understanding" or ???


Nostrademous commented 5 years ago

Okay, I added a few more comments, but I fear it will still be a lot. Here is what the code does:

1) Created a new static function to separate the unit list embedded in the world-state protobuf into unit-type-specific groups. That way we can add more groups later (like wards, courier, jungle creeps, etc.) and not worry about them when performing other calculations. This also "should" speed up execution: the 'unit_matrix()' calls no longer have to iterate the entire unit list in the protobuf to build the basic unit info tensors, only the list for the group being built (as we have separate embeddings for each).

2) It adds the 'TOWER' unit-type handles and list to the policy for evaluation and reasoning, so we have an understanding of tower health and other basic unit information. This is not the easiest thing to do, though: testing showed that the world-state protobuf includes information about all the friendly and enemy towers, even enemy towers I can't see, which I was not expecting (I think it just reports either the max value or the last seen value, but I need to test more). So I had to add code to attack only the tier 1 mid tower for now, so it stops trying to target the other towers, which it can't hit anyway because they are invulnerable. For the friendly tier 1 tower, I added code that prevents it from being a valid handle (by not including it in the unit handle list) while the tower is above 10% health.

3) I modified the basic unit understanding parameters by adding an 'is_targeting_me' parameter, which is a normalized boolean for whether that unit is currently right-clicking me / attacking me. This, IMHO, should help the bots learn that they are being targeted/attacked and by whom. Hopefully this info, along with the distance from each unit, helps them learn over time not to die to towers or creep attacks. I removed the facing sin/cos parameters as I don't think they add anything (but please let me know if I'm wrong... I can be really dull on some things at times).

4) Everything else is just minor cleanup. I normalized the tower reward to [0.0, 3.0] by changing the divisor from 500. to 600. (since a tier 1 tower has 1800 health). I changed the win/loss rewards to 10. and -10., etc.
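The tower targeting rules from point 2 can be sketched roughly as follows. The `Tower` dataclass, its fields (including `is_tier1_mid`), and the function name are hypothetical and only illustrate the filtering logic, not the code in this PR:

```python
from dataclasses import dataclass


@dataclass
class Tower:
    # Hypothetical stand-in for a tower entry in the world-state unit list.
    handle: int
    team_id: int
    is_tier1_mid: bool
    health: int
    health_max: int


def attackable_tower_handles(towers, my_team_id, friendly_threshold=0.10):
    """Return handles of towers that should be offered as attack targets."""
    handles = []
    for tower in towers:
        if not tower.is_tier1_mid:
            continue  # other towers are present in the list but invulnerable
        if tower.team_id != my_team_id:
            handles.append(tower.handle)
        elif tower.health <= friendly_threshold * tower.health_max:
            # Friendly tier 1 tower only becomes a valid handle at or below 10% health.
            handles.append(tower.handle)
    return handles


towers = [
    Tower(10, team_id=3, is_tier1_mid=True, health=1800, health_max=1800),
    Tower(11, team_id=3, is_tier1_mid=False, health=1800, health_max=1800),
    Tower(20, team_id=2, is_tier1_mid=True, health=1800, health_max=1800),
]
healthy = attackable_tower_handles(towers, my_team_id=2)

towers[2].health = 150  # friendly tier 1 mid tower now below 10%
low = attackable_tower_handles(towers, my_team_id=2)
```

Here `healthy` contains only the enemy tier 1 mid tower, while `low` also includes the friendly one once its health drops to 10% or less.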

TimZaman commented 5 years ago

btw did you rebase?

Nostrademous commented 5 years ago

I did 23 hrs ago, and you haven't made any commits since, so it is sitting on top of your HEAD

Nostrademous commented 5 years ago

Well... I'm going to have to rebase again and do some merge conflict clean up. Probably in 1-2 hrs when I have time.

Nostrademous commented 5 years ago

@TimZaman Okay, rebased on your latest commits. Hopefully I have explained it enough.

TimZaman commented 5 years ago

Hmm, so 90% is great, 5% is up for discussion and 5% I disagree with. I think I'll merge it and then patch up. Or just patch up the MR directly. I guess I'll do that.