TimZaman / dotaclient

distributed RL spaghetti al arabiata

Create (or use mine) dotaworld module #9

Open Nostrademous opened 5 years ago

Nostrademous commented 5 years ago

From an architecture standpoint - we really should have:

1) dotaservice module - already exists, responsible for all the network/file IO between Dota2 and our code
2) dotaagent module (called dotaclient currently) - responsible for the AI portion of controlling the bots - uses NNs, RL, calculates rewards, learning, etc.
3) dotaworld module - responsible for mapping the CMsgBotWorldState protobuf information acquired at whatever interval to tracked entities in the Python world.

I started work on 3 in my code already: https://github.com/pydota2/pydota2/tree/master/dotaworld

Nostrademous commented 5 years ago

Updated pydota2 to start populating and updating the WorldData classes from the dotaworld module.

https://github.com/Nostrademous/dotaclient/blob/2ad3daa5e8b3e104ed06d09ba02412d19be1d9fd/agent.py#L473-L476

It does not yet use the tracked data for decision making. https://github.com/pydota2/pydota2/issues/2

Nostrademous commented 5 years ago

@TimZaman In case you are interested, I'm making strides updating the agent to use the dotaworld class I'm re-creating, since it handles its own caching and coherency for each world-state dump.

Here I re-wrote the reward function to use it. https://github.com/Nostrademous/dotaclient/blob/d3f94b24aab1ba4b690ab0c6220ccd6d427b765f/agent.py#L150=L206

Data gets created here: https://github.com/Nostrademous/dotaclient/blob/d3f94b24aab1ba4b690ab0c6220ccd6d427b765f/agent.py#L518-L521

and updated here: https://github.com/Nostrademous/dotaclient/blob/d3f94b24aab1ba4b690ab0c6220ccd6d427b765f/agent.py#L538

which leads to: https://github.com/pydota2/pydota2/blob/master/dotaworld/world_state.py#L408

which then updates all the heroes, enemies, buildings, creeps, wards, etc...
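A minimal sketch of what such an update path could look like, assuming a simplified stand-in for the protobuf. The `Unit` dataclass, the `unit_type` strings, and the `WorldData` class below are hypothetical and do not mirror the real `CMsgBotWorldState` schema or the linked pydota2 code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Unit:
    # Hypothetical stand-in for one unit entry in a world-state dump.
    handle: int
    unit_type: str  # e.g. "hero", "creep", "building", "ward"
    team_id: int
    health: int


class WorldData:
    """Tracks units by handle and by category, refreshed on every dump."""

    def __init__(self, team_id):
        self.team_id = team_id
        self.units = {}                  # handle -> Unit
        self.by_type = defaultdict(set)  # unit_type -> set of handles

    def update(self, dump_units):
        # Rebuild the indexes from the latest dump so stale units disappear.
        self.units = {u.handle: u for u in dump_units}
        self.by_type = defaultdict(set)
        for u in dump_units:
            self.by_type[u.unit_type].add(u.handle)

    def get(self, unit_type, enemy=False):
        # Return units of one category, filtered by allied or enemy team.
        return [
            self.units[h]
            for h in sorted(self.by_type[unit_type])
            if (self.units[h].team_id != self.team_id) == enemy
        ]
```

Rebuilding the indexes on each dump is the simplest way to stay coherent; an incremental update keyed on handles would be the obvious next step if it turned out to be a bottleneck.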

TimZaman commented 5 years ago

[image]

Currently occupied with self-play-induced strategy collapse (both bots getting very hostile).

Nostrademous commented 5 years ago

X-axis is time or episodes?

And that’s fine, just wondering when we will sync up.

For rewards, I am thinking more and more that XP and gold are the only things really needed: gold from last hits, denies, death prevention, kills, and assists.
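A minimal sketch of an XP-and-gold-only reward, assuming two consecutive per-hero snapshots are available. The `PlayerSnapshot` fields, the weights, and the death penalty are all hypothetical placeholders, not values taken from dotaclient.

```python
from dataclasses import dataclass


@dataclass
class PlayerSnapshot:
    # Hypothetical per-tick summary of one hero; field names are illustrative.
    xp: int
    gold: int
    is_alive: bool


def xp_gold_reward(prev, curr, xp_weight=0.002, gold_weight=0.006,
                   death_penalty=1.0):
    """Reward the change in XP and gold between two consecutive snapshots."""
    reward = (xp_weight * (curr.xp - prev.xp)
              + gold_weight * (curr.gold - prev.gold))
    # Penalise only the tick on which the hero dies.
    if prev.is_alive and not curr.is_alive:
        reward -= death_penalty
    return reward
```

Spending gold in the shop would register as a negative delta here, so a real version would probably need an earned-gold or net-worth figure rather than the raw gold count.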

TimZaman commented 5 years ago

Should we make things like 'get_total_xp' and 'xp needed to reach level' part of the dotaservice? These seem super generic, whereas things like reward functions are really specific to each user's model.
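A rough sketch of what such generic helpers might look like, assuming a cumulative XP table is available. The table values and the function names `get_level` and `xp_needed_to_level` are hypothetical placeholders, not real Dota 2 numbers or existing dotaservice API.

```python
from bisect import bisect_right

# Hypothetical cumulative XP thresholds: index i holds the total XP needed
# to be level i + 1. These are placeholder numbers, not real Dota 2 values.
XP_TABLE = [0, 200, 500, 900, 1400]


def get_level(total_xp, table=XP_TABLE):
    """Return the level implied by a cumulative XP total."""
    return bisect_right(table, total_xp)


def xp_needed_to_level(total_xp, table=XP_TABLE):
    """Return the XP still missing for the next level, or 0 at max level."""
    level = get_level(total_xp, table)
    if level >= len(table):
        return 0
    return table[level] - total_xp
```

Passing the table as a parameter keeps the helpers model-agnostic, which fits the idea of hosting them in a shared module.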



Nostrademous commented 5 years ago

Are you "anti" creating a "DotaWorld" class or no?

I ask because I can re-package it as a PR into DotaClient. I think there are several strong advantages to code clarity and organization in going down that path, but I want to make sure it doesn't clash with your vision, as I have unsuccessfully tried to submit those PRs twice thus far.

The goal of the DotaWorld class would be to track the Dota2 environment and all entities in it, giving the agent a single point of reference for the world state. Additionally, it would be fairly easy to keep historical copies of parts of previous protobuf world states to do projections. (I already keep a current and a previous copy of the Player data, so you can approximate what changed between two world-state dumps; it would be trivial to extend this to all units if necessary.) In my tests it also executes faster for categorized unit access because of how I manage unit handles in lists.
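A minimal sketch of the current/previous copy idea, assuming player data arrives as a plain dict of numeric fields. The `PlayerHistory` class and the field names are hypothetical and are not taken from the pydota2 implementation.

```python
import copy


class PlayerHistory:
    """Keeps the current and previous copy of a player's data."""

    def __init__(self):
        self.prev = None
        self.curr = None

    def push(self, player_data):
        # Deep-copy so later mutation of the source dict cannot alter history.
        self.prev = self.curr
        self.curr = copy.deepcopy(player_data)

    def delta(self, field):
        # Change in a numeric field between the last two dumps, or 0 if
        # fewer than two dumps have been seen.
        if self.prev is None or self.curr is None:
            return 0
        return self.curr[field] - self.prev[field]
```

Extending this to all units would amount to keeping one such history per unit handle.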