LunarEngineer / MentalGymnastics

This is a school project with potential.
MIT License

Flattening of state space #8

Open LunarEngineer opened 3 years ago

LunarEngineer commented 3 years ago

Since we're settling on Stable Baselines 3 as our agent framework, we should formalize a decision: Stable Baselines recommends that the state space be flattened, symmetric, and normalized. My thoughts are below, and I would like some group input on what you all think.
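As a concrete illustration of the "symmetric and normalized" part of that recommendation, here's a minimal sketch (the function name and bounds are my own, not from our codebase) that rescales a bounded observation component into [-1, 1]:

```python
import numpy as np

def normalize_symmetric(x, low, high):
    # Linearly map values from [low, high] into the symmetric range [-1, 1].
    x = np.asarray(x, dtype=np.float32)
    return 2.0 * (x - low) / (high - low) - 1.0

# e.g. a location coordinate bounded by (0, 100):
print(normalize_symmetric([0.0, 50.0, 100.0], 0.0, 100.0))  # [-1.  0.  1.]
```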

hagopi1611 commented 3 years ago

One idea:

When we were doing our own custom agent, we conceptually made this easier by breaking the action up into its constituent components: function IDs had their own net, location had its own net, radius had its own net. I think this takes care of flattening and symmetrization/normalization. We can bound the locations and radii by something reasonable like (0, 100) or (0, 100 * sqrt(2)), and the function-ID component is just a finite-sized discrete one-hot vector, which should be no problem to handle.
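To make the decomposition concrete, here's a sketch of packing those three components into one flat, normalized vector. The function name, the discrete-set size, and the bounds are assumptions for illustration, not settled values:

```python
import numpy as np

LOC_HIGH = 100.0                 # assumed location bound
RADIUS_HIGH = 100.0 * np.sqrt(2)  # assumed radius bound (diagonal of the region)
N_FUNCTION_IDS = 5               # assumed size of the discrete function-ID set

def flatten_and_normalize(function_id, location, radius):
    # One-hot encode the discrete function ID.
    one_hot = np.zeros(N_FUNCTION_IDS, dtype=np.float32)
    one_hot[function_id] = 1.0
    # Scale the bounded continuous components into [0, 1].
    loc = np.asarray(location, dtype=np.float32) / LOC_HIGH
    rad = np.float32(radius) / RADIUS_HIGH
    # Concatenate everything into a single flat vector.
    return np.concatenate([one_hot, loc, [rad]])

vec = flatten_and_normalize(2, (50.0, 25.0), 70.7)
print(vec.shape)  # (8,) = 5 one-hot + 2 location + 1 radius
```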

So in other words, one solution is to have 3 A2C agents (or 2 A2C and 1 DQN if A2C doesn't like the discrete action space). The tradeoff here is that each new action component will depend on the current state (all actions dropped thus far) but will be calculated independently of the other components, unless we cascade the output of one into the input of another like we talked about.
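Just to pin down what the cascading variant would look like as data flow (the "nets" below are stand-in random linear maps and all dimensions are made up, purely to show the wiring):

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, N_FUNCS = 8, 5  # hypothetical dimensions

# Stand-ins for the three heads: random linear maps instead of trained nets.
W_func = rng.normal(size=(STATE_DIM, N_FUNCS))
W_loc = rng.normal(size=(STATE_DIM + N_FUNCS, 2))      # location head sees the chosen function
W_rad = rng.normal(size=(STATE_DIM + N_FUNCS + 2, 1))  # radius head sees function and location

state = rng.normal(size=STATE_DIM)

# Discrete head first (DQN-style argmax over function IDs)...
func_id = int(np.argmax(state @ W_func))
one_hot = np.eye(N_FUNCS)[func_id]
# ...then each continuous head is conditioned on the earlier outputs.
loc = np.concatenate([state, one_hot]) @ W_loc
rad = np.concatenate([state, one_hot, loc]) @ W_rad
print(func_id, loc.shape, rad.shape)
```

Without the cascade, the location and radius heads would take only `state` as input and the three outputs would be computed independently.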