LunarEngineer opened 3 years ago
One idea:
When we were building our own custom agent, we conceptually made this easier by breaking the action into its constituent components: the function ID had its own net, the location had its own net, and the radius had its own net. I think this takes care of flattening and symmetrization/normalization. We can bound the locations and radii by something reasonable like (0, 100) or (0, 100 * sqrt(2)), and the function-ID component is just a finite discrete one-hot vector, which should be no problem to handle.
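To make the split concrete, here's a minimal numpy sketch of encoding one action as the concatenation of a one-hot function ID plus the continuous components scaled into [0, 1] by the bounds above. The number of function IDs (`NUM_FUNCTION_IDS`) and the helper name are hypothetical, not something from our codebase:

```python
import numpy as np

LOC_MAX = 100.0                   # location bound discussed above
RADIUS_MAX = 100.0 * np.sqrt(2)   # diagonal of a 100 x 100 area
NUM_FUNCTION_IDS = 8              # assumption: finite set of function IDs

def encode_action(function_id, x, y, radius):
    """Flatten one action: one-hot function ID + normalized (x, y, radius)."""
    one_hot = np.zeros(NUM_FUNCTION_IDS, dtype=np.float32)
    one_hot[function_id] = 1.0
    continuous = np.array(
        [x / LOC_MAX, y / LOC_MAX, radius / RADIUS_MAX], dtype=np.float32
    )
    return np.concatenate([one_hot, continuous])
```

Each component stays bounded and normalized, so the flattened vector is friendly to whatever net consumes it.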
So in other words, one solution is to have three A2C agents (or two A2C and one DQN, if A2C doesn't handle the discrete action space well). The tradeoff here is that each action component will be conditioned on the current state (all actions dropped thus far) but computed independently of the other components, unless we cascade the output of one net into the input of the next, like we talked about.
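A rough sketch of what the cascading variant would look like. The three policy functions below are hypothetical stand-ins (in practice each would be its own SB3 model); the point is only the wiring, where each later component sees the earlier components appended to its observation:

```python
import numpy as np

rng = np.random.default_rng(0)

def function_id_policy(state):
    # Discrete head: pick one of 4 function IDs from the state alone (assumption).
    return int(np.argmax(state[:4]))

def location_policy(state, function_id):
    # Cascaded: the chosen function ID is appended to the observation.
    aug = np.concatenate([state, [function_id]])
    return np.clip(aug[:2] * 100.0, 0.0, 100.0)  # (x, y) bounded to [0, 100]

def radius_policy(state, function_id, location):
    # Cascaded again: sees the state, the function ID, and the location.
    aug = np.concatenate([state, [function_id], location])
    return float(np.clip(aug.mean() * 100.0 * np.sqrt(2), 0.0, 100.0 * np.sqrt(2)))

state = rng.random(8).astype(np.float32)
fid = function_id_policy(state)
loc = location_policy(state, fid)
rad = radius_policy(state, fid, loc)
```

Without the cascade we'd just call all three on `state` alone, which is simpler but loses the dependency between components.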
Since we're settling on Stable Baselines 3 as our agent framework, we should formalize a decision; Stable Baselines recommends that the action space be flattened, symmetric, and normalized. My thoughts are above, and I'd like some group input on what you all think.
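For the "symmetric and normalized" part, the usual trick is to let the policy act in [-1, 1] and rescale inside the environment. A minimal sketch (helper names are my own, not SB3 API):

```python
import numpy as np

def to_symmetric(value, low, high):
    """Rescale a value from [low, high] into the symmetric range [-1, 1]."""
    return 2.0 * (value - low) / (high - low) - 1.0

def from_symmetric(action, low, high):
    """Map a policy output in [-1, 1] back to the native [low, high] range."""
    return low + (np.clip(action, -1.0, 1.0) + 1.0) * (high - low) / 2.0
```

So the agent would always see a [-1, 1] box per continuous component, and the env's `step()` would apply `from_symmetric` with the (0, 100) or (0, 100 * sqrt(2)) bounds before executing the action.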