Define explicitly the Observation space

Context

Some history is now provided to the observation.space (i.e. the input of the network); it is queried with the default function mdp.last_action. This means the action.space is always embedded in the observation.space.
Problem
Changing the type of network also changes the type of action.space. For example:
- The expert policy without optimizer outputs action='discrete' with time_horizon=1, which results in action.space.ndim=32.
- The student policy with optimizer outputs action='spline' with time_horizon=15, which results in action.space.ndim=104.

This in turn changes the observation.space, because mdp.last_action is part of the observation.space.
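A minimal sketch of why this is a problem, assuming the observation is built by concatenating state features with mdp.last_action (the helper name build_observation and the state-feature size 48 are illustrative, not from the codebase):

```python
import numpy as np

# Hypothetical sketch: if the observation concatenates state features with
# mdp.last_action, the observation size depends on action.space.ndim.
def build_observation(state: np.ndarray, last_action: np.ndarray) -> np.ndarray:
    return np.concatenate([state, last_action])

state = np.zeros(48)  # assumed state-feature size, purely illustrative

obs_expert = build_observation(state, np.zeros(32))    # discrete, ndim=32
obs_student = build_observation(state, np.zeros(104))  # spline, ndim=104

# The two policies end up with incompatible observation sizes.
print(obs_expert.shape, obs_student.shape)  # (80,) (152,)
```

This incompatibility is what prevents, for instance, feeding the expert policy's observations to the student network.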
Solution
Instead of blindly putting the network output into the observation.space, one could put the actual output applied to the system into the observation space (possibly with a history size > 1). However, the actual output lives in some specific frame (in this case the world frame) that may not be relevant for the network. A Normalisation $N(x)$ and a Transformation $T(x)$ are applied to the network output before it reaches the space relevant for the low-level controller. The inverse transformation and normalisation should therefore be applied to the actual output before it is fed back into the network's observation.space.
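The round trip can be sketched as follows; the affine normalisation, the rotation $R$, and the offset $t$ below are illustrative assumptions, not the actual $N$ and $T$ of the codebase:

```python
import numpy as np

# Hypothetical sketch of the pipeline: network output x -> N(x) -> T(.)
# -> command applied to the system, and the inverse mapping back.
mean, std = np.array([0.0, 0.0]), np.array([2.0, 4.0])  # assumed affine N
R = np.array([[0.0, -1.0], [1.0, 0.0]])                 # assumed world-frame rotation
t = np.array([1.0, -1.0])                               # assumed offset

def N(x):      # normalisation applied to the raw network output
    return x * std + mean

def N_inv(y):  # inverse normalisation
    return (y - mean) / std

def T(y):      # transform into the world frame for the low-level controller
    return R @ y + t

def T_inv(z):  # inverse transform (R is orthogonal, so R^-1 = R.T)
    return R.T @ (z - t)

x = np.array([0.3, -0.7])          # raw network output
applied = T(N(x))                  # actual command applied to the system
recovered = N_inv(T_inv(applied))  # value fed back into observation.space

print(np.allclose(recovered, x))   # True: the round trip recovers the output
```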
[x] Define inverse Transformation $T^{-1}(x)$
[x] Define inverse Normalisation $N^{-1}(x)$
[x] Add $N^{-1}(T^{-1}(action_i))$ to the observation.space, with $action_i$ the last discrete action applied to the system.
[ ] (Optional) Implement a rolling buffer for a variable action history length > 1
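The optional rolling buffer could be sketched with a fixed-length deque; the names and the values HISTORY and ACTION_DIM below are illustrative:

```python
from collections import deque

import numpy as np

# Hypothetical sketch of a rolling action-history buffer, pre-filled with
# zeros so the observation size is fixed from the first step.
HISTORY = 3     # illustrative history length
ACTION_DIM = 4  # illustrative action dimension

buf = deque([np.zeros(ACTION_DIM)] * HISTORY, maxlen=HISTORY)

def push_action(a: np.ndarray) -> np.ndarray:
    """Append the latest (inverse-mapped) action and return the flat history."""
    buf.append(a)  # oldest entry is dropped automatically by maxlen
    return np.concatenate(buf)  # shape (HISTORY * ACTION_DIM,)

hist = push_action(np.ones(ACTION_DIM))
print(hist.shape)  # (12,)
```

The deque's maxlen takes care of evicting the oldest action, so the flattened history always has a constant size regardless of how many steps have elapsed.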