For many real-world situations, the task may have hidden state or partially observable features, making the Markovian assumption only semi-valid.
One way around this is frame stacking, which is already doable in Coach with `filters.observation.observation_stacking_filter`. An even better option may be an LSTM (or a bidirectional LSTM). Agents for this already exist, the well-cited DRQN being one of them.
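For context, this is roughly how I use frame stacking today (a sketch from memory; import paths and argument names may differ slightly between Coach versions):

```python
from rl_coach.filters.filter import InputFilter
from rl_coach.filters.observation.observation_stacking_filter import ObservationStackingFilter

# Stack the last 4 observations so the network sees a short history
# instead of a single (non-Markovian) frame.
input_filter = InputFilter()
input_filter.add_observation_filter('observation', 'stacking',
                                    ObservationStackingFilter(4))
```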
Coach currently has the `LSTMMiddleware` layer. However, from what I understand of the source code, it runs along the observation axis (for inputs such as text). TensorFlow of course has the `TimeDistributed` wrapper (together with `return_sequences=True` on the LSTM) to run an LSTM along the temporal axis across transitions.
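To illustrate the kind of structure I mean, here is a plain Keras sketch (not Coach code; the window length, image size, and layer sizes are arbitrary placeholders):

```python
import tensorflow as tf

# Input: a window of transitions, shape (batch, time_steps, height, width, channels)
frames = tf.keras.Input(shape=(8, 84, 84, 3))

# Apply the same conv encoder to every time step in the window
encoder = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 8, strides=4, activation='relu'),
    tf.keras.layers.Conv2D(64, 4, strides=2, activation='relu'),
    tf.keras.layers.Flatten(),
])
embedded = tf.keras.layers.TimeDistributed(encoder)(frames)  # (batch, time_steps, features)

# LSTM over the temporal axis; return_sequences=True keeps one output per transition
temporal = tf.keras.layers.LSTM(256, return_sequences=True)(embedded)

# e.g. one action prediction per time step, as in behavioural cloning
actions = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(4))(temporal)

model = tf.keras.Model(frames, actions)
model.summary()
```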
Could a time-distributed LSTM be added as a middleware? (Or at the very least "hacked" in? It would be of immense benefit to my current research, where I am using a simple behavioural cloning agent.)