IntelLabs / coach

Reinforcement Learning Coach by Intel AI Lab enables easy experimentation with state of the art Reinforcement Learning algorithms
https://intellabs.github.io/coach/
Apache License 2.0
2.33k stars 461 forks source link

TimeDistributed LSTM Middleware #461

Open OGordon100 opened 4 years ago

OGordon100 commented 4 years ago

For many real-world situations, the task may have hidden state or partially observable features, making the Markovian assumption only semi-valid.

One way around this is to use frame stacking - doable already in Coach with filters.observation.observation_stacking_filter. It may be even better to use LSTM (and bi-directional) LSTM. Agents for this already exist, with the very well cited DRQN being one of them.

In Coach currently, there is the LSTMMiddleware layer. However, from what I understand of the source code it runs along the observations axis (for inputs such as text). Tensorflow of course has the TimeDistributed wrapper (with return_sequences=True) to run LSTM along the temporal axis between transitions.

Could timedistributed LSTM be added as a middleware? (or at the very least "hacked" in, as it would be of immense benefit to my current research, which I am using with a simple behavioural cloning agent)