Denys88 / rl_games

RL implementations
MIT License

2023 10 16 rnn concat output #260

Closed tylerlum closed 10 months ago

tylerlum commented 11 months ago

Architecture is typically either:

  1. CNN => MLP => LSTM => Value/Policy
  2. CNN => LSTM => MLP => Value/Policy

In most isaacgymenvs, CNN = Identity(). There are many cases where an LSTM might be helpful but could hurt training stability. Thus, we add a new `concat_output: True` option that creates a skip connection around the LSTM, so the LSTM can be bypassed when learning behavior without temporal dependencies.
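The idea can be sketched in a few lines of PyTorch. This is a hypothetical minimal module, not the actual rl_games implementation: `concat_output` simply concatenates the LSTM's input onto its output along the feature dimension, so downstream layers can learn to use either path.

```python
import torch
import torch.nn as nn

class LSTMWithSkip(nn.Module):
    """Hypothetical sketch of an LSTM with a concat skip connection."""

    def __init__(self, input_size, units, concat_output=True):
        super().__init__()
        self.concat_output = concat_output
        self.lstm = nn.LSTM(input_size, units, num_layers=1, batch_first=True)

    def forward(self, x, hidden=None):
        out, hidden = self.lstm(x, hidden)
        if self.concat_output:
            # Skip connection: downstream layers see both the LSTM output
            # and its raw input, so the LSTM can effectively be bypassed.
            out = torch.cat([out, x], dim=-1)
        return out, hidden

# Shape check: batch=4, seq=8, input_size=32, units=256
m = LSTMWithSkip(32, 256)
y, _ = m(torch.zeros(4, 8, 32))
print(tuple(y.shape))  # (4, 8, 288): 256 LSTM units + 32 skipped inputs
```

Note that the skip connection widens the feature dimension seen by the following layer (here 256 + 32 = 288), so the value/policy head (or MLP, if `before_mlp: True`) must be sized accordingly.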

MLP w/ skip connection => LSTM => Value/Policy (already part of RL_games)

    rnn:
      name: lstm
      units: 256
      layers: 1
      before_mlp: False
      concat_input: True
      layer_norm: True

MLP w/ skip connection => LSTM w/ skip connection => Value/Policy (new)

    rnn:
      name: lstm
      units: 256
      layers: 1
      before_mlp: False
      concat_input: True
      layer_norm: True
      concat_output: True

LSTM w/ skip connection => MLP => Value/Policy (new)

    rnn:
      name: lstm
      units: 256
      layers: 1
      before_mlp: True
      layer_norm: True
      concat_output: True
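For concreteness, here is the feature-width arithmetic for the three variants above, assuming (as the description suggests) that `concat_input` appends the raw observation to the MLP output before the LSTM, and `concat_output` appends the LSTM's input to its output. The sizes (`obs_size=32`, `mlp_out=128`, `units=256`) are arbitrary example values:

```python
obs_size, mlp_out, units = 32, 128, 256

# Variant 1: MLP w/ skip => LSTM => head (concat_input only)
lstm_in_1 = mlp_out + obs_size   # LSTM input: MLP output + raw obs
head_in_1 = units                # head sees only the LSTM output

# Variant 2: adds concat_output, so the head also sees the LSTM's input
head_in_2 = units + lstm_in_1

# Variant 3: LSTM w/ skip => MLP => head (before_mlp + concat_output)
mlp_in_3 = units + obs_size      # MLP sees LSTM output + raw obs

print(lstm_in_1, head_in_1, head_in_2, mlp_in_3)  # 160 256 416 288
```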