Denys88 / rl_games

RL implementations
MIT License

2023 10 16 rnn concat output #260

Closed tylerlum closed 10 months ago

tylerlum commented 11 months ago

Architecture is typically either:

  1. CNN => MLP => LSTM => Value/Policy
  2. CNN => LSTM => MLP => Value/Policy

In most isaacgymenvs, CNN = Identity(). There are many cases where an LSTM might be helpful but could hurt training stability. Thus, we add a new `concat_output: True` option that creates a skip connection around the LSTM, so the LSTM can be bypassed when learning behavior without temporal dependencies.
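The idea can be sketched in a few lines of PyTorch. This is a hypothetical minimal module, not the actual rl_games implementation: `concat_output` simply concatenates the LSTM's input onto its output along the feature dimension, so downstream layers can learn to use either path.

```python
import torch
import torch.nn as nn

class LSTMWithSkip(nn.Module):
    """Hypothetical sketch of an LSTM with a concat skip connection."""

    def __init__(self, input_size, units, concat_output=True):
        super().__init__()
        self.concat_output = concat_output
        self.lstm = nn.LSTM(input_size, units, num_layers=1, batch_first=True)

    def forward(self, x, hidden=None):
        out, hidden = self.lstm(x, hidden)
        if self.concat_output:
            # Skip connection: downstream layers see both the LSTM output
            # and its raw input, so the LSTM can effectively be bypassed.
            out = torch.cat([out, x], dim=-1)
        return out, hidden

# Shape check: batch=4, seq=8, input_size=32, units=256
m = LSTMWithSkip(32, 256)
y, _ = m(torch.zeros(4, 8, 32))
print(tuple(y.shape))  # (4, 8, 288): 256 LSTM units + 32 skipped inputs
```

Note that the skip connection widens the feature dimension seen by the following layer (here 256 + 32 = 288), so the value/policy head (or MLP, if `before_mlp: True`) must be sized accordingly.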

MLP w/ skip connection => LSTM => Value/Policy (already part of RL_games)

    rnn:
      name: lstm
      units: 256
      layers: 1
      before_mlp: False
      concat_input: True
      layer_norm: True

MLP w/ skip connection => LSTM w/ skip connection => Value/Policy (new)

    rnn:
      name: lstm
      units: 256
      layers: 1
      before_mlp: False
      concat_input: True
      layer_norm: True
      concat_output: True

LSTM w/ skip connection => MLP => Value/Policy (new)

    rnn:
      name: lstm
      units: 256
      layers: 1
      before_mlp: True
      layer_norm: True
      concat_output: True
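For concreteness, here is the feature-width arithmetic for the three variants above, assuming (as the description suggests) that `concat_input` appends the raw observation to the MLP output before the LSTM, and `concat_output` appends the LSTM's input to its output. The sizes (`obs_size=32`, `mlp_out=128`, `units=256`) are arbitrary example values:

```python
obs_size, mlp_out, units = 32, 128, 256

# Variant 1: MLP w/ skip => LSTM => head (concat_input only)
lstm_in_1 = mlp_out + obs_size   # LSTM input: MLP output + raw obs
head_in_1 = units                # head sees only the LSTM output

# Variant 2: adds concat_output, so the head also sees the LSTM's input
head_in_2 = units + lstm_in_1

# Variant 3: LSTM w/ skip => MLP => head (before_mlp + concat_output)
mlp_in_3 = units + obs_size      # MLP sees LSTM output + raw obs

print(lstm_in_1, head_in_1, head_in_2, mlp_in_3)  # 160 256 416 288
```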