namjiwon1023 opened 10 months ago
Are you aware of any reference implementations? There are a couple of ways it can be done. The problem is that in PPO I reuse the hidden state from the previous step, but in SAC the replay buffer can contain very old sequences, so you cannot reuse the stored hidden state.
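To illustrate the staleness issue: one common workaround (as in R2D2-style recurrent replay) is to store short sequences in the buffer and recompute the hidden states from scratch at training time, rather than replaying hidden states saved at collection time. This is a minimal sketch assuming PyTorch, a GRU, and made-up dimensions; it is not the repo's actual implementation.

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration only.
OBS_DIM, HIDDEN, BATCH, SEQ_LEN = 4, 8, 32, 10

gru = nn.GRU(input_size=OBS_DIM, hidden_size=HIDDEN, batch_first=True)

# Pretend this is a batch of observation sequences sampled from the replay
# buffer. Hidden states saved when these were collected are stale, because
# the network weights have changed since then.
obs_seq = torch.randn(BATCH, SEQ_LEN, OBS_DIM)

# Recompute hidden states fresh from a zero initial state instead of
# reusing the stored ones.
h0 = torch.zeros(1, BATCH, HIDDEN)
features, hT = gru(obs_seq, h0)  # features: (BATCH, SEQ_LEN, HIDDEN)
```

The cost is re-running the RNN over each sampled sequence, but it keeps the hidden states consistent with the current network parameters.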
I have looked at several people's work on adding RNNs to reinforcement learning algorithms, but strangely, almost everyone's implementation is different. So I would like to ask how you integrate an LSTM or GRU into the SAC algorithm.
In my implementation, I have incorporated an LSTM into both the actor and critic networks. The image below shows the LSTM added to the actor network.
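Since the image is not reproduced here, a rough sketch of what an LSTM actor for SAC typically looks like may help. This is a hypothetical minimal version (made-up class name and dimensions), assuming PyTorch and the usual SAC Gaussian-policy head with clamped log-std; it is not the poster's actual code.

```python
import torch
import torch.nn as nn

class RecurrentActor(nn.Module):
    """Hypothetical LSTM actor for SAC: LSTM features -> Gaussian policy head."""

    def __init__(self, obs_dim=4, act_dim=2, hidden=8):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.mu = nn.Linear(hidden, act_dim)
        self.log_std = nn.Linear(hidden, act_dim)

    def forward(self, obs_seq, hidden_state=None):
        # obs_seq: (batch, seq_len, obs_dim); hidden_state is the (h, c) pair.
        feat, hidden_state = self.lstm(obs_seq, hidden_state)
        mu = self.mu(feat)
        # Clamp log-std to the bounds commonly used in SAC implementations.
        log_std = self.log_std(feat).clamp(-20, 2)
        return mu, log_std, hidden_state

actor = RecurrentActor()
obs = torch.randn(1, 1, 4)        # one timestep, batch of one
mu, log_std, h = actor(obs)       # h carries memory into the next step
```

At action-selection time the returned `(h, c)` pair is fed back in on the next step, which is where the hidden-state initialization questions below come in.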
During training, I initialize the hidden-state input of the LSTM.
I also re-initialize the hidden-state input when the environment is reset.
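In code, the reset described above usually amounts to zeroing the `(h, c)` pair at every episode boundary so no memory leaks across episodes. A minimal sketch, assuming PyTorch, a single-layer LSTM, and a made-up hidden size:

```python
import torch

HIDDEN, NUM_LAYERS = 8, 1  # hypothetical sizes for illustration

def init_hidden(batch_size=1):
    """Zero (h, c) pair for an LSTM, shaped (num_layers, batch, hidden)."""
    return (torch.zeros(NUM_LAYERS, batch_size, HIDDEN),
            torch.zeros(NUM_LAYERS, batch_size, HIDDEN))

# At env.reset(): start each episode from blank memory.
h = init_hidden()

# During the episode, the hidden state returned by the LSTM at step t
# would be passed back in at step t + 1 instead of re-zeroing it.
```

Whether the training-time hidden states should also start from zero, or be burned in over a stored sequence prefix, is exactly the design point that varies across the implementations the poster mentions.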
I would like to ask whether my approach is correct. How did you incorporate an RNN into SAC when you did it?
Thank you, I look forward to your reply.