Chulabhaya / recurrent-discrete-sac

Recurrent discrete Soft Actor-Critic implementation for solving POMDPs
MIT License
2 stars, 0 forks

Which paper does the implementation of the RSAC algorithm come from? #2

Open gorkr opened 7 months ago

gorkr commented 7 months ago

Hello, dear author. Your code has helped me a lot, and I am very grateful. I noticed comments in the code such as "# calculate eq. 7 in updated SAC paper", and I would like to know which paper you referred to when implementing the code. I'm also confused about the use of "mask" in the code.

Chulabhaya commented 7 months ago

Hey there @gorkr! Happy to hear you've found my code helpful. To answer your questions, the paper the comments refer to is the following: https://arxiv.org/abs/1812.05905

The mask is used in the recurrent versions of the algorithm to handle batches whose sequences vary in length. All the sequences are padded with zeros to the same length, so the mask is needed to tell which timesteps are padding and should be ignored when doing backpropagation.
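
The padding-and-mask idea described above can be sketched roughly as follows. This is a minimal illustration in NumPy, not the repository's actual code: the helper names `pad_sequences` and `masked_mean`, the squared-error loss, and the toy data are all assumptions made for the example. The same pattern should carry over to a tensor library, where zeroing out the padded timesteps in the loss also keeps them from contributing gradients.

```python
import numpy as np


def pad_sequences(seqs):
    """Zero-pad 1-D sequences to a common length.

    Returns (padded, mask), where mask is 1.0 at real timesteps
    and 0.0 at padded ones. (Hypothetical helper for illustration.)
    """
    max_len = max(len(s) for s in seqs)
    padded = np.zeros((len(seqs), max_len), dtype=np.float64)
    mask = np.zeros((len(seqs), max_len), dtype=np.float64)
    for i, s in enumerate(seqs):
        padded[i, : len(s)] = s
        mask[i, : len(s)] = 1.0
    return padded, mask


def masked_mean(per_step_loss, mask):
    """Average the loss over real timesteps only.

    Padded timesteps are multiplied by 0, so they add nothing
    to the numerator and are not counted in the denominator.
    """
    return (per_step_loss * mask).sum() / mask.sum()


# Toy batch: two sequences of different lengths (3 and 1).
targets, mask = pad_sequences([[1.0, 2.0, 3.0], [4.0]])

# Pretend the model predicts 1.0 at every timestep (an assumption).
predictions = np.ones_like(targets)
per_step_loss = (predictions - targets) ** 2

naive_loss = per_step_loss.mean()                 # padding pollutes the average
masked_loss = masked_mean(per_step_loss, mask)    # padding is ignored

print("mask:\n", mask)
print("naive loss: ", naive_loss)
print("masked loss:", masked_loss)
```

In this toy batch the naive mean averages over all six slots, including two padded ones whose "targets" are just zeros, while the masked mean averages over the four real timesteps only.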