what does reward_hidden_c mean in mcts.py?

YeWR / EfficientZero

Open-source codebase for EfficientZero, from "Mastering Atari Games with Limited Data" at NeurIPS 2021.

GNU General Public License v3.0


what does reward_hidden_c mean in mcts.py? #11

Closed: sekv closed this issue 2 years ago

sekv commented 2 years ago

Hi, in mcts.py, lines 35-36, what do reward_hidden_c and reward_hidden_h mean (what are c and h short for)? And why are reward_hidden_c_pool = [reward_hidden_roots[0]] and reward_hidden_h_pool = [reward_hidden_roots[1]]? I find the code difficult to understand; could you add some comments? Many thanks!

YeWR commented 2 years ago

We use nn.LSTM to predict the value prefix. Both the input and the output of nn.LSTM include the final hidden state and the final cell state; the h and c in the variable names stand for "hidden" and "cell".
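For readers unfamiliar with this part of the nn.LSTM interface, here is a minimal sketch of how the two states travel together. The sizes and variable names are made up for illustration and are not taken from the EfficientZero code. PyTorch documents the state tuple in the order (h_n, c_n), i.e. hidden state first, cell state second.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only for this illustration.
batch_size, input_size, hidden_size = 4, 8, 16

# Single-layer, unidirectional LSTM. batch_first defaults to False,
# so inputs have shape (seq_len, batch, input_size).
lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size)

# One time step for the whole batch.
x = torch.randn(1, batch_size, input_size)

# The initial state is a tuple (h_0, c_0); each tensor has shape
# (num_layers, batch, hidden_size).
h_0 = torch.zeros(1, batch_size, hidden_size)
c_0 = torch.zeros(1, batch_size, hidden_size)

# The call returns the per-step outputs plus the final state tuple.
output, (h_n, c_n) = lstm(x, (h_0, c_0))

# To continue the recurrence, the returned tuple is passed back in.
x_next = torch.randn(1, batch_size, input_size)
output_next, (h_next, c_next) = lstm(x_next, (h_n, c_n))
```

Because the state is a pair of tensors rather than a single tensor, any code that stores it per search node has to keep track of both parts.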

For detailed usage, please refer to https://pytorch.org/docs/1.9.1/generated/torch.nn.LSTM.html. Note that the two states come as a tuple. The xxx_pool variables in mcts.py are simple structured stores for the corresponding data. To make indexing easier, the tuple is split into two arrays, namely c_pool and h_pool. Hope this helps :)
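As a rough illustration of the splitting described above, the sketch below keeps a state tuple in two parallel pools and looks entries up by (step, environment) index. It is not the code from mcts.py: the helper names `lookup` and `append_step` are hypothetical, plain Python lists stand in for tensors, and the pool names only mirror the assignment quoted in the question (element 0 to c_pool, element 1 to h_pool) without making any claim about which element is the cell state.

```python
# Stand-in for the LSTM state of a batch of 3 root nodes: a tuple of two
# per-environment lists. Plain floats replace real tensors here.
root_state = (
    [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1]],  # tuple element 0, one row per environment
    [[0.5, 0.6], [1.5, 1.6], [2.5, 2.6]],  # tuple element 1, one row per environment
)

# Split the tuple into two parallel pools. Entry 0 of each pool holds the roots.
c_pool = [root_state[0]]
h_pool = [root_state[1]]


def append_step(new_state):
    """Store a new batch of states and return its step index in the pools."""
    c_pool.append(new_state[0])
    h_pool.append(new_state[1])
    return len(c_pool) - 1


def lookup(step, env):
    """Rebuild the state tuple of one node from the two pools."""
    return (c_pool[step][env], h_pool[step][env])


# A later batch of states, e.g. produced after expanding one node per environment.
new_state = (
    [[3.0, 3.1], [4.0, 4.1], [5.0, 5.1]],
    [[3.5, 3.6], [4.5, 4.6], [5.5, 5.6]],
)
step = append_step(new_state)

root_of_env_1 = lookup(0, 1)
child_of_env_2 = lookup(step, 2)
```

With two flat pools, a node only needs to remember two integers (the step and the environment index) to recover both halves of its state.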

sekv commented 2 years ago

Got it! Thank you for your detailed reply:)