Closed · chengcheng8632 closed this issue 4 years ago
Hi @chengcheng8632, this may indeed not be entirely clear from Appendix A.1 in the paper, but it is mentioned in the "Model" paragraph of Section 4: "The extracted features and the reward are fed into a single-layer LSTM"
Edit: regarding the purpose, it is an extra signal that a recurrent model may take advantage of. I am not sure whether experiments have been run to see if this improves performance in this particular application.
Dear author: I'm sorry to disturb you again. In the paper, the input is a 21-dimensional state space, and the reward value is not used as an input. But in polybeast.py in the train folder, `core_input = torch.cat([x, clipped_reward], dim=-1)` suggests that the reward is used as input. May I ask, is that right? What is the purpose of this? Looking forward to your answer. Thank you very much!
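To make the quoted line concrete, here is a minimal sketch of what that concatenation does before the features reach the single-layer LSTM. The shapes (`T` time steps, `B` batch size, and the 21-dimensional feature vector from the question) are illustrative assumptions, not the actual model configuration:

```python
import torch

# Hypothetical shapes for illustration: T time steps, B batch entries,
# F feature dimensions (the 21-dimensional state from the question).
T, B, F = 5, 2, 21

x = torch.randn(T, B, F)             # extracted features from the encoder
reward = torch.randn(T, B)           # one scalar reward per time step

# Clip the reward and give it a trailing feature dimension: (T, B) -> (T, B, 1).
clipped_reward = torch.clamp(reward, -1, 1).unsqueeze(-1)

# The line quoted from polybeast.py: append the clipped reward as one
# extra feature channel alongside the extracted features.
core_input = torch.cat([x, clipped_reward], dim=-1)   # shape (T, B, F + 1)

# The LSTM's input size must therefore be F + 1, not F.
lstm = torch.nn.LSTM(input_size=F + 1, hidden_size=32)
output, _ = lstm(core_input)
print(tuple(core_input.shape), tuple(output.shape))
```

This shows why the recurrent core sees a 22-dimensional input even though the state itself is 21-dimensional: the previous reward rides along as one additional scalar feature per step.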