Open Joshuaalbert opened 6 years ago
Yep, I thought along similar lines, but it turned out that the code seems to work correctly.
I had another thought: isn't the target network unnecessary for this notebook in the first place? You set the target network equal to the mainDQN right before training, so the bootstrap targets are never actually held fixed, which defeats the purpose of having a second network.
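For what it's worth, here's a minimal sketch of the pattern I'd expect instead, where the target copy is synced only periodically (names like `mainDQN`, `train_on_batch`, `total_steps`, and the 10,000-step period are hypothetical placeholders, not the notebook's code):

```python
import copy

# Hypothetical training loop: the target network only stabilizes the
# bootstrap target y = r + gamma * max_a' Q_target(s', a') if its weights
# are held fixed for many steps, not refreshed right before each update.
targetDQN = copy.deepcopy(mainDQN)          # one-time initial copy
for step in range(total_steps):
    train_on_batch(mainDQN, targetDQN)      # gradients flow into mainDQN only
    if step % 10_000 == 0:                  # sync periodically, not every step
        targetDQN.load_state_dict(mainDQN.state_dict())
```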
In the notebook I don't see where your recurrent Q-value model gets its trace dimension: you're just reshaping the output of a convnet and feeding it directly into the LSTM. Furthermore, shouldn't you also provide the non-zero initial state determined at play time? That is, the LSTM's internal state should be stored in the experience buffer and reused during training. Correct me if I'm wrong, please.
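To illustrate what I mean, here's a minimal sketch (in PyTorch for brevity; the notebook is TensorFlow, and all names and sizes here are hypothetical) of restoring an explicit trace dimension after the convnet and seeding the LSTM with the state saved at play time:

```python
import torch
import torch.nn as nn

class DRQN(nn.Module):
    """Hypothetical sketch: conv features are reshaped to
    [batch, trace_len, features] before the LSTM, and the recurrent
    state stored in the replay buffer is passed back in at train time."""
    def __init__(self, n_actions, feat_dim=3136, hidden=512):
        super().__init__()
        # Nature-DQN conv stack on single 84x84 frames -> 64*7*7 = 3136 features
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=1), nn.ReLU(),
        )
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, obs, state=None):
        # obs: [batch, trace_len, 1, 84, 84]; state: (h0, c0) from the buffer
        b, t = obs.shape[:2]
        feats = self.conv(obs.view(b * t, *obs.shape[2:]))  # conv each frame
        feats = feats.view(b, t, -1)                        # restore trace dim
        out, new_state = self.lstm(feats, state)            # seed with stored state
        return self.head(out), new_state                    # per-step Q-values
```

At play time you'd carry `new_state` forward step by step and save it alongside each transition; at train time you'd pass the saved state in as `state` instead of letting it default to zeros.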