FELICES-David / DQN_Cartpole


Feedback 12/6 #1

Open · takuseno opened this issue 4 years ago

takuseno commented 4 years ago

Use the relu function instead. https://github.com/FELICES-David/DQN_Cartpole/blob/5c318fdbf002bc23a199b0782539a45cb3d49c6c/cartpole-DQN_v2.py#L62
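As a minimal sketch of what that could look like (layer sizes and names here are assumptions, not taken from the repo):

```python
import torch.nn as nn
import torch.nn.functional as F

class QNetwork(nn.Module):
    def __init__(self, obs_dim=4, n_actions=2):
        super().__init__()
        self.fc1 = nn.Linear(obs_dim, 64)
        self.fc2 = nn.Linear(64, 64)
        self.out = nn.Linear(64, n_actions)

    def forward(self, x):
        # relu as the hidden activation instead of clamping
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.out(x)
```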

You don't need to cast to float64; it only adds extra computational cost. In most deep learning cases, float32 is enough. https://github.com/FELICES-David/DQN_Cartpole/blob/5c318fdbf002bc23a199b0782539a45cb3d49c6c/cartpole-DQN_v2.py#L67
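A rough sketch of keeping everything in float32 (variable names are illustrative):

```python
import numpy as np
import torch

# a Gym CartPole observation; numpy defaults to float64
obs = np.array([0.01, -0.02, 0.03, 0.04])

# convert once to float32; no float64 casting anywhere in the pipeline
obs_tensor = torch.as_tensor(obs, dtype=torch.float32)
```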

You would be better off using a plain replay buffer. https://github.com/FELICES-David/DQN_Cartpole/blob/5c318fdbf002bc23a199b0782539a45cb3d49c6c/cartpole-DQN_v2.py#L108
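A plain (uniform-sampling) replay buffer can be as simple as a deque; this is a generic sketch, not the repo's API:

```python
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # uniform sampling, no prioritization
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```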

I think you should start with a constant epsilon value. Since CartPole is easy, you can fix epsilon to a value between 0.1 and 0.3. https://github.com/FELICES-David/DQN_Cartpole/blob/5c318fdbf002bc23a199b0782539a45cb3d49c6c/cartpole-DQN_v2.py#L155
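For example (the exact value is a suggestion, not taken from the code), the decay schedule could be replaced by a constant:

```python
import random

EPSILON = 0.2  # fixed exploration rate; anywhere in [0.1, 0.3] should work for CartPole

def explore():
    # True -> take a random action, False -> take the greedy action
    return random.random() < EPSILON
```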

Is this correct...? Why not argmax? https://github.com/FELICES-David/DQN_Cartpole/blob/5c318fdbf002bc23a199b0782539a45cb3d49c6c/cartpole-DQN_v2.py#L159
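The greedy action is usually the argmax of the Q-values; a small sketch, assuming `q_network` maps an observation tensor to per-action Q-values:

```python
import torch

def greedy_action(q_network, obs_tensor):
    with torch.no_grad():
        q_values = q_network(obs_tensor.unsqueeze(0))  # shape: (1, n_actions)
    return int(q_values.argmax(dim=1).item())
```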

Don't iterate over the batch one transition at a time; do a single batched update. https://github.com/FELICES-David/DQN_Cartpole/blob/5c318fdbf002bc23a199b0782539a45cb3d49c6c/cartpole-DQN_v2.py#L188
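A batched DQN update might look like this sketch (`q_network`, `target_network`, and `optimizer` are assumed names, not the repo's):

```python
import torch
import torch.nn.functional as F

def batched_update(q_network, target_network, optimizer, batch, gamma=0.99):
    # each element is a tensor with a leading batch dimension
    states, actions, rewards, next_states, dones = batch

    # Q(s, a) for the taken actions, computed for the whole batch at once
    q_values = q_network(states).gather(1, actions.long().unsqueeze(1)).squeeze(1)

    with torch.no_grad():
        # r + gamma * max_a' Q_target(s', a'), zeroed at terminal states
        next_q = target_network(next_states).max(dim=1).values
        targets = rewards + gamma * (1.0 - dones) * next_q

    loss = F.mse_loss(q_values, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```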

takuseno commented 4 years ago

https://discuss.pytorch.org/t/how-to-clamp-tensor-to-some-range-without-doing-an-inplace-operation/18261

It seems that clamp is not a differentiable function. Use relu instead.
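A tiny check of how the gradient behaves through clamp (the values are arbitrary):

```python
import torch

x = torch.tensor([-2.0, 0.5, 3.0], requires_grad=True)
torch.clamp(x, 0.0, 1.0).sum().backward()
print(x.grad)  # tensor([0., 1., 0.]) -- zero gradient wherever the value was clamped
```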