takuseno opened this issue 4 years ago (status: Open)
Use the relu function instead. https://github.com/FELICES-David/DQN_Cartpole/blob/5c318fdbf002bc23a199b0782539a45cb3d49c6c/cartpole-DQN_v2.py#L62
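A minimal sketch of a small Q-network that uses ReLU as the hidden activation. It assumes a PyTorch MLP with CartPole's 4-dimensional observation and 2 actions; the class name QNetwork and the hidden size 64 are hypothetical, not taken from cartpole-DQN_v2.py.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class QNetwork(nn.Module):
    """Hypothetical MLP Q-network: observation -> one Q-value per action."""

    def __init__(self, obs_dim: int = 4, n_actions: int = 2, hidden: int = 64):
        super().__init__()
        self.fc1 = nn.Linear(obs_dim, hidden)
        self.fc2 = nn.Linear(hidden, hidden)
        self.out = nn.Linear(hidden, n_actions)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # ReLU as the hidden activation; the output layer stays linear
        # because Q-values can be negative.
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.out(x)


net = QNetwork()
q = net(torch.zeros(1, 4))
print(q.shape)  # torch.Size([1, 2])
```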
You don't need to cast to float64; it only adds extra computational cost. In most deep learning cases, float32 is enough. https://github.com/FELICES-David/DQN_Cartpole/blob/5c318fdbf002bc23a199b0782539a45cb3d49c6c/cartpole-DQN_v2.py#L67
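A short illustration, assuming the observation arrives as a NumPy array (some Gym versions return float64 observations): cast it once to float32, which is also PyTorch's default parameter dtype, instead of upcasting everything to float64. The observation values are made up.

```python
import numpy as np

# Made-up CartPole-like observation: cart position, cart velocity,
# pole angle, pole angular velocity.
obs64 = np.array([0.01, -0.02, 0.03, 0.04], dtype=np.float64)

# Cast once to float32 instead of doing the whole pipeline in float64.
obs32 = obs64.astype(np.float32)

print(obs64.nbytes, obs32.nbytes)  # 32 16
```

With PyTorch, torch.as_tensor(obs, dtype=torch.float32) does the same conversion when building the network input.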
You had better use a plain replay buffer. https://github.com/FELICES-David/DQN_Cartpole/blob/5c318fdbf002bc23a199b0782539a45cb3d49c6c/cartpole-DQN_v2.py#L108
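A minimal sketch of a plain replay buffer with uniform sampling, assuming transitions are stored as (state, action, reward, next_state, done) tuples; the class name ReplayBuffer and its methods are hypothetical.

```python
import random
from collections import deque


class ReplayBuffer:
    """Hypothetical fixed-capacity buffer with uniform random sampling."""

    def __init__(self, capacity: int) -> None:
        # A deque with maxlen drops the oldest transition once full.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done) -> None:
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int):
        # Every stored transition has the same chance of being picked.
        batch = random.sample(self.buffer, batch_size)
        # Transpose into (states, actions, rewards, next_states, dones).
        return tuple(zip(*batch))

    def __len__(self) -> int:
        return len(self.buffer)


buf = ReplayBuffer(capacity=3)
for t in range(5):
    buf.push([t], 0, 1.0, [t + 1], False)
print(len(buf))  # 3
```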
I think you had better start with a constant epsilon. Since CartPole is an easy task, you can fix epsilon at a value between 0.1 and 0.3. https://github.com/FELICES-David/DQN_Cartpole/blob/5c318fdbf002bc23a199b0782539a45cb3d49c6c/cartpole-DQN_v2.py#L155
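A sketch of epsilon-greedy action selection with a fixed epsilon, assuming the Q-values for the current state are already available as a plain list. The helper name epsilon_greedy and EPSILON = 0.1 (one value from the suggested 0.1 to 0.3 range) are illustrative.

```python
import random

EPSILON = 0.1  # fixed for the whole run, no decay schedule


def epsilon_greedy(q_values, epsilon=EPSILON, rng=random):
    """Hypothetical helper: random action with prob. epsilon, else greedy."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Greedy branch: index of the largest Q-value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


print(epsilon_greedy([0.2, 0.7], epsilon=0.0))  # 1
```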
Is this correct...? Why not use argmax? https://github.com/FELICES-David/DQN_Cartpole/blob/5c318fdbf002bc23a199b0782539a45cb3d49c6c/cartpole-DQN_v2.py#L159
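If that line is meant to pick the greedy action, the usual choice is the index of the largest Q-value. A minimal PyTorch sketch, assuming the network returns a (batch, n_actions) tensor; the Q-values below are made up.

```python
import torch

# Made-up Q-values for a batch of 3 states and 2 actions.
q_values = torch.tensor([[0.1, 0.5],
                         [0.9, 0.2],
                         [0.4, 0.3]])

# Greedy action per state: index of the max Q-value along the action axis.
greedy_actions = q_values.argmax(dim=1)
print(greedy_actions.tolist())  # [1, 0, 0]

# For a single state, .item() turns the result into a plain Python int.
single_action = q_values[0].argmax().item()
```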
Don't iterate over the batch one sample at a time. Do a batched update instead. https://github.com/FELICES-David/DQN_Cartpole/blob/5c318fdbf002bc23a199b0782539a45cb3d49c6c/cartpole-DQN_v2.py#L188
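A minimal sketch of one batched DQN update in PyTorch, where the whole minibatch goes through the network in a single forward pass instead of a Python loop over samples. It assumes tensors already stacked from the replay buffer, a separate target network, gamma = 0.99, and an MSE loss; the helper dqn_batch_update and all values are illustrative, not taken from cartpole-DQN_v2.py.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def dqn_batch_update(q_net, target_net, optimizer, batch, gamma=0.99):
    """Hypothetical helper: one gradient step on a whole minibatch."""
    states, actions, rewards, next_states, dones = batch
    # Q(s, a) for the actions actually taken, shape (B,).
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # max over a' of Q_target(s', a'), for every sample at once.
        next_q = target_net(next_states).max(dim=1).values
        # Terminal transitions (done = 1.0) do not bootstrap.
        targets = rewards + gamma * (1.0 - dones) * next_q
    loss = F.mse_loss(q_sa, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


# Usage with random data: a batch of 32 CartPole-sized transitions.
torch.manual_seed(0)
B = 32
q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
batch = (
    torch.randn(B, 4),          # states, float32
    torch.randint(0, 2, (B,)),  # actions, int64 (gather needs int64 indices)
    torch.ones(B),              # rewards
    torch.randn(B, 4),          # next states
    torch.zeros(B),             # done flags as 0.0 / 1.0
)
loss = dqn_batch_update(q_net, target_net, optimizer, batch)
print(loss)
```

MSE is one common choice here; F.smooth_l1_loss (Huber loss) is another.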
https://discuss.pytorch.org/t/how-to-clamp-tensor-to-some-range-without-doing-an-inplace-operation/18261
It seems that clamp may not be differentiable in the way it is used here (the thread above discusses clamping without an in-place operation). Use relu instead.