Open Guiliang opened 7 years ago
See Gradient Descent Sarsa(λ) in http://classes.engr.oregonstate.edu/mime/fall2008/me539/Lectures/ME539-w6-RL2_notes.pdf and try to implement it with TensorFlow.
The eligibility trace is similar to a weighted average of the last few states, with older states weighted less.
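To make the "weighted average of the last few states" idea concrete, here is a minimal sketch of one gradient-descent Sarsa(λ) update with an accumulating trace. It is not the TensorFlow version asked for above: it assumes a linear Q(s, a) = theta · phi(s, a) in plain Python, so the gradient is just the feature vector. The names `dot`, `sarsa_lambda_step`, `phi`, and the toy one-hot features are all made up for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def sarsa_lambda_step(theta, trace, phi, phi_next, reward,
                      alpha=1e-3, gamma=0.9, lam=0.5):
    """One gradient-descent Sarsa(lambda) update for a linear Q = theta . phi.

    Assumption: linear function approximation, so grad_theta Q(s, a) = phi(s, a).
    """
    # TD error: r + gamma * Q(s', a') - Q(s, a)
    delta = reward + gamma * dot(theta, phi_next) - dot(theta, phi)
    # Accumulating trace: decay the old trace, then add the current gradient
    trace = [gamma * lam * e + p for e, p in zip(trace, phi)]
    # Move the weights along the trace, scaled by the TD error
    theta = [t + alpha * delta * e for t, e in zip(theta, trace)]
    return theta, trace


# Toy episode: three one-hot state-action features visited in order,
# reward 1 on the last step, all-zero features after termination.
steps = [
    ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], 0.0),
    ([0.0, 1.0, 0.0], [0.0, 0.0, 1.0], 0.0),
    ([0.0, 0.0, 1.0], [0.0, 0.0, 0.0], 1.0),
]

theta = [0.0, 0.0, 0.0]
trace = [0.0, 0.0, 0.0]
for phi, phi_next, reward in steps:
    theta, trace = sarsa_lambda_step(theta, trace, phi, phi_next, reward)

print(trace)  # most recent feature has weight 1, older ones decay by gamma * lam
print(theta)  # the final reward is credited to all three recent features
```

After the last step the trace holds geometrically decayed weights for the three visited features, which is why a single reward updates all of them at once, just by different amounts.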
The code seems correct. If it does not converge, set alpha to 1e-3 instead of 1e-2. You can verify the code by setting lamda = 0 and running it. If the code is correct, the result should be the same as one-step TD, i.e. TD(0).
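The λ = 0 check can be tried on a toy problem first. The sketch below is only an illustration under assumptions: it uses a linear value function in plain Python instead of the neural network, and the helpers `td_lambda_run` and `td0_run` plus the three-step `episode` are hypothetical, not taken from the repository. With the trace decay set to 0 the trace collapses to the current gradient, so both runs should end with the same weights.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def td_lambda_run(episode, n_features, alpha, gamma, lam):
    """Semi-gradient TD(lambda) prediction with an accumulating trace (linear V)."""
    theta = [0.0] * n_features
    trace = [0.0] * n_features
    for phi, reward, phi_next in episode:
        delta = reward + gamma * dot(theta, phi_next) - dot(theta, phi)
        trace = [gamma * lam * e + p for e, p in zip(trace, phi)]
        theta = [t + alpha * delta * e for t, e in zip(theta, trace)]
    return theta


def td0_run(episode, n_features, alpha, gamma):
    """Plain one-step semi-gradient TD(0) prediction (linear V), no trace."""
    theta = [0.0] * n_features
    for phi, reward, phi_next in episode:
        delta = reward + gamma * dot(theta, phi_next) - dot(theta, phi)
        theta = [t + alpha * delta * p for t, p in zip(theta, phi)]
    return theta


# Hypothetical episode: (features of s, reward, features of s'),
# with all-zero features standing in for the terminal state.
episode = [
    ([1.0, 0.0], 0.5, [0.5, 0.5]),
    ([0.5, 0.5], 0.0, [0.0, 1.0]),
    ([0.0, 1.0], 1.0, [0.0, 0.0]),
]

w_lam0 = td_lambda_run(episode, 2, alpha=1e-3, gamma=0.9, lam=0.0)
w_td0 = td0_run(episode, 2, alpha=1e-3, gamma=0.9)
w_lam08 = td_lambda_run(episode, 2, alpha=1e-3, gamma=0.9, lam=0.8)

# lam = 0 should reproduce TD(0); a nonzero lam generally should not.
print(all(abs(a - b) < 1e-12 for a, b in zip(w_lam0, w_td0)))
print(any(abs(a - b) > 1e-12 for a, b in zip(w_lam08, w_td0)))
```

If the same comparison fails on the real network, the trace update is the first place to look, since that is the only part the two versions do not share.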
Read the TD(λ) code here: https://github.com/Guiliang/Sport-Analytic-NN/blob/master/td_prediction_eligibility_trace.py, and focus on how gradient descent is done in the neural network structure. I