Sorry, this should be a PR, but I don't have a clean forked version handy to work from and it's a really simple fix, so I'm filing it as an issue...
In the test for convergence of q_learning, I think this line: https://github.com/cmaron/CS-7641-assignments/blob/5a4c511332f0ee764cc768b6bc76a1422a98244c/assignment4/solvers/q_learning.py#L69
should be:

```python
self._last_delta = max(self._last_delta, abs(td_delta))
```
Without `abs()`, every negative delta looks like convergence. I have a case where the estimate approaches the true Q from above, so all the `td_delta` values are negative, and the solver thinks it has converged from iteration 0 even though the Q-value changes are in the tens or hundreds.
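To illustrate, here is a minimal sketch of the convergence check in isolation (the delta values, threshold name `theta`, and loop structure are made up for the example, not taken from the repository):

```python
# Hypothetical TD deltas when Q approaches the true value from above:
# all updates are negative but large in magnitude.
td_deltas = [-50.0, -120.0, -80.0]

# Buggy tracking: max over signed deltas never rises above 0,
# so the run looks converged immediately.
last_delta_buggy = 0.0
for td_delta in td_deltas:
    last_delta_buggy = max(last_delta_buggy, td_delta)

# Fixed tracking: max over |td_delta| records the true magnitude of change.
last_delta_fixed = 0.0
for td_delta in td_deltas:
    last_delta_fixed = max(last_delta_fixed, abs(td_delta))

theta = 1e-3  # illustrative convergence threshold
print(last_delta_buggy < theta)  # True  -> falsely reports convergence
print(last_delta_fixed < theta)  # False -> correctly keeps iterating
```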
Thanks for the great code to work with!