Open Jimenius opened 5 years ago
Use Average Q (V) instead of Average cumulative reward for learning evaluation
For learning evaluation, use the average Q-value (V) instead of the average cumulative reward.
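A minimal sketch of what such a metric could look like, assuming "Average Q (V)" means the mean of V(s) = max_a Q(s, a) over a fixed set of held-out states. That reading, the function name `average_max_q`, and the sample numbers are all illustrative assumptions, not code from this repository.

```python
from statistics import mean
from typing import Sequence


def average_max_q(q_values: Sequence[Sequence[float]]) -> float:
    """Return the mean over states of max_a Q(s, a).

    q_values holds one row per held-out state and one entry per action.
    Treating V(s) as max_a Q(s, a) is an assumption (greedy policy).
    """
    if not q_values or any(len(row) == 0 for row in q_values):
        raise ValueError("need at least one state and one action per state")
    return mean(max(row) for row in q_values)


# Hypothetical Q estimates for three held-out states with two actions each.
held_out_q = [
    [1.0, 2.0],
    [0.5, 0.0],
    [3.0, 4.0],
]
avg_q = average_max_q(held_out_q)
print(f"average max-Q over held-out states: {avg_q:.3f}")
```

Logging this value once per evaluation step on the same held-out states would give a curve to compare against the average cumulative reward.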