Jimenius / InternRL

Learning Evaluation #9

Open Jimenius opened 5 years ago

Jimenius commented 5 years ago

Use the average Q-value (or state value V) instead of the average cumulative reward to evaluate learning progress.
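
A minimal sketch of what this metric could look like, assuming a discrete action space and a fixed set of held-out states collected before training. The names `average_max_q`, `average_return`, `toy_q`, and `held_out_states` are hypothetical and are not taken from the InternRL code; the agent's real Q-function would replace `toy_q`.

```python
from statistics import mean
from typing import Callable, Sequence

State = Sequence[float]
# Assumption: the Q-function maps a state to one Q-value per discrete action.
QFunction = Callable[[State], Sequence[float]]


def average_max_q(q_function: QFunction, eval_states: Sequence[State]) -> float:
    """Average of max_a Q(s, a) over a fixed set of evaluation states.

    Since V(s) = max_a Q(s, a) under a greedy policy, this is also an
    estimate of the average state value V.
    """
    if not eval_states:
        raise ValueError("eval_states must not be empty")
    return mean(max(q_function(state)) for state in eval_states)


def average_return(episode_rewards: Sequence[Sequence[float]]) -> float:
    """Average undiscounted cumulative reward per episode, for comparison."""
    if not episode_rewards:
        raise ValueError("episode_rewards must not be empty")
    return mean(sum(rewards) for rewards in episode_rewards)


# Hypothetical stand-in for a learned Q-function with two actions.
def toy_q(state: State) -> Sequence[float]:
    return [state[0] + state[1], state[0] - state[1]]


# Assumption: these states are sampled once and reused at every evaluation,
# so changes in the metric reflect changes in the Q-function only.
held_out_states = [(1.0, 0.5), (2.0, -1.0), (0.0, 0.0)]
avg_q = average_max_q(toy_q, held_out_states)
print(f"average max Q: {avg_q}")
```

Keeping the evaluation states fixed means the curve is not affected by episode-to-episode variance in the environment, which is one reason it tends to be smoother than the average cumulative reward. It measures the agent's own value estimates, though, so it may be worth logging both metrics side by side.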