NVlabs / GA3C

Hybrid CPU/GPU implementation of the A3C algorithm for deep reinforcement learning.
BSD 3-Clause "New" or "Revised" License

Meaning of the RScore #45

Closed YukeWang96 closed 2 years ago

YukeWang96 commented 2 years ago

Hi,

What does the negative RScore in the output below mean? Should I apply abs(RScore) to get the actual reward? And how do I know when training is close to converging?

[Time:      943] [Episode:     1046 Score:   -20.0000] [RScore:   -20.3550 RPPS:  1255] [PPS:  1228 TPS:   208] [NT:  2 NP:  2 NA: 33]
[Time:      943] [Episode:     1047 Score:   -20.0000] [RScore:   -20.3550 RPPS:  1256] [PPS:  1229 TPS:   208] [NT:  2 NP:  2 NA: 33]
[Time:      943] [Episode:     1048 Score:   -19.0000] [RScore:   -20.3530 RPPS:  1259] [PPS:  1230 TPS:   208] [NT:  2 NP:  2 NA: 33]
[Time:      944] [Episode:     1049 Score:   -21.0000] [RScore:   -20.3530 RPPS:  1259] [PPS:  1231 TPS:   208] [NT:  2 NP:  2 NA: 33]
[Time:      944] [Episode:     1050 Score:   -19.0000] [RScore:   -20.3520 RPPS:  1258] [PPS:  1231 TPS:   208] [NT:  2 NP:  2 NA: 33]
[Time:      944] [Episode:     1051 Score:   -20.0000] [RScore:   -20.3530 RPPS:  1259] [PPS:  1232 TPS:   208] [NT:  2 NP:  2 NA: 33]
[Time:      945] [Episode:     1052 Score:   -20.0000] [RScore:   -20.3530 RPPS:  1259] [PPS:  1233 TPS:   208] [NT:  2 NP:  2 NA: 33]
[Time:      947] [Episode:     1053 Score:   -20.0000] [RScore:   -20.3530 RPPS:  1257] [PPS:  1231 TPS:   208] [NT:  2 NP:  2 NA: 33]
[Time:      947] [Episode:     1054 Score:   -20.0000] [RScore:   -20.3530 RPPS:  1257] [PPS:  1232 TPS:   208] [NT:  2 NP:  2 NA: 33]

Thanks!
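
For anyone who wants to follow these numbers over a longer run, here is a minimal sketch that pulls Episode, Score and RScore out of log lines in the format shown above and reports how much RScore moved between the first and last parsed line. It assumes RScore is a smoothed version of Score, which the values above suggest but this thread does not confirm. `parse_line` and `rscore_trend` are hypothetical helper names, not part of GA3C.

```python
import re

# Matches the Episode, Score and RScore fields of one log line.
# The field layout is copied from the output pasted in this issue.
LINE_RE = re.compile(
    r"\[Episode:\s*(?P<episode>\d+)\s+Score:\s*(?P<score>-?\d+(?:\.\d+)?)\]\s*"
    r"\[RScore:\s*(?P<rscore>-?\d+(?:\.\d+)?)"
)


def parse_line(line):
    """Return (episode, score, rscore) for a log line, or None if it does not match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    return (
        int(match.group("episode")),
        float(match.group("score")),
        float(match.group("rscore")),
    )


def rscore_trend(lines):
    """Return (first_rscore, last_rscore, change) over all parseable lines."""
    parsed = [p for p in (parse_line(line) for line in lines) if p is not None]
    if not parsed:
        raise ValueError("no parseable log lines")
    first = parsed[0][2]
    last = parsed[-1][2]
    return first, last, last - first
```

Feeding it a window of the log every few thousand episodes gives a rough view of whether RScore is still changing. A change that stays near zero across many windows is one possible sign of a plateau, though not proof of convergence.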