chainer / chainerrl

ChainerRL is a deep reinforcement learning library built on top of Chainer.
MIT License

What do the columns in scores.txt mean? #604

Closed: Nina-9519 closed this issue 3 years ago.

Nina-9519 commented 3 years ago

What do the 'mean median stdev max min' columns in scores.txt mean? Are these the mean, median, stdev, max, and min values of accumulated rewards? The model I use is ACER, and my scores.txt is as follows. The result seems strange. Is it not converging?

scores.txt:

steps   episodes    elapsed mean    median  stdev   max min average_value   average_entropy average_kl
1001    117 931.549245595932    22.994  -5.9    55.24231694357529   150.0   -5.9    8.252166279149575e-07   1.456781721261123   0
2009    237 1898.3997354507446  22.721444444444444  -5.9    56.3967482374611    150.0   -5.9    6.554584241373359e-07   1.994066342730258   0
3007    346 2887.7214193344116  21.324166666666667  -5.9    55.34948312936618   150.0   -5.9    3.1961564482033317e-07  2.188916931153323   0
4005    464 3989.3429551124573  24.103674603174603  -5.9    54.18527517562889   150.0   -5.9    -4.3746340530983853e-07 2.260706097817788   0
5001    577 4999.912457227707   22.4105 -5.9    55.864821145427015  150.0   -5.9    1.992825441326459e-08   2.2871246128133773  0
6000    696 5986.300004005432   32.689412698412696  -5.9    59.9964563376872    150.0   -5.9    1.7967948541907436e-07  2.296894638469008   0
7008    815 7096.754754066467   27.18331746031746   -5.9    58.167852371518755  150.0   -5.9    3.989809599721391e-07   2.300509405591157   0
8000    931 8106.299434661865   24.40085714285714   -5.9    58.024090559571476  150.0   -5.9    1.8623225393274041e-07  2.3018157181673518  0
9000    1048    9242.461366891861   22.966150793650794  -5.9    55.376905888995104  150.0   -5.9    -4.808707413055517e-08  2.3023021890432425  0
10002   1169    10238.713153362274  25.241888888888887  -5.9    54.85749905238189   150.0   -5.9    6.797045823572544e-07   2.302481268402564   0
11006   1289    11238.142277002335  18.66552380952381   -5.9    53.26463658802693   150.0   -5.9    4.60787221556961e-07    2.302547061193912   0
12004   1409    12385.134836912155  18.293333333333333  -5.9    52.35123286505851   150.0   -5.9    3.5959581813798054e-07  2.302571073918868   0
13003   1527    13382.710339069366  24.314190476190475  -5.9    57.79681518810051   150.0   -5.9    7.123611406849152e-07   2.3025799230086963  0
14003   1642    14497.844116210938  19.721595238095237  -5.9    52.090269725980534  150.0   -5.9    2.405257353390858e-07   2.3025831733012208  0
15009   1759    15527.261215925217  16.640428571428572  -5.9    52.40810970538776   150.0   -5.9    2.5949955284029753e-07  2.302584367560976   0
16001   1872    16527.354995012283  15.308873015873015  -5.9    47.55863992776027   150.0   -5.9    4.5101203168739945e-07  2.3025848190372113  0
17000   1987    17524.388580322266  19.563507936507936  -5.9    53.51681486069027   150.0   -5.9    -9.934086021460157e-07  2.302584974261194   0
18009   2102    18508.509554862976  28.22224603174603   -5.9    60.101951575866316  150.0   -5.9    -1.4789098939599878e-07 2.3025850352305675  0
19003   2219    19489.714606046677  20.670055555555557  -5.9    54.129024561430455  150.0   -5.9    -3.138120674793102e-07  2.302585055762449   0
20007   2333    20500.296735048294  21.23859523809524   -5.9    53.43898774359548   150.0   -5.9    -1.041718088281207e-06  2.3025850626617905  0
21002   2443    21490.299568653107  19.988333333333333  -5.9    54.386102192309004  150.0   -5.9    3.315658858616448e-08   2.302585078153528   0
22009   2559    22482.89279651642   17.965174603174603  -5.9    50.04625492499749   150.0   -5.9    -2.0176486169396568e-07 2.3025850746358656  0
23000   2675    23478.354805469513  20.01042857142857   -5.9    52.752841128826816  150.0   -5.9    2.091656010559166e-07   2.3025850721881134  0
24003   2793    24447.62403702736   23.356055555555557  -5.9    56.143708825768556  150.0   -5.9    2.8479301697167965e-07  2.302585075162775   0
25002   2916    25455.910034179688  20.94388888888889   -5.9    54.695452449957614  150.0   -5.9    2.6206007583644357e-07  2.302585073102572   0
26001   3037    26420.782063245773  28.123079365079366  -5.9    59.860082008052714  150.0   -5.9    -1.2096961647991957e-08 2.3025850702852577  0
27001   3150    27392.79161787033   25.020261904761906  -5.9    57.56063031885219   150.0   -5.9    1.7676368269941318e-07  2.3025850708910833  0
28005   3270    28422.325241327286  17.385968253968255  -5.9    50.533298019622244  150.0   -5.9    2.6520755837799027e-07  2.302585067384586   0
29000   3385    29415.868234157562  21.92498412698413   -5.9    53.27320790566971   150.0   -5.9    5.222223810643408e-07   2.302585070271495   0
30003   3501    30515.31321787834   20.653222222222222  -5.9    53.994337925226375  150.0   -5.9    2.556834197320482e-07   2.302585061402968   0
31002   3616    31493.621422290802  22.029746031746033  -5.9    56.759014333474205  150.0   -5.9    -3.76738768276439e-07   2.3025850776329304  0
32004   3732    32491.320425987244  19.627666666666666  -5.9    53.51933538912668   150.0   -5.9    5.61863638213642e-07    2.3025850749386416  0
33004   3848    33512.88587427139   16.888261904761904  -5.9    51.10829806736216   150.0   -5.9    -1.4130920087261352e-07 2.3025850741202207  0
34008   3967    34473.20454478264   15.827666666666666  -5.9    48.6348890313185    150.0   -5.9    3.587256077579446e-07   2.3025850638898056  0
35009   4084    35395.335030794144  27.87406349206349   -5.9    58.03223525217698   150.0   -5.9    2.770934896455839e-07   2.3025850775793035  0
36003   4201    36385.66161322594   29.762761904761906  -5.9    59.66770127852202   150.0   -5.9    -1.3605837356490141e-06 2.3025850744173924  0
37004   4326    37332.86027240753   20.380261904761905  -5.9    53.29296239724883   150.0   -5.9    -1.888996110562725e-07  2.302585072134963   0
38003   4438    38336.06611442566   24.321984126984127  -5.9    54.543635532163044  150.0   -5.9    4.592504979298108e-07   2.302585083386111   0
39000   4564    39391.35005712509   27.084206349206347  -5.9    58.19465279506855   150.0   -5.9    1.8043387102438843e-07  2.3025850779716226  0
40003   4682    40512.56388711929   13.569984126984126  -5.9    47.06888430794522   150.0   -5.9    -7.582019350755482e-07  2.3025850656538807  0
41002   4795    41461.543360471725  20.429079365079364  -5.9    55.250408531181805  150.0   -5.9    -2.948083053555457e-07  2.3025850683870406  0
42006   4914    42439.11411833763   21.411055555555556  -5.9    53.727463840004695  150.0   -5.9    -1.0064515051598567e-07 2.302585082454101   0
43002   5029    43420.86684203148   28.58224603174603   -5.9    57.708612916483574  150.0   -5.9    -2.934947674344872e-07  2.3025850742666782  0
44003   5142    44450.65468072891   27.506785714285712  -5.9    59.0385711983105    150.0   -5.9    1.026939085862694e-07   2.3025850784965565  0
45000   5253    45530.10839176178   20.081190476190475  -5.9    52.833746904880584  150.0   -5.9    4.94464717826299e-07    2.3025850802918013  0
46000   5377    46568.64842247963   17.039817460317458  -5.9    51.51962657386226   150.0   -5.9    4.6958347168308055e-07  2.302585078898646   0
47001   5496    47585.09243297577   18.953666666666667  -5.9    53.78945296401842   150.0   -5.9    -1.7994397944596077e-07 2.3025850853009517  0
48008   5610    48586.09735202789   18.556555555555555  -5.9    53.07313769806573   150.0   -5.9    2.1672644222796102e-08  2.302585084981513   0
49006   5725    49578.29277634621   28.05772222222222   -5.9    59.8585618441298    150.0   -5.9    3.1875332480233847e-07  2.3025850819343936  0
50000   5842    50675.663626909256  26.728150793650794  -5.9    57.33805608130482   150.0   -5.9    4.102789388315405e-07   2.3025850850174123  0
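A small sketch for inspecting a file like this: it parses a whitespace-separated scores.txt (assuming one header row followed by numeric rows, as above) and compares `average_entropy` with ln K, the largest possible entropy (in nats) of a categorical policy over K discrete actions. K = 10 is an assumption; the action count is not stated in this thread. In the table, `average_entropy` settles at about 2.302585, which agrees with ln 10 ≈ 2.302585 to about seven significant digits. If the action space really has 10 actions, that would suggest the policy has become nearly uniform, which would be consistent with the flat `mean` column.

```python
import math


def parse_scores(text):
    """Parse whitespace-separated scores.txt text (header + numeric rows) into dicts."""
    lines = [line.split() for line in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, map(float, row))) for row in rows]


def max_entropy(n_actions):
    """Entropy in nats of a uniform categorical distribution over n_actions."""
    return math.log(n_actions)


# First and last rows copied from the table above.
sample = """steps episodes elapsed mean median stdev max min average_value average_entropy average_kl
1001 117 931.549245595932 22.994 -5.9 55.24231694357529 150.0 -5.9 8.252166279149575e-07 1.456781721261123 0
50000 5842 50675.663626909256 26.728150793650794 -5.9 57.33805608130482 150.0 -5.9 4.102789388315405e-07 2.3025850850174123 0"""

rows = parse_scores(sample)
N_ACTIONS = 10  # assumption: discrete action space with 10 actions (not stated in the issue)
for r in rows:
    # A gap near zero means the policy's entropy is at its maximum, i.e. close to uniform.
    gap = max_entropy(N_ACTIONS) - r["average_entropy"]
    print(f"step {int(r['steps'])}: mean return {r['mean']:.2f}, entropy gap to uniform {gap:.2e}")
```

To check a whole run, pass the full contents of scores.txt to `parse_scores` instead of the two-row sample.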
muupan commented 3 years ago

Are these the mean, median, stdev, max, and min values of accumulated rewards?

Yes, they are based on undiscounted accumulated rewards of evaluation episodes.
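A minimal sketch of what those five columns summarize. It is not ChainerRL's evaluator code; it only assumes what the reply states (each evaluation episode contributes the plain, undiscounted sum of its rewards) plus the assumption that `stdev` is the sample standard deviation, as Python's `statistics.stdev` computes. The reward sequences are hypothetical.

```python
import statistics


def episode_return(rewards):
    """Undiscounted return: the plain sum of rewards, with no discount factor."""
    return sum(rewards)


def score_stats(returns):
    """Summary statistics over per-episode returns, one entry per scores.txt column."""
    return {
        "mean": statistics.mean(returns),
        "median": statistics.median(returns),
        # Sample standard deviation; assumption: report 0.0 when only one episode ran.
        "stdev": statistics.stdev(returns) if len(returns) >= 2 else 0.0,
        "max": max(returns),
        "min": min(returns),
    }


# Hypothetical reward sequences for three evaluation episodes.
episodes = [[1.0, 2.0, 3.0], [-5.9], [100.0, 50.0]]
returns = [episode_return(ep) for ep in episodes]  # [6.0, -5.9, 150.0]
print(score_stats(returns))
```

Read this way, a row whose median equals its min (-5.9 in every row of the table) means at least half of that evaluation's episodes ended with the minimum return.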