Closed Nina-9519 closed 3 years ago
What do the 'mean median stdev max min' columns in scores.txt mean? Are they the mean, median, standard deviation, maximum, and minimum of the accumulated rewards? The model I use is ACER, and my scores.txt is shown below. The result seems strange. Has training failed to converge?
scores.txt:
steps episodes elapsed mean median stdev max min average_value average_entropy average_kl
1001 117 931.549245595932 22.994 -5.9 55.24231694357529 150.0 -5.9 8.252166279149575e-07 1.456781721261123 0
2009 237 1898.3997354507446 22.721444444444444 -5.9 56.3967482374611 150.0 -5.9 6.554584241373359e-07 1.994066342730258 0
3007 346 2887.7214193344116 21.324166666666667 -5.9 55.34948312936618 150.0 -5.9 3.1961564482033317e-07 2.188916931153323 0
4005 464 3989.3429551124573 24.103674603174603 -5.9 54.18527517562889 150.0 -5.9 -4.3746340530983853e-07 2.260706097817788 0
5001 577 4999.912457227707 22.4105 -5.9 55.864821145427015 150.0 -5.9 1.992825441326459e-08 2.2871246128133773 0
6000 696 5986.300004005432 32.689412698412696 -5.9 59.9964563376872 150.0 -5.9 1.7967948541907436e-07 2.296894638469008 0
7008 815 7096.754754066467 27.18331746031746 -5.9 58.167852371518755 150.0 -5.9 3.989809599721391e-07 2.300509405591157 0
8000 931 8106.299434661865 24.40085714285714 -5.9 58.024090559571476 150.0 -5.9 1.8623225393274041e-07 2.3018157181673518 0
9000 1048 9242.461366891861 22.966150793650794 -5.9 55.376905888995104 150.0 -5.9 -4.808707413055517e-08 2.3023021890432425 0
10002 1169 10238.713153362274 25.241888888888887 -5.9 54.85749905238189 150.0 -5.9 6.797045823572544e-07 2.302481268402564 0
11006 1289 11238.142277002335 18.66552380952381 -5.9 53.26463658802693 150.0 -5.9 4.60787221556961e-07 2.302547061193912 0
12004 1409 12385.134836912155 18.293333333333333 -5.9 52.35123286505851 150.0 -5.9 3.5959581813798054e-07 2.302571073918868 0
13003 1527 13382.710339069366 24.314190476190475 -5.9 57.79681518810051 150.0 -5.9 7.123611406849152e-07 2.3025799230086963 0
14003 1642 14497.844116210938 19.721595238095237 -5.9 52.090269725980534 150.0 -5.9 2.405257353390858e-07 2.3025831733012208 0
15009 1759 15527.261215925217 16.640428571428572 -5.9 52.40810970538776 150.0 -5.9 2.5949955284029753e-07 2.302584367560976 0
16001 1872 16527.354995012283 15.308873015873015 -5.9 47.55863992776027 150.0 -5.9 4.5101203168739945e-07 2.3025848190372113 0
17000 1987 17524.388580322266 19.563507936507936 -5.9 53.51681486069027 150.0 -5.9 -9.934086021460157e-07 2.302584974261194 0
18009 2102 18508.509554862976 28.22224603174603 -5.9 60.101951575866316 150.0 -5.9 -1.4789098939599878e-07 2.3025850352305675 0
19003 2219 19489.714606046677 20.670055555555557 -5.9 54.129024561430455 150.0 -5.9 -3.138120674793102e-07 2.302585055762449 0
20007 2333 20500.296735048294 21.23859523809524 -5.9 53.43898774359548 150.0 -5.9 -1.041718088281207e-06 2.3025850626617905 0
21002 2443 21490.299568653107 19.988333333333333 -5.9 54.386102192309004 150.0 -5.9 3.315658858616448e-08 2.302585078153528 0
22009 2559 22482.89279651642 17.965174603174603 -5.9 50.04625492499749 150.0 -5.9 -2.0176486169396568e-07 2.3025850746358656 0
23000 2675 23478.354805469513 20.01042857142857 -5.9 52.752841128826816 150.0 -5.9 2.091656010559166e-07 2.3025850721881134 0
24003 2793 24447.62403702736 23.356055555555557 -5.9 56.143708825768556 150.0 -5.9 2.8479301697167965e-07 2.302585075162775 0
25002 2916 25455.910034179688 20.94388888888889 -5.9 54.695452449957614 150.0 -5.9 2.6206007583644357e-07 2.302585073102572 0
26001 3037 26420.782063245773 28.123079365079366 -5.9 59.860082008052714 150.0 -5.9 -1.2096961647991957e-08 2.3025850702852577 0
27001 3150 27392.79161787033 25.020261904761906 -5.9 57.56063031885219 150.0 -5.9 1.7676368269941318e-07 2.3025850708910833 0
28005 3270 28422.325241327286 17.385968253968255 -5.9 50.533298019622244 150.0 -5.9 2.6520755837799027e-07 2.302585067384586 0
29000 3385 29415.868234157562 21.92498412698413 -5.9 53.27320790566971 150.0 -5.9 5.222223810643408e-07 2.302585070271495 0
30003 3501 30515.31321787834 20.653222222222222 -5.9 53.994337925226375 150.0 -5.9 2.556834197320482e-07 2.302585061402968 0
31002 3616 31493.621422290802 22.029746031746033 -5.9 56.759014333474205 150.0 -5.9 -3.76738768276439e-07 2.3025850776329304 0
32004 3732 32491.320425987244 19.627666666666666 -5.9 53.51933538912668 150.0 -5.9 5.61863638213642e-07 2.3025850749386416 0
33004 3848 33512.88587427139 16.888261904761904 -5.9 51.10829806736216 150.0 -5.9 -1.4130920087261352e-07 2.3025850741202207 0
34008 3967 34473.20454478264 15.827666666666666 -5.9 48.6348890313185 150.0 -5.9 3.587256077579446e-07 2.3025850638898056 0
35009 4084 35395.335030794144 27.87406349206349 -5.9 58.03223525217698 150.0 -5.9 2.770934896455839e-07 2.3025850775793035 0
36003 4201 36385.66161322594 29.762761904761906 -5.9 59.66770127852202 150.0 -5.9 -1.3605837356490141e-06 2.3025850744173924 0
37004 4326 37332.86027240753 20.380261904761905 -5.9 53.29296239724883 150.0 -5.9 -1.888996110562725e-07 2.302585072134963 0
38003 4438 38336.06611442566 24.321984126984127 -5.9 54.543635532163044 150.0 -5.9 4.592504979298108e-07 2.302585083386111 0
39000 4564 39391.35005712509 27.084206349206347 -5.9 58.19465279506855 150.0 -5.9 1.8043387102438843e-07 2.3025850779716226 0
40003 4682 40512.56388711929 13.569984126984126 -5.9 47.06888430794522 150.0 -5.9 -7.582019350755482e-07 2.3025850656538807 0
41002 4795 41461.543360471725 20.429079365079364 -5.9 55.250408531181805 150.0 -5.9 -2.948083053555457e-07 2.3025850683870406 0
42006 4914 42439.11411833763 21.411055555555556 -5.9 53.727463840004695 150.0 -5.9 -1.0064515051598567e-07 2.302585082454101 0
43002 5029 43420.86684203148 28.58224603174603 -5.9 57.708612916483574 150.0 -5.9 -2.934947674344872e-07 2.3025850742666782 0
44003 5142 44450.65468072891 27.506785714285712 -5.9 59.0385711983105 150.0 -5.9 1.026939085862694e-07 2.3025850784965565 0
45000 5253 45530.10839176178 20.081190476190475 -5.9 52.833746904880584 150.0 -5.9 4.94464717826299e-07 2.3025850802918013 0
46000 5377 46568.64842247963 17.039817460317458 -5.9 51.51962657386226 150.0 -5.9 4.6958347168308055e-07 2.302585078898646 0
47001 5496 47585.09243297577 18.953666666666667 -5.9 53.78945296401842 150.0 -5.9 -1.7994397944596077e-07 2.3025850853009517 0
48008 5610 48586.09735202789 18.556555555555555 -5.9 53.07313769806573 150.0 -5.9 2.1672644222796102e-08 2.302585084981513 0
49006 5725 49578.29277634621 28.05772222222222 -5.9 59.8585618441298 150.0 -5.9 3.1875332480233847e-07 2.3025850819343936 0
50000 5842 50675.663626909256 26.728150793650794 -5.9 57.33805608130482 150.0 -5.9 4.102789388315405e-07 2.3025850850174123 0
Are these mean, median, stdev, max, min value of accumulated rewards?
Yes, they are based on undiscounted accumulated rewards of evaluation episodes.
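To make that concrete, here is a minimal sketch of how such summary columns could be derived from a batch of evaluation episodes. It is not the library's actual code: the function name `summarize_returns`, the sample rewards, and the choice of the sample standard deviation are all assumptions made for illustration.

```python
import statistics


def summarize_returns(episode_returns):
    """Summarize undiscounted returns, one value per evaluation episode.

    Hypothetical helper for illustration; the real evaluator may differ,
    e.g. in whether it uses the sample or population standard deviation.
    """
    return {
        "mean": statistics.mean(episode_returns),
        "median": statistics.median(episode_returns),
        # Sample standard deviation needs at least two episodes (assumption).
        "stdev": statistics.stdev(episode_returns) if len(episode_returns) >= 2 else 0.0,
        "max": max(episode_returns),
        "min": min(episode_returns),
    }


# Made-up per-step rewards for three evaluation episodes.
episodes = [[-1.0, -1.0, -3.9], [-5.9], [50.0, 100.0]]

# Undiscounted accumulated reward: a plain sum, with no discount factor applied.
returns = [sum(rewards) for rewards in episodes]

summary = summarize_returns(returns)
print(summary)
```

One thing this makes easy to see: whenever the median equals the min, at least half of the evaluation episodes ended with the lowest return. In the posted scores.txt that value is -5.9 in every row.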