ZhengyaoJiang / PGPortfolio

PGPortfolio: Policy Gradient Portfolio, the source code of "A Deep Reinforcement Learning Framework for the Financial Portfolio Management Problem" (https://arxiv.org/pdf/1706.10059.pdf).
GNU General Public License v3.0

portfolio value increases then decreases after some steps #69

Open · AhmMontasser opened this issue 6 years ago

AhmMontasser commented 6 years ago

Hello,

While training, I sometimes face a case where the portfolio value increases up to a point and then gradually decreases until the end of the training steps. Here is an example:

```
step 0
the portfolio value on test set is 4.095204 log_mean is 0.0005078592 loss_value is -0.000508 log mean without commission fee is 0.000529
==============================
average time for data accessing is 0.0012977843284606933 average time for training is 0.009977315187454223
step 1000
the portfolio value on test set is 4.525525 log_mean is 0.0005438528 loss_value is -0.000544 log mean without commission fee is 0.000585
==============================
average time for data accessing is 0.00144400954246521 average time for training is 0.011899509906768798
step 2000
the portfolio value on test set is 4.716191 log_mean is 0.000558717 loss_value is -0.000559 log mean without commission fee is 0.000623
==============================
average time for data accessing is 0.0014981653690338136 average time for training is 0.012066781520843506
step 3000
the portfolio value on test set is 5.136598 log_mean is 0.00058947725 loss_value is -0.000589 log mean without commission fee is 0.000768
==============================
average time for data accessing is 0.0013505065441131593 average time for training is 0.010587116718292237
step 4000
the portfolio value on test set is 6.200308 log_mean is 0.00065727555 loss_value is -0.000657 log mean without commission fee is 0.001154
==============================
average time for data accessing is 0.0014070169925689698 average time for training is 0.010920660257339478
step 5000
the portfolio value on test set is 5.680704 log_mean is 0.00062574673 loss_value is -0.000626 log mean without commission fee is 0.001350
==============================
average time for data accessing is 0.0013507211208343506 average time for training is 0.010532096147537232
step 6000
the portfolio value on test set is 5.238808 log_mean is 0.0005965769 loss_value is -0.000596 log mean without commission fee is 0.001481
==============================
average time for data accessing is 0.0013532636165618896 average time for training is 0.010635793209075928
step 7000
the portfolio value on test set is 4.790911 log_mean is 0.00056438043 loss_value is -0.000564 log mean without commission fee is 0.001594
==============================
average time for data accessing is 0.001356858253479004 average time for training is 0.010408684253692627
step 8000
the portfolio value on test set is 4.475946 log_mean is 0.000539884 loss_value is -0.000540 log mean without commission fee is 0.001697
```

Does anybody know what the possible reason for this could be?
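
A side note on reading these logs: the test-set portfolio value and log_mean are two views of the same quantity, since the portfolio value is the product of the per-period returns. If pv = exp(log_mean × N) for N test periods, then ln(pv) / log_mean should recover the same N at every step. A quick check against the numbers above (N is inferred from the logs, not stated in them):

```python
import math

# (step, portfolio value, log_mean) triples copied from the training log above
steps = [
    (0,    4.095204, 0.0005078592),
    (1000, 4.525525, 0.0005438528),
    (4000, 6.200308, 0.00065727555),
    (8000, 4.475946, 0.000539884),
]

# If pv = exp(log_mean * N), then ln(pv) / log_mean recovers N.
for step, pv, log_mean in steps:
    n = math.log(pv) / log_mean
    print(f"step {step}: implied test periods = {n:.0f}")  # ~2776 every time
```

So the rise to step 4000 and the decline afterwards show up identically in both columns. Notice also that the log mean without commission fee keeps rising monotonically while the with-commission log_mean falls after step 4000; the widening gap suggests the policy pays more and more commission (trades more) as training continues.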

astanziola commented 6 years ago

That's probably overfitting; those values are calculated on the test set.
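
If overfitting in that sense is the cause (held-out performance peaks and then degrades as training continues), the usual mitigation is to checkpoint on the held-out metric and keep the best model rather than the last one. A minimal sketch of that idea; the three callables are hypothetical stand-ins, not PGPortfolio's actual API:

```python
def train_with_best_checkpoint(train_step, evaluate_test_pv, save_checkpoint,
                               total_steps=80000, eval_every=1000, patience=3):
    """Train, but keep the checkpoint with the best held-out portfolio
    value, stopping early once it stops improving.

    train_step, evaluate_test_pv, and save_checkpoint are hypothetical
    callables standing in for the project's own training, evaluation,
    and saving routines.
    """
    best_pv, evals_since_best = float("-inf"), 0
    for step in range(total_steps):
        train_step()  # one gradient update on a training batch
        if step % eval_every == 0:
            pv = evaluate_test_pv()  # e.g. 6.200308 at the peak above
            if pv > best_pv:
                best_pv, evals_since_best = pv, 0
                save_checkpoint()  # overwrite the best-so-far weights
            else:
                evals_since_best += 1
                if evals_since_best >= patience:
                    break  # held-out performance stopped improving
    return best_pv
```

Strictly speaking, selecting the stopping point on the test set leaks information into model selection; a separate validation split between the training and test periods would be the cleaner setup.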

AhmMontasser commented 6 years ago

@rpfeynman How is it overfitting? It does well on the test set and then gets worse and worse on the same test set; is this overfitting?

istvanmo commented 6 years ago

A good blog post about "overfitting" in RL: https://medium.com/mlreview/making-sense-of-the-bias-variance-trade-off-in-deep-reinforcement-learning-79cf1e83d565