ZhengyaoJiang / PGPortfolio

PGPortfolio: Policy Gradient Portfolio, the source code of "A Deep Reinforcement Learning Framework for the Financial Portfolio Management Problem"(https://arxiv.org/pdf/1706.10059.pdf).
GNU General Public License v3.0

Learning procedure #24

Closed lytkarinskiy closed 6 years ago

lytkarinskiy commented 6 years ago

Hello again!

May I ask for more details about the learning procedure? I'm not quite able to follow all of the code yet; maybe with your guidance here I'll go through it again with more success.

  1. During the training phase, how many times does the CNN learn on the same batch? Do you train with epochs, or does the CNN pass through the data only once?
  2. During the CV and test phases, rolling training is used. On what data do the CNN weights get updated? After all orders have been completed in the current period, we add the price history to the local DB. Do we select N periods before the current period into the learning batch, or do we update the weights using only the last price window?

Sorry if these are newbie questions; I just want to understand how this magic works.

dexhunter commented 6 years ago

During the training phase, how many times does the CNN learn on the same batch? Do you train with epochs, or does the CNN pass through the data only once?

The mini-batch is selected using a geometric distribution, something like the plot below (the exact shape depends on the step settings). [screenshot_2017-12-17_19-56-47: plot of the geometric distribution]

There are no epochs in the current framework.
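As a rough illustration of the recency bias described above, here is a small numpy sketch. The function name and the `bias` parameter value are illustrative, not PGPortfolio's actual API: the idea is just that the distance back from the end of history is geometrically distributed, so newer windows are picked far more often.

```python
import numpy as np

def sample_batch_start(history_len, bias=5e-4, rng=None):
    """Hypothetical sketch: draw a mini-batch start index in
    [0, history_len), biased toward recent periods."""
    rng = rng or np.random.default_rng()
    while True:
        # geometric(p): P(offset = k) = (1 - p)**(k - 1) * p, k = 1, 2, ...
        offset = int(rng.geometric(bias))
        if offset <= history_len:           # reject draws past the oldest data
            return history_len - offset

rng = np.random.default_rng(0)
starts = [sample_batch_start(10_000, bias=5e-4, rng=rng) for _ in range(2000)]
# The mean start index sits well inside the newer half of the history.
```

With `bias=5e-4` the mean offset is about 1/bias = 2000 periods, so most sampled windows start near the recent end of a 10,000-period history.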

During the CV and test phases, rolling training is used. On what data do the CNN weights get updated? After all orders have been completed in the current period, we add the price history to the local DB. Do we select N periods before the current period into the learning batch, or do we update the weights using only the last price window?

From the code, the data used in rolling training depends on the rolling-train-steps setting. Rolling training adds new batches to the training set (still sampled with the geometric distribution) and trains the network again; the weights are then updated accordingly.

lytkarinskiy commented 6 years ago

Thanks for the fast reply! So does it mean that during training, with the parameter "steps", we make steps-many selections (e.g. 80000, as in the default config) of batches from the training history, where each batch contains "batch_size" (e.g. 109, as in the default config) mini-batch windows of data of shape ("window_size", "coin_number", "feature_number"), selected with the geometric distribution? So at each training step we have "batch_size" windows of data from different points of the training set, and most of them are "new"?

And during the CV and test phases it's almost the same, except we train not "steps" times but "rolling_training_steps" times, using "batch_size" windows of data from different points of the whole history, including new points? And it follows from the geometric distribution that if it's 2017 now, it's unlikely we will select data from 2014, and much more likely we will select 2017 data.

Is it correct?

Again, thanks a lot for help.

dexhunter commented 6 years ago

First let me clarify several things. window_size is the number of periods inside each mini-batch, which is sequential. batch_size is the number of mini-batches put into the network at once (this comes from batch training, which is used to speed up training). steps is the number of mini-batch selections. rolling_training_steps is the number of mini-batches added to the training set.
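To make the relationship between these parameters concrete, here is a shape sketch. The batch_size of 109 matches the config mentioned in the thread; window_size, coin_number, and feature_number are illustrative assumptions:

```python
import numpy as np

# Illustrative values; only batch_size=109 comes from the thread.
window_size, coin_number, feature_number, batch_size = 50, 11, 3, 109

# One mini-batch: a *sequential* window of window_size periods
# for every coin and feature.
one_window = np.zeros((feature_number, coin_number, window_size))

# A training batch stacks batch_size such windows, each drawn from a
# different (geometrically sampled) start point in the history.
train_batch = np.stack([one_window] * batch_size)
print(train_batch.shape)  # (109, 3, 11, 50)
```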

I am going to skip the first part of your questions since it is probably answered in the second part.

And during CV and Test phases it's almost the same, but we do training not "steps" times but "rolling_training_steps" times instead using "batch_size" windows of data from different points of whole history including new points?

In the rolling "train", it's more like evaluating a mini-batch rolling_training_steps times. Unlike training, where multiple mini-batches are put in at once, rolling training evaluates one mini-batch at a time. And yes, the data selected using the geometric distribution is added to the training set during rolling training.

And it follows that according to geometric distribution it's unlikely that we will select data from 2014 year if it's 2017 now but it's more likely we will select more 2017 data.

Yes, the more recent data plays a more important role in the network.
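This recency bias can be quantified with a back-of-the-envelope calculation: under a geometric distribution with parameter p, a window k periods old is (1 - p)^k times less likely to be selected than the newest one. The bias value below is an assumption for illustration, not the repo's default; the 30-minute period length matches the paper's setup.

```python
# With 30-minute periods, three years back is about 3 * 365 * 48 periods.
p = 5e-5                      # illustrative geometric parameter (assumption)
periods_back_3y = 3 * 365 * 48
ratio = (1 - p) ** periods_back_3y
print(ratio)                  # roughly 0.07: ~14x less likely than new data
```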

Besides, if you are able to read the code, please do. IMHO, the code is clearer than my explanation.

lytkarinskiy commented 6 years ago

Dear @DexHunter, your explanations are very good! I really don't want to waste your time, but I have one last question, I hope ;)

I've tried to code the geometric distribution using snippets from your code, and its properties are clear: the newer the data, the more likely it is to be selected.

... training, where multiple batches are put at once

During "train" we select a batch of batch_size mini-batches using the geometric distribution and train the network on this batch, i.e. update the weights of the NN. We repeat this procedure steps times, meaning we select batch_size mini-batches steps times and train the network steps times.

...rolling train evaluate one batch at a time

During "rolling train" we select a batch which is just one mini-batch, using the geometric distribution, and train the network on this single mini-batch, i.e. update the weights of the NN. We repeat this procedure rolling_training_steps times, meaning we select a single mini-batch rolling_training_steps times and train the NN rolling_training_steps times.

Is it correct?
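If that reading is right, the two procedures can be sketched side by side with counting stubs. All class and method names here are hypothetical, not the actual PGPortfolio API; the stubs only count how often each operation happens:

```python
class Stub:
    """Counting stand-in for a hypothetical network + sampler."""
    def __init__(self):
        self.updates, self.draws = 0, 0
    def sample_window(self):          # one geometrically-sampled mini-batch
        self.draws += 1
        return None
    def update_weights(self, batch):  # one gradient step on a batch
        self.updates += 1

def train(net, steps, batch_size):
    # Training: each step draws batch_size recency-biased windows
    # and performs one weight update on the whole batch.
    for _ in range(steps):
        net.update_weights([net.sample_window() for _ in range(batch_size)])

def rolling_train(net, rolling_training_steps):
    # Rolling train: one mini-batch per update, repeated
    # rolling_training_steps times after each trading period.
    for _ in range(rolling_training_steps):
        net.update_weights([net.sample_window()])

a, b = Stub(), Stub()
train(a, steps=100, batch_size=109)          # 100 updates, 10900 draws
rolling_train(b, rolling_training_steps=30)  # 30 updates, 30 draws
```

The contrast is exactly the one described above: training takes one gradient step per batch of many windows, while rolling training takes one gradient step per single window.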

P.S. I really do try my best to read the code; it's very clean, but I think it's important to understand the high-level explanation first :)

Thanks!!!!!!! Best regards, Andrey

dexhunter commented 6 years ago

Yep, that's basically how I understand it.

lytkarinskiy commented 6 years ago

Thanks again!