ZhengyaoJiang / PGPortfolio

PGPortfolio: Policy Gradient Portfolio, the source code of "A Deep Reinforcement Learning Framework for the Financial Portfolio Management Problem"(https://arxiv.org/pdf/1706.10059.pdf).
GNU General Public License v3.0
1.73k stars 748 forks source link

Question regarding possible error in calculating __pv_vector in NNAgent class. #111

Open BruceDai003 opened 5 years ago

BruceDai003 commented 5 years ago

Thanks for your paper and project, I've been digging into the code for the past several days, and I felt that there might be an error in the implementation of calculating __pv_vectorin the NNAgentclass:

class NNAgent:
...
self.__pv_vector = tf.reduce_sum(self.__net.output * self.__future_price, reduction_indices=[1]) *\
                           (tf.concat([tf.ones(1), self.__pure_pc()], axis=0))

And here is the definition of __pure_pc funtion:

    # consumption vector (on each periods)
    def __pure_pc(self):
        c = self.__commission_ratio
        w_t = self.__future_omega[:self.__net.input_num-1]  # rebalanced
        w_t1 = self.__net.output[1:self.__net.input_num]
        mu = 1 - tf.reduce_sum(tf.abs(w_t1[:, 1:]-w_t[:, 1:]), axis=1)*c

        return mu

From my understanding, function __pure_pc() calculates the shrinkage factor due to re-balancing, this is a little bit different from the paper, where the mu was not subtracted from 1. Let's get back to the calculation of self.__pv_vector, the term before * is the net value of the portfolio at the end of each time period(Let's call this term NV). You want to calculate the net value of the portfolio at the beginning of the next time period if I interpreted it correctly. But (tf.concat([tf.ones(1), self.__pure_pc()], axis=0)) doesn't fit in your purpose here. Because NV[0] is the net value of the portfolio at the end of the first time period in this batch. It should multiply by the first shrinkage factor in self.__pure_pc() to get the net value of the portfolio at the beginning of the second time period relative to at the beginning of the first time period , instead of multiplied by 1 here.

Thus, my suggested correction would look like this:

self.__pv_vector = tf.concat([tf.ones(1), 
                                      tf.reduce_sum(self.__net.output * self.__future_price, axis=1)[:-1] *
                                      self.__pure_pc()], axis=0)
cgebe commented 5 years ago

I raised concerns regarding the computation of mu as well, see https://github.com/ZhengyaoJiang/PGPortfolio/issues/99#issuecomment-441389056

Especially, if you use a batch_size of 1, the error gets apparent.