ZhengyaoJiang / PGPortfolio

PGPortfolio: Policy Gradient Portfolio, the source code of "A Deep Reinforcement Learning Framework for the Financial Portfolio Management Problem" (https://arxiv.org/pdf/1706.10059.pdf).
GNU General Public License v3.0

Question in getting the previous weights #113

Open BruceDai003 opened 5 years ago

BruceDai003 commented 5 years ago

Thanks for your great work; it's very interesting. While walking through your code, I found the __pack_samples() method of DataMatrices in datamatrices.py confusing.

First, let me paste it all here for reference:

    def __pack_samples(self, indexs):
        indexs = np.array(indexs)
        last_w = self.__PVM.values[indexs-1, :]

        def setw(w):
            self.__PVM.iloc[indexs, :] = w
        M = [self.get_submatrix(index) for index in indexs]
        M = np.array(M)
        X = M[:, :, :, :-1]
        y = M[:, :, :, -1] / M[:, 0, None, :, -2]
        return {"X": X, "y": y, "last_w": last_w, "setw": setw}

    # volume in y is the volume in next access period
    def get_submatrix(self, ind):
        return self.__global_data.values[:, :, ind:ind+self._window_size+1]

The test_set is initialized with test_indices, which are np.array([32281, ..., 35056]) by default. If I understand correctly, X has shape (2776, 3, 11, 31) and y has shape (2776, 3, 11). y holds the next period's 'close', 'high', and 'low' prices, normalized by the last close price in X for each sample.

The confusing part is the weights. Let t denote the time index. For the sample at t = 32281, X covers t = 32281 through t = 32311 (31 periods), and y is taken at t = 32312. So each sample index marks the start of a window that extends forward in time. In effect, you are sitting at t = 32311, looking back over 31 periods including t = 32311, and trying to predict the weights for t = 32312. I take this to be your intention.

In that case, the weights at t = 32311 should be the other input, and the weights for the action belong at t = 32312. Instead, the code uses the weights at t = 32280 as input, which is 32 periods before the action at t = 32312.
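
To check the shapes and time indices above, here is a minimal sketch on toy data. It assumes 3 features, 11 coins, and a window size of 31, and it replaces the real price data with an array in which every entry stores its own time index; n_periods and the stand-in global_data are hypothetical and not taken from the repository.

```python
import numpy as np

# Assumed toy setup: 3 features, 11 coins, window_size = 31. Every entry
# stores its own time index, so a slice reveals which periods it covers.
n_features, n_coins, window_size = 3, 11, 31
n_periods = 35100  # hypothetical length, just large enough for the indices
time_index = np.arange(n_periods, dtype=float)
global_data = np.broadcast_to(time_index, (n_features, n_coins, n_periods))


def get_submatrix(ind):
    # Same slice as get_submatrix in the post: window_size + 1 periods.
    return global_data[:, :, ind:ind + window_size + 1]


indexs = np.arange(32281, 35057)  # 32281, ..., 35056
M = np.array([get_submatrix(i) for i in indexs])
X = M[:, :, :, :-1]
y = M[:, :, :, -1] / M[:, 0, None, :, -2]

# Periods covered by the first sample (index 32281).
first_x = int(X[0, 0, 0, 0])
last_x = int(X[0, 0, 0, -1])
y_period = int(M[0, 0, 0, -1])
last_w_row = int(indexs[0] - 1)  # PVM row read by the original code

print(X.shape, y.shape)
print(first_x, last_x, y_period, last_w_row)
```

On this toy data the shapes come out as (2776, 3, 11, 31) and (2776, 3, 11), and the first sample spans t = 32281 to t = 32311 in X with y at t = 32312, while the original code reads PVM row 32280.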

Thus, my suggested correction would look like this:

    def __pack_samples(self, indexs):
        indexs = np.array(indexs)
        last_w = self.__PVM.values[indexs+self._window_size-1, :]

        def setw(w):
            self.__PVM.iloc[indexs+self._window_size, :] = w
        M = [self.get_submatrix(index) for index in indexs]
        M = np.array(M)
        X = M[:, :, :, :-1]
        y = M[:, :, :, -1] / M[:, 0, None, :, -2]
        return {"X": X, "y": y, "last_w": last_w, "setw": setw}

    # volume in y is the volume in next access period
    def get_submatrix(self, ind):
        return self.__global_data.values[:, :, ind:ind+self._window_size+1]
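
To make the difference between the two versions concrete, the sketch below compares which PVM rows each one reads and writes for a few sample indices. The helper pvm_rows is hypothetical and only mirrors the index arithmetic of the two __pack_samples variants; it does not touch the real PVM.

```python
import numpy as np


def pvm_rows(indexs, window_size, shifted):
    """Return (read_rows, write_rows) of the PVM for the given sample indices.

    Hypothetical helper: shifted=False mirrors the original indexing,
    shifted=True mirrors the suggested correction.
    """
    indexs = np.asarray(indexs)
    offset = window_size if shifted else 0
    return indexs + offset - 1, indexs + offset


window_size = 31  # assumed, matching the X shape discussed above
indexs = np.array([32281, 32282, 32283])

orig_read, orig_write = pvm_rows(indexs, window_size, shifted=False)
new_read, new_write = pvm_rows(indexs, window_size, shifted=True)

# Time indices of the last period in X and of y for each sample.
last_x_period = indexs + window_size - 1
y_period = indexs + window_size

print("original  read/write:", orig_read, orig_write)
print("suggested read/write:", new_read, new_write)
print("last period in X:    ", last_x_period)
print("period of y:         ", y_period)
```

With the suggested indexing, the row read as last_w coincides with the last period in X, and the row written by setw coincides with the period of y. Both variants read the row immediately before the one they write, so they differ only by a constant offset of window_size. Whether that offset matters depends on how the PVM rows are labeled elsewhere in the code, which this sketch does not check.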

I might be wrong here. Please help me understand this.