bgithub1 / da_rnn

RNN based on Chandler Zuo's implementation of the paper: A Dual-Stage Attention-Based Recurrent Neural Network for Time Series Prediction

Why use companies' stock prices to predict the NASDAQ-100 Index? #2

Open gao27024037 opened 4 years ago

gao27024037 commented 4 years ago

Hello. In class da_rnn's __init__(), the code that gets the data is shown below. Why do you use companies' stock prices to predict the NASDAQ-100 Index? In particular, why ticker='NDX' in the function signature, and why these two lines:

    self.X = df_dat.loc[:, self.x_columns].as_matrix()
    self.y = np.array(df_dat[ticker])

    def __init__(self, df_dat, logger, encoder_hidden_size = 64, decoder_hidden_size = 64, T = 10,
                 learning_rate = 0.01, batch_size = 128, parallel = True, debug = False, ticker = 'NDX'):
        self.df_dat = df_dat
        self.T = T
        self.logger = logger
        self.logger.info("Shape of data: %s.\nMissing in data: %s." % (str(df_dat.shape), str(df_dat.isnull().sum().sum())))
        # features: every column except the target ticker
        self.x_columns = [x for x in df_dat.columns.tolist() if x != ticker]
        self.X = df_dat.loc[:, self.x_columns].as_matrix()  # component prices
        self.y = np.array(df_dat[ticker])                   # the NDX target series
        self.batch_size = batch_size

NDX should be calculable from these stock prices, shouldn't it? Why do you have to learn the calculation formula with an RNN? The DA-RNN paper presents a time-series prediction model, right? But where is the time-series prediction in your code? I am confused.

That's what I found when I read the code repeatedly. If I got something wrong or missed something, please tell me. Thank you.

bgithub1 commented 4 years ago

Hi there,

You asked, "NDX should be calculable from these stock prices, shouldn't it?" The answer is:

  1. Yes, on any day, the settlement of NDX is mathematically equal to a weighted average of the components of NDX (what you refer to as "these prices") FOR THE SAME DAY.
  2. But the next settlement of NDX is predicted by the previous or "lagged" prices of the components of NDX.
  3. da_rnn attempts to predict the next settlement price using the previous prices of its components. To predict the settlement for 11/21/2019, it uses component settlements up through 11/20/2019, but NOT INCLUDING 11/21/2019 (see the sketch below).
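
Here is a minimal sketch of that windowing (my own illustration, not the repo's exact code; the helper name build_lagged_windows is hypothetical):

    import numpy as np
    import pandas as pd

    def build_lagged_windows(df: pd.DataFrame, ticker: str = 'NDX', T: int = 10):
        """Pair each day t's NDX settlement with the component
        prices for days t-T+1 .. t-1 (strictly before day t)."""
        x_cols = [c for c in df.columns if c != ticker]
        X_windows, y_targets = [], []
        for t in range(T - 1, len(df)):
            X_windows.append(df[x_cols].iloc[t - T + 1:t].to_numpy())  # lagged components only
            y_targets.append(df[ticker].iloc[t])                       # day-t NDX, never an input
        return np.array(X_windows), np.array(y_targets)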

I hope that answers your question. If not, send me a response.

-Bill

gao27024037 commented 4 years ago

Thank you for your answer. But I still have a question.

I can understand that the next NDX value is predicted from the previous prices of its components, with that information stored by the encoder-decoder. But in the stock CSV file, on any given row the NDX price and the component prices refer to the same day, which means the same-day prices of the components of NDX are contained in the input X, and the code does exactly that. So I cannot understand why an input X with components at time T can predict an output y with NDX at the same time T.

bgithub1 commented 4 years ago

Hi again,

I'll need a day or so to compare the da_rnn code against the decoder equations in the original paper, so that I can show you the exact lines of code where y values at time T are paired with encoded sequences of X values up to time T-1.

Thx, -Bill

gao27024037 commented 4 years ago

Thank you. I appreciate you taking time out of your busy schedule to help me.

bgithub1 commented 4 years ago

I believe the confusion about the da_rnn model results from this specific line (line 72 in the decoder cell of the Jupyter notebook da_rnn_from_csv.ipynb):

    y_tilde = self.w(torch.cat((context, y_history[:, t].unsqueeze(1)), dim = 1)) # batch_size, 1

This line implements equation 15 in the original paper (https://arxiv.org/pdf/1704.02971.pdf). I believe the authors used the previous history of y values to enhance the decoder's ability to select the most relevant parts of the encoder input. It's as if the authors were regressing the future value of NDX on both the previous values of NDX's components and the previous values of NDX itself.
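
For reference, equation 15 reads as follows (in the paper's notation, where $\mathbf{c}_{t-1}$ is the context vector and $[\,\cdot\,;\,\cdot\,]$ denotes concatenation):

$$\tilde{y}_{t-1} = \tilde{\mathbf{w}}^{\top}[\,y_{t-1};\,\mathbf{c}_{t-1}\,] + \tilde{b}$$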

After equation 15, equation 16 runs an LSTM on this new input y_tilde together with the previous hidden state of the decoder LSTM. The LSTM is run once per time step t in order to build up the full hidden state:

                self.lstm_layer.flatten_parameters()
                _, lstm_output = self.lstm_layer(y_tilde.unsqueeze(0), (hidden, cell))
                # ********************** Eqn. 16: LSTM **********************

                # update values
                hidden = lstm_output[0] # 1 * batch_size * decoder_hidden_size
                cell = lstm_output[1] # 1 * batch_size * decoder_hidden_size

hidden is finally used in equation 22 to produce a prediction:

        y_pred = self.fc_final(torch.cat((hidden[0], context), dim = 1))
        return y_pred
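
If I am reading the paper correctly, fc_final implements equation 22: a linear map applied to the concatenation of the final decoder hidden state $\mathbf{d}_T$ and the final context vector $\mathbf{c}_T$:

$$\hat{y}_T = \mathbf{v}_y^{\top}\left(\mathbf{W}_y[\,\mathbf{d}_T;\,\mathbf{c}_T\,] + \mathbf{b}_w\right) + b_v$$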

In da_rnn's method train_iteration, the next (and perhaps most important) lines of code show how only future values of y are used to actually perform the desired regression. In those lines, the future value of y (the variable y_true) and the decoder output y_pred are passed to the loss function, which is followed by back-propagation:

        # y_history spans times 1 .. T-1; y_target holds the value at time T
        y_pred = self.decoder(input_encoded, Variable(torch.from_numpy(y_history).type(torch.FloatTensor)))
        y_true = Variable(torch.from_numpy(y_target).type(torch.FloatTensor)).reshape(y_target.shape[0], 1)

        loss = self.loss_func(y_pred, y_true)
        loss.backward()

I believe you will see that y_true gets constructed from y_target, which contains values of y from time T, not time T-1.
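
As a hypothetical illustration of that indexing (my own sketch, not the repo's exact code):

    import numpy as np

    y = np.arange(100, dtype=np.float32)  # stand-in for the NDX series
    T, t = 10, 50                         # window length, target index
    y_history = y[t - T + 1:t]            # times t-T+1 .. t-1, fed to the decoder
    y_target = y[t:t + 1]                 # time t, used only in the loss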

Maybe you can review this code and see if my explanation makes sense to you. If not, or if you have any more questions, feel free to write back.

-Bill

gao27024037 commented 4 years ago

Oh, I get your point: in your code, y_history covers times 1 to T-1 and y_target is time T. I think my problem is solved. Thank you!

But I read the paper again too, and I find the problem may come from the original paper itself. The original paper's NARX model function is:

$$\hat{y}_T = F(y_1, \ldots, y_{T-1}, \mathbf{x}_1, \ldots, \mathbf{x}_T)$$

There are definitely $\mathbf{x}_T$ and $\hat{y}_T$ at the same time step, no matter what the code does.

So I read about the NARX model again and asked my supervisor. He told me the model is commonly used in control engineering, and that y_T and x_T are not in conflict there, because y_T cannot be calculated from x_T. The relevant sentence about NARX:

which relates to the fact that knowledge of other terms will not enable the current value of the time series to be predicted exactly.

I think that's really useful. Consider the difference between the features of the SML 2010 dataset (temperature forecasting) and the features of the NASDAQ 100 Stock dataset: in my humble opinion, the original paper's model applies properly to the temperature data, but it is not appropriate for the stock data, where the target is a same-day function of the inputs. Maybe it is not our fault.

In any case, thank you very much.

bgithub1 commented 4 years ago

You are welcome.

On another note, I have implemented several LSTM-based neural networks that essentially perform regression on financial data. The networks always return results consistent with financial time series behaving as random walks/martingales.

When you zoom in on a graph of predictions vs. actuals on the test sets of these networks, you will see that each predicted change essentially reproduces the previous actual change.
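
Here is a self-contained toy (my own sketch, not output from those networks) showing why: for a martingale, the best forecast of tomorrow's price is today's price, so each predicted change exactly repeats the previous actual change:

    import numpy as np

    rng = np.random.default_rng(0)
    prices = np.cumsum(rng.standard_normal(1000))  # toy random-walk price series
    preds = prices[:-1]                  # martingale forecast: tomorrow = today
    actuals = prices[1:]                 # what actually happened
    pred_change = np.diff(preds)
    actual_change = np.diff(actuals)
    # the predicted change at step t equals the actual change at step t-1
    print(np.allclose(pred_change[1:], actual_change[:-1]))  # prints True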

Any organization that has made money using these neural-network techniques must also be using other, non-market data (Twitter feeds, etc.), as well as more granular time series (minute bars, or even market-depth bid/ask data).

Good luck with your studies, -Bill

gao27024037 commented 4 years ago

Thank you for helping me; your code is well worth learning from!