FutureGoingOn opened this issue 5 years ago
Yeah, he didn't explain that well; it took me a while to notice that the word "contemporary" refers to data at time T. The paper specifies that the inputs are the values of the target series from 1 to T-1, along with the values of all exogenous series from 1 to T. He's trying to predict the target series at T. He's worried that the network will learn that it can ignore all the older data, focus only on the data from the very last time point, and then simply return the sum of the prices at time T. You're basically giving it all the information it needs to cheat if you include data at T. I'm not sure why the paper includes it either. "T" means "tomorrow"; today is T-1. An algorithm to predict tomorrow's NASDAQ shouldn't require tomorrow's individual prices as input. I can do that in Excel.
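In symbols, a sketch of the setup as described above (x_T^(k) denotes the k-th exogenous series at time T; the weights w_k are just whatever the network could learn):

```latex
% The paper's input/output mapping:
\hat{y}_T = F\bigl(y_1, \dots, y_{T-1},\; \mathbf{x}_1, \dots, \mathbf{x}_T\bigr)

% The feared shortcut: ignore the history entirely and rebuild the index
% from the contemporaneous prices alone:
\hat{y}_T \approx \sum_{k} w_k \, x_T^{(k)}
```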
So, what I'm getting from this discussion is that the paper appears to include the current time step and is thus doing regression. However, does Chandler's code also include the current time step? Have I replicated this error?
It's been a while since I've used this code and I don't like how I did the data pre-processing here; it's quite hard to read. Consequently, if you don't know the answer to my question, that's fine, I can figure it out later.
As I read it, he's saying he excluded the current time step. I'm still squinting at this code myself. This thing is really complicated; it has seven layers.
BTW the code throws a device mismatch error on a GPU. You need to import constants.device and call .to(device) on the tensors returned by torch.zeros() in modules.py. Then it works.
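A minimal sketch of that fix, assuming the device lives in the repo's constants.py (names follow this repo's layout and may differ in your copy):

```python
import torch

# In constants.py (or equivalent): pick the GPU when one is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# In modules.py: any tensor created inside the model, such as initial
# hidden/cell states built with torch.zeros(), must live on the same
# device as the model's parameters, otherwise PyTorch raises a
# device-mismatch error during the forward pass.
batch_size, hidden_size = 128, 64
h0 = torch.zeros(1, batch_size, hidden_size, device=device)
c0 = torch.zeros(1, batch_size, hidden_size, device=device)
```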
They also deprecated torch.autograd.Variable (as of PyTorch 0.4, tensors carry autograd state directly), so you don't need to call it anymore.
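For anyone updating old code, the change looks roughly like this (a sketch, not a diff from this repo):

```python
import torch

# Old (pre-0.4) style, no longer needed:
#   from torch.autograd import Variable
#   x = Variable(torch.zeros(3), requires_grad=True)

# Modern style: tensors carry autograd state directly.
x = torch.zeros(3, requires_grad=True)
```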
I'm trying to dig into this dataloader (really sophisticated, btw). What I've noticed: on dummy "linear" data it skips y_history (or y_target, depending on your point of view) at time T. I.e., feats are 1, 2, ..., 9 (for T = 10), y_hist is 11, 12, ..., 19, and the y_target generated by the prep_train_data function is [21]. To me it should either be 1, 2, ..., 10 and 11, 12, ..., 20 with [21] as y_target, or [20] as the target. I can't see the reason for limiting the length of the time window from 10 to 9 timesteps during data processing.
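To make the off-by-one concrete, here's a toy sketch of that slicing (just the window shapes; prep_train_data itself is not reproduced here):

```python
import numpy as np

y = np.arange(1, 31)  # dummy "linear" series: 1, 2, ..., 30
T = 10                # intended window length

# What the dataloader appears to produce: windows of length T-1,
# so the value at the last step of each window is skipped.
y_history = y[10:10 + (T - 1)]  # [11, ..., 19]: 9 steps, not 10
y_target = y[20]                # 21

# What I'd expect instead: a gapless window of length T, with
# either 21 or 20 as the target.
y_history_full = y[10:10 + T]   # [11, ..., 20]
y_target_full = y[20]           # 21
```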
@jtiscione Yeah, it is meaningless to add the current exogenous sequence for prediction; that is more like an auxiliary measurement method. I also suggest looking at the later paper (GeoMAN: Multi-level Attention Networks for Geo-sensory Time Series Prediction); its ideas are consistent with this one and the code is open source. Another problem: I can't get this program to fully use the GPU. Utilization is about 20% and memory utilization is 9%. I see you mentioned changes for running on GPU; could you give me more detailed guidance? Looking forward to your reply.
@Seanny123 Yes, you didn't use the current external sequence, and Chandler doesn't either. Your approach is not the same as the one in the original paper; it is more meaningful, but the result is worse.
In addition, preprocessing the training and test data together this way is not realistic, because we would not know the future series in advance. But if the training set and the test set are processed separately, the result does not fluctuate much, so this operation is acceptable. The original author did the same thing.
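A sketch of the distinction, using a hypothetical scaler (the repo's actual preprocessing differs): statistics should be fit on the training split only, then applied to the test split.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

series = np.arange(100, dtype=float).reshape(-1, 1)
train, test = series[:80], series[80:]

# Unrealistic: fitting on the full series leaks future statistics
# into the training data.
# scaler = StandardScaler().fit(series)

# Realistic: fit on the training split only, then transform both splits.
scaler = StandardScaler().fit(train)
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)
```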
@notonlyvandalzzz For this problem, pay attention to all the T-1 terms in the code, especially where the raw data is read in.
@lyq1471 Yes, I see this. Replacing all T-1 with T gives gapless data, but both Chandler and @Seanny123 wrote the code with T-1.
@notonlyvandalzzz Maybe you should trust the facts, not any one person. Besides, are you running on GPU? Are you able to use all the GPU resources? I think maybe the PyTorch version (0.3.0) or the latency of sending and receiving data is keeping me from fully using the GPU.
@jtiscione The original DA-RNN paper from arxiv.org maps y(1...T-1) and x(1...T) to y(T).
Quoted from the abstract of the DA-RNN paper:
The Nonlinear autoregressive exogenous (NARX) model, which predicts the current value of a time series based upon its previous values as well as the current and past values of multiple driving (exogenous) series, has been studied for decades.
Their scenario setup is based on NARX. By googling "NARX", two types of problem definition turn up, differing in whether U(t) (the exogenous driving series at time t) is included or not; see the sketch after the links. https://en.wikipedia.org/wiki/Nonlinear_autoregressive_exogenous_model https://www.mathworks.com/help/deeplearning/ug/design-time-series-narx-feedback-neural-networks.html
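A sketch of the two variants in this thread's notation (d is the number of lags):

```latex
% (a) lagged exogenous inputs only:
y_t = F\bigl(y_{t-1}, \dots, y_{t-d},\; u_{t-1}, \dots, u_{t-d}\bigr)

% (b) contemporaneous exogenous input u_t included (the DA-RNN setup):
y_t = F\bigl(y_{t-1}, \dots, y_{t-d},\; u_t, u_{t-1}, \dots, u_{t-d}\bigr)
```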
The paper gives two experiments, SML 2010 and NASDAQ 100. For SML 2010, checking the attribute definitions (https://archive.ics.uci.edu/ml/datasets/SML2010), it makes sense to include U(t) in the prediction of Y(t), since we don't know the mysterious relation between the temperature and exogenous drivers such as wind speed, CO2 ppm, date, etc.
But in NASDAQ 100, the experiment setup is somewhat ambiguous. The NASDAQ 100 index can be computed directly in real time using a market-cap-weighted method, given the prices of its 100 constituents. So if U covered all 100 constituents, U(t) could produce Y(t) with 100% accuracy, and it would become a failed-setup prediction problem. But in the paper:
In the NASDAQ 100 Stock dataset, we collected the stock prices of 81 major corporations under NASDAQ 100, which are used as the driving time series. The index value of the NASDAQ 100 is used as the target series.
Only 81 constituents were used in U, so including U(t) in the prediction of Y(t) can still be called a meaningful problem setup, though it contributes most of the learning effort.
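A toy illustration of why full coverage of the constituents would make the problem degenerate (every number below is made up; the real index divisor and share counts are not reproduced here):

```python
import numpy as np

# Toy cap-weighted index: the index level is a deterministic function of
# the constituents' contemporaneous prices.
rng = np.random.default_rng(0)
n = 100
shares = rng.uniform(1e8, 1e10, n)   # hypothetical shares outstanding
prices_t = rng.uniform(10, 500, n)   # hypothetical prices at time t
divisor = 1e9                        # hypothetical index divisor

index_t = (shares * prices_t).sum() / divisor
# Given all 100 prices at time t, index_t is reproduced exactly: no model
# needed. With only 81 of 100 constituents (as in the paper), the mapping
# is no longer exact, so predicting Y(t) from U(t) stays non-trivial.
```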
I am confused about Chandler's change in the code. He says in his blog: "Unlike the experiment presented in the paper, which uses the contemporary values of exogenous factors to predict the target variable, I exclude them." Does that mean that instead of feeding in the contemporaneous external factors when predicting the target series, he only used the past values of all external series? That is, the current values of the external factors are not used for prediction in his code?