Seanny123 / da-rnn

Dual-Stage Attention-Based Recurrent Neural Net for Time Series Prediction

Minor improvements #9

Open goshaQ opened 5 years ago

goshaQ commented 5 years ago

What was fixed?

This PR addresses the following issues: #4 #8

How?

The model now takes the current values of the driving series into consideration. This PR also resolves the issue with running on GPU.

Comment

Btw, it is strange that the original implementation does not consider the current values of the exogenous (driving) series: they do not depend on the target values and are usually known in advance. That said, it depends on the problem formulation.
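
For illustration, a minimal sketch of the windowing change (the `make_windows` helper and the shapes are hypothetical, not the repo's actual code):

```python
import numpy as np

def make_windows(X, y, T):
    """Build encoder input windows for predicting y[t].

    X: driving (exogenous) series, shape (n_timesteps, n_features)
    y: target series, shape (n_timesteps,)
    T: window length
    """
    inputs, targets = [], []
    for t in range(T, len(y)):
        # Old behaviour: only past driving values, X[t-T : t].
        # With this change the window shifts by one step so that the
        # current values X[t] are included, i.e. X[t-T+1 : t+1], since
        # the exogenous series is usually known at prediction time.
        inputs.append(X[t - T + 1 : t + 1])
        targets.append(y[t])
    return np.stack(inputs), np.array(targets)
```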

goshaQ commented 5 years ago

The current version of the model throws a TypeError if you try to run it with PyTorch 1.x. The problem is that the model doesn't work on GPU because it allocates a regular FloatTensor instead of a cuda.FloatTensor; this PR fixes that. The latter is not an optimized version of the former: the difference is that one occupies CPU memory while the other occupies GPU memory (roughly speaking). So there should not be any improvement in terms of memory usage or speed.
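
For reference, a minimal sketch of the device-agnostic allocation idiom behind this kind of fix (the shapes are made up for illustration; this is not the PR's exact diff):

```python
import torch

# Pick the GPU when available, otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

batch_size, hidden_size = 128, 64

# Before: torch.zeros(...) creates a regular (CPU) FloatTensor, which
# raises a TypeError when mixed with CUDA tensors under PyTorch 1.x.
h_cpu = torch.zeros(1, batch_size, hidden_size)

# After: the same tensor is created directly in GPU memory when CUDA
# is available, so the model runs on either device unchanged.
h = torch.zeros(1, batch_size, hidden_size, device=device)
```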

The increased GPU memory usage can be explained by the increased batch size. The model now takes the current values of the driving series into consideration. The author of the reference implementation thought it might be meaningless to make predictions based on these values and excluded them; however, this is wrong.

In the original paper, the authors tried to predict the NASDAQ 100 index using the stock prices of only 81 companies. Yes, the index is a simple linear combination of the stock prices of roughly a hundred of the largest companies, but they don't use the stock prices of all the contributing companies, so it isn't "cheating".

FutureGoingOn commented 5 years ago

@goshaQ Thanks for your reply. But I still can't improve the speed on PyTorch 0.4.0, and the GPU memory is not well utilized. Is this a code problem, or a problem with the latency of sending and receiving data? At the very least the GPU's memory limit should be reached, yet GPU utilization is only about 30%. The model's speed became unacceptable when my data volume increased significantly.
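
For what it's worth, a hedged sketch of common input-pipeline knobs for this symptom (~30% GPU utilization usually means the GPU is starved by host-side data loading rather than limited by the model); the data below is dummy data, not this repo's training loop:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

X = torch.randn(10000, 10, 81)   # dummy driving-series windows
y = torch.randn(10000, 1)        # dummy targets

loader = DataLoader(
    TensorDataset(X, y),
    batch_size=512,        # larger batches raise GPU occupancy
    shuffle=True,
    num_workers=4,         # prepare batches in background workers
    pin_memory=True,       # enables fast async host-to-GPU copies
)

for xb, yb in loader:
    # non_blocking=True overlaps the copy with GPU compute
    xb = xb.to(device, non_blocking=True)
    yb = yb.to(device, non_blocking=True)
    # ... forward / backward pass goes here ...
```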

Btw, I totally agree with you about the NARX model and DA-RNN. Different forecasting methods suit different scenarios, and in the right scenario this method will give higher forecasting accuracy.