Closed Tschigger closed 8 years ago
I'm a finance student thinking about writing a thesis about machine learning analysis for a specific type of derivative. I was able to set up the dl4j environment (great tutorials by the way), read a lot about recurrent neural networks, both on deeplearning4j and other sites, and played around a bit with the GravesLSTM example provided.
However, setting up just a very basic time-series example seems like an unfeasible step for me. I know developers here are busy, but providing a short example would make everyone able to get a quick hands-on and maybe even make them actively improve the code and/or provide further examples based on the one initially created.
Right now, the entry barrier for everyone trying to play around with time series in dl4j is just too high for "casual" programmers and I think a quick example (doesn't need to be complex, can be super-basic) would definitely change this.
No arguments there. We certainly need to make things easier here.
How about this: do some research and find me some data sets for this, maybe in the range of 10k to 1M time steps total (though smaller or larger might work too). Multi-variate input/output is fine. Data sets that have human-interpretable output would be better for an example.
If you can find something suitable, I'll try to put together a basic example in the next few days.
This might be useful:
http://mldata.org/
http://www.kdnuggets.com/datasets/index.html
http://www.datasciencecentral.com/profiles/blogs/big-data-sets-available-for-free
http://www.datasciencecentral.com/profiles/blogs/20-free-big-data-sources-everyone-should-check-out
http://www.datasciencecentral.com/profiles/blogs/great-sensor-datasets-to-prepare-your-next-career-move-in-iot-int?xg_source=activity
http://archive.ics.uci.edu/ml/datasets.html
This sounds great. Won't have time the next two days, but I will get you something by the end of the week. The kind of data I am aiming for at my thesis will be pretty boring for the average user, but I have something else in mind which should be more interesting for people in here:
I could provide time series for bitcoin prices of the last 2 years with 1 minute intervals. Should be 1M time steps in total. I could also add the traded volume in that 1 minute, so we got multi-variate input. Wouldn't be surprised if you get good results there, since bitcoin prices are pretty much only driven by supply/demand and trends. Other financial instruments are, in general, strongly influenced by real-world news which makes training on past data much harder IMHO.
Hey,
I found time today to create the dataset, it reaches from July 2013 until December 2015. I increased the length of the time steps to 10 minutes minimum, because at shorter timesteps the price was largely dependent on the last trade (last trade hitting an ask = high price, last trade hitting a bid = low price). BTW, the traded volume is measured in bitcoins, not USD. http://www.filedropper.com/btcusd
I can also try to find some non-financial time-series until the end of the week if you are not happy with this one. Don't put too much work into the results, the example can be super-basic, just to give people something to toy with will be wonderful already.
thanks for the data
I do have a few questions/concerns about this as an example though:
A) Whatever you want. I think price 2-3 steps ahead seems the most intuitive
B)
1) (Price normalization) Okay, wasn't completely sure if you have methods which automatically normalize the input. Will provide the log-returns. Should price movement itself be normalized too, so that starting price equals end price and expected return = 0?
2) (Volume normalization) Should have checked that, my bad. Intuitively though, this might be hard to do. Volume in July 2013 was very low, in November 2013 (during the big bubble) it was high - even for today's standards. Will think about a way to do this.
3) (Inputs) You are absolutely right, good idea. Will include day of the week and hour. Also, the number of ticks in the interval (number of different price levels) and maybe something else if I get creative until I come home.
Ok.
1) Normalized prices. We have log-returns now (last column). This is what would be nice to predict.
2) Volume normalized as far as that's possible. The problem is that at the beginning there are many time steps with zero volume, and I can't really change that. Did basic linear normalization to get rid of the 'trend'.
3) Added A) Day B) Hour C) Number of Ticks (number of bids/asks being hit) D) Variance of the Price in that time step
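For anyone unsure what the log-return column means, here is a minimal sketch of how such a column could be computed from raw prices. The class and method names are made up for illustration, and this is not the script that produced the dataset.

```java
final class LogReturns {
    /** Returns r[i] = ln(prices[i + 1] / prices[i]); the result has one entry fewer than the input. */
    static double[] logReturns(double[] prices) {
        if (prices.length < 2) {
            throw new IllegalArgumentException("need at least two prices");
        }
        double[] returns = new double[prices.length - 1];
        for (int i = 0; i < returns.length; i++) {
            if (prices[i] <= 0 || prices[i + 1] <= 0) {
                throw new IllegalArgumentException("prices must be positive");
            }
            returns[i] = Math.log(prices[i + 1] / prices[i]);
        }
        return returns;
    }
}
```

Log returns add up over time: the sum of all entries equals ln(lastPrice / firstPrice), which makes them convenient as a regression target.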
If you have any questions, feel free to ask. Readme explains the structure. BTW, at end of August 2015 there are many blank rows. This is due to the server suspending trading. No clue what to do there. Problems start at Unix Time 1440394827. August 17th to August 24th, the market had problems in general. Maybe we should split the data here and define our training data as the data before 17th of August, and testing data as the data past August 24th?
August 15th - unix timestamp 1439682027
btc_10min line: 119.24418331969501,0,1,997.5949627008066,.000365909712118941,-.0026350476380051138
btc_60min line: 1626.9757215653303,0,1,5407.618989005082,.0022965163818520463,-.00932505282362648
August 25th - unix timestamp 1440582027
btc_10min line: 69.37682744664761,3,11,73.95588956557472,.00030399310005783063,-.003342472528277482
btc_60min line: 568.5045582433623,3,11,1181.636044151092,.0014780764785619163,-.007427372978020849
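The proposed split (train on everything before August 17th, test on everything after August 24th, drop the week in between) could be sketched like this. Interpreting the dates in UTC is an assumption, and `TrainTestSplit` with its fields is a hypothetical helper, not part of the dataset or of DL4J.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.List;

final class TrainTestSplit {
    // Cut-offs from the dates proposed in the thread; using UTC midnight is an assumption.
    static final long TRAIN_END =
            LocalDate.of(2015, 8, 17).atStartOfDay(ZoneOffset.UTC).toEpochSecond();
    static final long TEST_START =
            LocalDate.of(2015, 8, 25).atStartOfDay(ZoneOffset.UTC).toEpochSecond();

    final List<String> train = new ArrayList<>();
    final List<String> test = new ArrayList<>();

    /** rows[i] is the CSV line recorded at unixSeconds[i]; lines inside the gap are dropped. */
    static TrainTestSplit split(long[] unixSeconds, String[] rows) {
        if (unixSeconds.length != rows.length) {
            throw new IllegalArgumentException("one timestamp per row expected");
        }
        TrainTestSplit result = new TrainTestSplit();
        for (int i = 0; i < rows.length; i++) {
            if (unixSeconds[i] < TRAIN_END) {
                result.train.add(rows[i]);
            } else if (unixSeconds[i] >= TEST_START) {
                result.test.add(rows[i]);
            }
        }
        return result;
    }
}
```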
OK, thanks. I think I can work with that. I should be able to throw something basic together in the next few days. I'll keep you posted as to my progress.
If you don't get statistically significant results, it doesn't matter. It's just about a very basic example of how it's done, to get the whole thing rollin'.
Probably won't be able to get to this for a few more days. Had something important come up that takes priority. I'll be using some new data loading stuff in 0.4-rc3.8 (which is as yet unreleased) so you wouldn't be able to use any example until the next dl4j release anyway.
No stress. Have exams coming up in January so I don't have time anyway. Thanks for the effort so far.
So I haven't looked at this until now. I've had higher priorities, sorry. Link above to the data is dead, it seems.
If you still want an example like this: could you provide the data again? Or, alternatively, take a look at this documentation that went up a little while ago: http://deeplearning4j.org/usingrnns.html That page should have much of what you need to write your own version. 0.4-rc3.8 is out now, which has the data loading features needed for this. (Note I was thinking of splitting up the data, possibly into weekly or two-weekly blocks. If you do that, loading data as per the linked page should be relatively straightforward.)
Hey,
unfortunately I have final exams soon, starting at the end of January, so I don't have much time at the moment. I uploaded the data again for you: http://www.filedropper.com/btcusd (would be best if split training/testing at the dates specified in previous post).
With splitting the data up in weekly blocks, you mean pre-preparation into multiple files?
Right, pre-preparation into multiple files. That would probably be easiest, given the current data import functionality we have. Anyway, I've downloaded a local copy of that file. I'll try to get this up in the next couple of weeks.
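A rough sketch of that pre-preparation step, grouping rows into one-week blocks so each block can be saved as its own file. It assumes the timestamps are ascending Unix seconds and counts weeks from the first row rather than from calendar weeks; `WeeklyBlocks` is a hypothetical helper name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class WeeklyBlocks {
    static final long SECONDS_PER_WEEK = 7L * 24 * 60 * 60;

    /** Groups rows into consecutive 7-day blocks, keyed by week index counted from the first timestamp. */
    static Map<Long, List<String>> byWeek(long[] unixSeconds, String[] rows) {
        if (unixSeconds.length != rows.length) {
            throw new IllegalArgumentException("one timestamp per row expected");
        }
        Map<Long, List<String>> blocks = new TreeMap<>();
        if (rows.length == 0) {
            return blocks;
        }
        long start = unixSeconds[0];
        for (int i = 0; i < rows.length; i++) {
            long week = (unixSeconds[i] - start) / SECONDS_PER_WEEK;
            blocks.computeIfAbsent(week, k -> new ArrayList<>()).add(rows[i]);
        }
        return blocks;
    }
}
```

Each list in the returned map could then be written out as one file, for example with `java.nio.file.Files.write(path, lines)`.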
Pre-prepared data into weekly blocks. Each file = one week. Each row = one time step. The output we want to forecast (log_return) is in the last column. Already split into training and testing data.
Here you go (download and save before link deactivates).
+1
(link is deactivated)
For RNN data, is it the case that each file contains the examples (time steps) for one period of time? For example, file 1: 1-hour time steps for 24 hours; file 2: 1-hour time steps for the next 24 hours. Or would each file contain 1-hour time steps on a sliding window? When I train on a sliding-window set of data sources, I see huge spikes in error during training, as if the NN forgets what it has learnt.
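To make the two layouts in this question concrete, here is a small sketch that cuts one series into windows. With `stride == length` it yields consecutive, non-overlapping blocks (one per file in the first layout); with `stride == 1` it yields a sliding window. The helper is hypothetical and says nothing about which layout DL4J prefers.

```java
import java.util.Arrays;

final class Windows {
    /** Copies windows of `length` steps from the series, starting a new window every `stride` steps. */
    static double[][] windows(double[] series, int length, int stride) {
        if (length <= 0 || stride <= 0) {
            throw new IllegalArgumentException("length and stride must be positive");
        }
        if (series.length < length) {
            return new double[0][];
        }
        int count = (series.length - length) / stride + 1;
        double[][] out = new double[count][];
        for (int w = 0; w < count; w++) {
            out[w] = Arrays.copyOfRange(series, w * stride, w * stride + length);
        }
        return out;
    }
}
```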
I have similar request as Tschigger's
Basically, I'd like to use RNN/LSTM to predict time-series multiple variable input and output. The variables are mostly real number, some are categorical. It will be nice if the new example shows how to handle such variables as well as time stamps.
Wow, I've been trying a very similar project to @tom2good's. My data, both input and output, are all numbers, and I'm trying to predict a couple of the variable inputs in them. I understand how to build the model so far, but I don't know how to reverse things to test variable inputs after training. It would be very helpful if somebody coded a time-series example of this kind. Please help, @AlexDBlack.
Hi! I'm applying dl4j to a time series regression problem with 2 output variables. I reviewed both the regression and RNN pages and code examples and cobbled together a working example. Training and prediction work without error, but I can't really tell if it's working "right" because the regression output isn't really great. Before I charge ahead tuning the network and adding more input features, I'd like to have some confirmation that I've set the network up correctly for the problem. Can someone review my gist and answer my questions regarding RNN + regression on 2 output nodes? Thanks in advance.
Here is the gist of the code along with feature and label files for training and test phases: https://gist.github.com/ddrummond/5e05e54f6d79c900c8491b4ab8c1b34f
Problem Description:
Time series forecast of a single security's minor reversal points.
A "minor reversal point" is defined as either a period (day) with a high price greater than both the previous and next high prices, or a period with a low value lower than both the previous and next low prices.
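Read literally, that definition can be sketched as a labeling function like the one below. The first and last periods are never marked because they lack a neighbour on one side; that edge handling and the class name are assumptions of this sketch.

```java
final class ReversalPoints {
    /**
     * Marks period t as a minor reversal point when its high exceeds both neighbouring highs
     * or its low is below both neighbouring lows.
     */
    static boolean[] minorReversals(double[] high, double[] low) {
        if (high.length != low.length) {
            throw new IllegalArgumentException("high and low must have the same length");
        }
        boolean[] reversal = new boolean[high.length];
        for (int t = 1; t < high.length - 1; t++) {
            boolean peak = high[t] > high[t - 1] && high[t] > high[t + 1];
            boolean trough = low[t] < low[t - 1] && low[t] < low[t + 1];
            reversal[t] = peak || trough;
        }
        return reversal;
    }
}
```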
This is a time series regression problem with 8 input features and 2 regression output variables
Input Features:
Regression Targets:
periods (days) until next reversal point, descr: Standardized, z = (x - Mean) / SD
NOTE: Regarding regression target "periods (days)" we'll have to adjust the standardized output of "periods (days)" to the problem domain scale when using the prediction value (i.e. adjustedPeriods = prediction * SD + Mean)
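A minimal sketch of the standardization and its inverse described in the note. The source does not say whether the SD is the population or the sample value; this sketch assumes the population SD (dividing by n), and `ZScore` is a hypothetical helper.

```java
final class ZScore {
    final double mean;
    final double sd;

    private ZScore(double mean, double sd) {
        this.mean = mean;
        this.sd = sd;
    }

    /** Estimates mean and population standard deviation from the training values. */
    static ZScore fit(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("no values");
        }
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        double mean = sum / values.length;
        double squares = 0;
        for (double v : values) {
            squares += (v - mean) * (v - mean);
        }
        double sd = Math.sqrt(squares / values.length);
        if (sd == 0) {
            throw new IllegalArgumentException("constant series cannot be standardized");
        }
        return new ZScore(mean, sd);
    }

    /** z = (x - Mean) / SD */
    double standardize(double x) {
        return (x - mean) / sd;
    }

    /** adjusted = prediction * SD + Mean */
    double invert(double z) {
        return z * sd + mean;
    }
}
```

The mean and SD should come from the training data only and then be reused unchanged to rescale test-time predictions.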
My Questions: After reading the tutorials on RNN (http://deeplearning4j.org/usingrnns.html) and Regression (http://deeplearning4j.org/linear-regression) and reviewing the example code I have the following questions regarding timeseries regression with multiple outputs:
    for (int i = 0; i < testSetLength; i++) {
        predictionReturns[i] = prediction.getDouble(0, 0, i);
        standardPredictionPeriods[i] = prediction.getDouble(0, 1, i);
    }
f) Do you have an example of using a minibatchSize > 1 in a time series context? The description of Minibatch Size on this page (http://deeplearning4j.org/troubleshootingneuralnets) makes it sound like minibatch is purely a parallel processing consideration; however, when evaluating time series it seems like minibatch must equal the number of time series that you are trying to learn from. If that is true, then it seems to imply that IF I want to train my network on multiple examples (aka multiple stocks) over the same time span, I'd have to interleave multiple stock prices in the same file. For example, if I want to learn from the time series of 10 stocks and use a minibatchSize=10, I would have to write 10 price lines for t=0 (1 for each stock), then write the lines for t=1 to lines 11-20, etc. Is this right? Is there a way to avoid merging all of the price files together?
g) Is it appropriate to alternate calls to MultiLayerNetwork.fit() and MultiLayerNetwork.rnnTimeStep() if I want to update the net weights with new information, or is this implied as part of rnnTimeStep()?
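The shape comments quoted in this thread ([miniBatchSize,nOut,1], and the indexing in prediction.getDouble(0, 1, i)) suggest that each series sits in its own slot along the first dimension, which would mean rows from different stocks do not need to be interleaved in one file. A plain-array sketch of that layout, under that assumption; it uses no DL4J classes and `MiniBatchLayout` is a hypothetical name.

```java
final class MiniBatchLayout {
    /** Stacks series of shape [nFeatures][timeSteps] into [miniBatch][nFeatures][timeSteps]. */
    static double[][][] stack(double[][]... series) {
        if (series.length == 0 || series[0].length == 0) {
            throw new IllegalArgumentException("need at least one series with one feature");
        }
        int nFeatures = series[0].length;
        int timeSteps = series[0][0].length;
        double[][][] batch = new double[series.length][nFeatures][timeSteps];
        for (int b = 0; b < series.length; b++) {
            if (series[b].length != nFeatures) {
                throw new IllegalArgumentException("feature count differs in series " + b);
            }
            for (int f = 0; f < nFeatures; f++) {
                if (series[b][f].length != timeSteps) {
                    throw new IllegalArgumentException("length differs in series " + b);
                }
                // Each stock keeps its own slot b; nothing is interleaved across stocks.
                System.arraycopy(series[b][f], 0, batch[b][f], 0, timeSteps);
            }
        }
        return batch;
    }
}
```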
Note that for time series prediction, it may be possible to get predictions for multiple time steps "for free" by generating multiple stochastic predictions of the future: randomly sample the next time step and iterate, a sort of Monte Carlo simulation. This is essentially what is done in a superficially very different RNN example in DL4J (where samples are generated one character at a time, and future characters are sampled conditional on the previous ones).
To do this properly for a real-valued time series, it is necessary to model the probability distribution of each time step, not merely its mean. Predicting the mean and the variance allows a useful (if not precise) approximation of the next time step as a normal distribution, which can be used for Monte Carlo sampling as I mentioned.
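A sketch of that sampling loop, assuming a model that returns a mean and a variance for the next step given the previous value. `StepModel` is a hypothetical stand-in for a trained network, not a DL4J interface, and the normal approximation is the one suggested above.

```java
import java.util.Random;

final class MonteCarloForecast {
    /** Hypothetical stand-in for a trained network: maps the previous value to {mean, variance} of the next. */
    interface StepModel {
        double[] meanAndVariance(double previous);
    }

    /** Returns paths[p][h], the sampled value of path p at h + 1 steps ahead. */
    static double[][] samplePaths(StepModel model, double start, int horizon, int nPaths, long seed) {
        Random rng = new Random(seed);
        double[][] paths = new double[nPaths][horizon];
        for (int p = 0; p < nPaths; p++) {
            double previous = start;
            for (int h = 0; h < horizon; h++) {
                double[] mv = model.meanAndVariance(previous);
                // Draw the next value from N(mean, variance) and feed it back in as the new input.
                double next = mv[0] + Math.sqrt(mv[1]) * rng.nextGaussian();
                paths[p][h] = next;
                previous = next;
            }
        }
        return paths;
    }
}
```

Averaging the paths at each horizon gives a point forecast, and the spread of the paths gives a rough uncertainty band.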
Elroch, yes, I saw that. I'll take another look to see if I can apply it here. In the meanwhile, I'm mostly trying to get some validation that I understand the basics correctly of how the RNN uses input and output, and if I'm handling the 2 variable regression correctly.
Hi all,
Has there been any progress on this front? Has anyone uploaded a basic example of implementing an RNN on time series data with a moving window?
Hi,
It would appear to me that there is a bug. https://github.com/deeplearning4j/deeplearning4j/blob/master/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/multilayer/MultiLayerNetwork.java#L2699
The input cannot be 2d (rank 2) and have rank 3 at the same time, can it?
If this is the case, a simple workaround is to take the returned value and call
    out.tensorAlongDimension(0, 1, 0);
    if (inputIs2d && input.rank() == 3 && layers[layers.length - 1].type() == Type.RECURRENT) {
        //Return 2d output with shape [miniBatchSize,nOut]
        // instead of 3d output with shape [miniBatchSize,nOut,1]
        return input.tensorAlongDimension(0, 1, 0);
    }
@coreyauger If you think there is a bug in the DeepLearning4J code can you please create an issue in the deeplearning4j repository? Please provide a small sample to reproduce the bug. This repository is for the deeplearning4J examples. And the issue in this thread was closed more than a year ago.
@RobAltena looks like this is not a bug.
https://github.com/deeplearning4j/deeplearning4j/issues/4365
The confusion here stems from the fact that I want to pass in a single Time Series Sequence into my trained models and have it predict the label.
Currently, calling rnnTimeStep produces a matrix of probabilities (for each time step).
Using the above workaround at least produced a vector with size = nOut.
These probabilities sum to 1, so I interpreted them as the probability for each label.
Any guidance you can give to me would be a great help. Thanks :)
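One common way to turn per-time-step probabilities into a single label is to read the last time step and take the most probable class. The sketch below assumes the output for one sequence has already been copied into a plain array laid out as [nOut][timeSteps], and that the last column is the step to read (for padded or masked sequences the last real step would be needed instead). Both are assumptions, not something the thread confirms.

```java
final class LastStepLabel {
    /** probabilities[k][t] is the probability of class k at time step t; returns the argmax at the last step. */
    static int predict(double[][] probabilities) {
        if (probabilities.length == 0 || probabilities[0].length == 0) {
            throw new IllegalArgumentException("empty output");
        }
        int last = probabilities[0].length - 1;
        int best = 0;
        for (int k = 1; k < probabilities.length; k++) {
            if (probabilities[k][last] > probabilities[best][last]) {
                best = k;
            }
        }
        return best;
    }
}
```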
Check out the UCISequenceClassificationExample.
@RobAltena thanks. I based my code off this example to begin with. It works great for both training and evaluating on my test data. However, when it comes time to use the model in a live setting, I am not sure what I can use to pass a single time series in and get the predicted label out. It seems like this is the purpose of the model to begin with, so I am not sure if I am missing something?
I am still pretty new to RNN so again any help with this would be great.
In summary.. I have a trained model and I simply want to do
rnn.predict( some_time_series_matrix )
and get back one of my labels
It seems like for RNN the correct way to go about this is to in fact call rnn.rnnTimeStep( .. )
correct?
Thanks again :)
Straight from the RNN documentation:
    INDArray timeSeriesFeatures = ...;
    INDArray timeSeriesOutput = myNetwork.output(timeSeriesFeatures);
Not sure how I missed that.. Thanks!
If I have a set of historical data and then want a prediction one step at a time while still improving my network, should the code look like this:
    myModel.fit(historicalData);
    // assuming newFeatures is an infinite stream; the for loop just represents further steps
    for (INDArray feature : newFeatures) {
        INDArray timeSeriesOutput = myModel.output(feature);
        if (iCanGetLabelsForNewData) {
            // if I want to predict labels n steps ahead, I must wait n steps until I have the label for new data
            myModel.output(featureWithLabel - n, true);
        }
    }
Is my approach correct?
it's okay can you send it i will be so thankful dataset and code both and version of python i will run code on it and thank you for your help
On Sun, Apr 5, 2020 at 6:01 PM Tschigger notifications@github.com wrote:
Hey, sorry but I switched to Python/Keras. Good luck though. Regards
On Sunday, April 5, 2020, 13:51:51 EEST, totaswift15 <notifications@github.com> wrote:
i am looking for RNN code using python and IOT dataset
contact me for help :tsnim15siddig@gmail.com
Hey, I'm not doing RNNs anymore. My field was time series on financial markets and I realized that running a simple MLP network works much better if you incorporate previous time steps as features. So one input feature is the current timestep, and other input features are older time steps (but summarized and compressed - you have to be creative here). I can neither send you code nor dataset as both are proprietary.
I still hope that helps and gives you an initial spark,Andreas
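A rough sketch of the kind of feature row described here: the current value, the previous value, and one compressed summary of older steps. Using the mean over a fixed window as the summary is an assumption made for illustration; the original features are proprietary and not shown in the thread.

```java
final class LaggedFeatures {
    /** Row for step t is {x[t], x[t-1], mean of x[t-window..t-1]}; rows exist for t >= window. */
    static double[][] build(double[] x, int window) {
        if (window < 1) {
            throw new IllegalArgumentException("window must be at least 1");
        }
        int rows = Math.max(0, x.length - window);
        double[][] features = new double[rows][3];
        for (int r = 0; r < rows; r++) {
            int t = r + window;
            double sum = 0;
            for (int k = t - window; k < t; k++) {
                sum += x[k];
            }
            features[r][0] = x[t];         // current time step
            features[r][1] = x[t - 1];     // most recent older step
            features[r][2] = sum / window; // older steps, summarized as a mean
        }
        return features;
    }
}
```

Each row can then be fed to an ordinary feed-forward network, with the target taken from a later time step.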
i found this but it's not working with me can you help
This issue isn't helping anyone anymore.
If you've got additional questions, please ask them on https://community.konduit.ai/.
Hey, I know you are pretty busy right now. But a RNN example with time-series would be awesome if you find time!
Thanks