Open BasvanH opened 5 years ago
Hi @BasvanH thanks for the great question
See this issue regarding time-series datasets https://github.com/RubixML/RubixML/issues/35 - in short, we do not directly support time-series data yet
Since stock prices are non-stationary, you will get the best results from an algorithm that directly supports time series
That said ...
Supervised learners such as classifiers and regressors require a training signal in the form of labels
Unsupervised learners such as clusterers and anomaly detectors do not require labels
Your problem can be viewed as a classification problem, in which the prediction is the trend ('up' or 'down'), or as a regression problem, in which the prediction is the direction (+/-) and degree of the trend relative to a baseline (e.g. 0).
Your problem can also potentially be framed as a clustering problem, in which case you can try to isolate clusters of up and down trends. You can also use an anomaly detector to predict when a stock is trending abnormally up or down.
So there are multiple ways, and also combinations of methods, to go about building a stock prediction system. I would avoid the unsupervised methods for now and focus on the supervised methods to start. Again, you will need a good labeled dataset.
Are you able to automate the labeling process in any way?
Can you discretize the 'price' variable such that the label is 'up' if the price is above a rolling (windowed) average and 'down' if it is below that moving average?
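A minimal sketch of that labeling idea, in plain Python rather than with any RubixML API. The function name `label_by_moving_average`, the window size, and the choice to label a price that exactly equals the average as 'down' are all assumptions made for illustration. The trailing average here includes the current price.

```python
from collections import deque


def label_by_moving_average(prices, window=30):
    """Label each price 'up' or 'down' relative to a trailing moving average.

    The first `window - 1` points get None because a full window is not
    available yet. A price equal to the average is labeled 'down'
    (an arbitrary tie-breaking assumption).
    """
    labels = []
    recent = deque(maxlen=window)  # holds at most the last `window` prices
    for price in prices:
        recent.append(price)
        if len(recent) < window:
            labels.append(None)
            continue
        average = sum(recent) / window
        labels.append("up" if price > average else "down")
    return labels


prices = [10.0, 10.5, 11.0, 10.2, 9.8, 10.9]
labels = label_by_moving_average(prices, window=3)
```

Samples labeled None would have to be dropped before training, since a supervised learner needs a label for every sample.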
Hi @andrewdalpino,
Thank you for taking the time to write such a detailed answer, much appreciated!
I have PHP experience, so I have chosen your library, as I think it's the most advanced and complete one in PHP. Looking at libraries in other languages would mean much more time for me to learn ML. So I will stick with you despite not having a time-based algorithm yet :-) . You have already done a great job!
So labeling is the way to go. Yes, I can process the history with a moving average and determine the trend based on whether the price is above or below it. I will move ahead and write this part.
First I want to start relatively simple, so with a classifier. Do you have any advice on which one to use?
@BasvanH No problem, welcome to our community!
Are you able to obtain more features for your dataset or do you just have the 3 that you mentioned?
How many samples do you have?
I would recommend starting with either Logistic Regression or Random Forest.
Logistic Regression is a simple linear classifier that has an associated tutorial here. The nice thing about Logistic Regression is that it can be partially trained (implements the Online interface) - thus, you can train it with new data as soon as it comes in. This will help the model to compensate for the fact that the data is non-stationary.
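To illustrate what partial (online) training means, here is a minimal from-scratch sketch in Python. It is not the RubixML implementation: the class name, the `partial_fit` method name, and the learning rate are assumptions chosen for this example. Each call updates the existing weights with stochastic gradient steps instead of retraining from zero.

```python
import math


class OnlineLogisticRegression:
    """Tiny binary logistic regression trained by stochastic gradient descent."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        """Return the estimated probability that x belongs to class 1."""
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def partial_fit(self, samples, labels):
        """Update the current weights with a new batch (labels are 0 or 1)."""
        for x, y in zip(samples, labels):
            error = self.predict_proba(x) - y  # gradient of log loss w.r.t. z
            self.weights = [
                w - self.learning_rate * error * xi
                for w, xi in zip(self.weights, x)
            ]
            self.bias -= self.learning_rate * error


model = OnlineLogisticRegression(n_features=1)
# Feed the same small batch repeatedly, as if new data kept arriving.
for _ in range(200):
    model.partial_fit([[-1.0], [1.0]], [0, 1])
```

Because the weights carry over between calls, batches that arrive later can shift the decision boundary as the price behaviour drifts.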
Random Forest is a non-linear ensemble method that you can try if you need a more flexible model.
Once you have enough labeled data, make sure to set about 20% of it aside to use as testing data and validate your model. The F Beta metric will give you a good idea as to how well it performs.
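The hold-out and scoring step can be sketched in plain Python as follows. Two assumptions are made here that the advice above does not state: the split is chronological (the most recent 20% becomes the test set, so the model is never tested on data older than its training data), and 'up' is treated as the positive class. The function names are hypothetical.

```python
def chronological_split(samples, labels, test_ratio=0.2):
    """Keep the oldest samples for training and the newest for testing."""
    cut = len(samples) - round(len(samples) * test_ratio)
    return samples[:cut], labels[:cut], samples[cut:], labels[cut:]


def f_beta(y_true, y_pred, positive="up", beta=1.0):
    """F-beta score for one positive class; beta > 1 weights recall more."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


samples = list(range(10))
labels = ["up", "down"] * 5
train_x, train_y, test_x, test_y = chronological_split(samples, labels)
score = f_beta(["up", "up", "down", "down"], ["up", "down", "up", "down"])
```

With `beta=1.0` this is the familiar F1 score, the harmonic mean of precision and recall.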
My dataset has a datapoint every full minute and contains the following features:
I calculate SMA on each datapoint based on 30 datapoints/minutes ahead.
I'm adding trend to my dataset:
I will use trend as my label, but I'm also interested in whether it would make sense to add the difference as a label to indicate how strong the up or down trend is.
I have my dataset ready, and I'm going to move forward and read up on the Logistic Regression classifier.
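One way to express "how much up or down" is the signed difference between the price and the SMA, which could serve as a regression target alongside the categorical trend. The sketch below is an assumption about how that might look, not something confirmed in this thread. It divides by the SMA so that the value is relative to the price level, and the function name is hypothetical.

```python
def trend_and_relative_difference(price, sma):
    """Return the trend label and the signed relative distance from the SMA."""
    difference = (price - sma) / sma  # e.g. 0.01 means 1% above the SMA
    trend = "up" if difference > 0 else "down"
    return trend, difference


trend, difference = trend_and_relative_difference(101.0, 100.0)
```

A classifier would use only `trend` as its label, while a regressor would use `difference`; the sign of the regression output then encodes the direction.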
Looks like you are well on your way @BasvanH
Keep us updated with your progress and don't hesitate to follow up with questions
Also, given the recent interest (https://github.com/RubixML/RubixML/issues/40, https://github.com/RubixML/RubixML/issues/35), we may start implementing time series features if they will better serve our users
Hey @BasvanH did you have any luck with this? I'm trying to also use the RandomForest algo but the link above is broken and I did not find any examples using this algo on any demo pages.
Here is a link to the current Random Forest documentation
https://docs.rubixml.com/latest/classifiers/random-forest.html
Hello,
I'm starting with ML and trying to predict whether a stock will trend up or down based on its history. There are two challenges which I cannot seem to solve at the moment.
I have my stock history, which is data containing the price, volume and number of trades at a certain point in time. I think I need to class this as unlabeled data, as I have not labeled which trend each datapoint is in. Am I correct in this? When training on the history data, I get a warning that labels are missing. So I'm kind of lost on how to handle/train on unlabeled data.
Secondly, a timeline is also in play. I do not know how to handle this in the library.
Any help is much appreciated.
Thanks, Bastiaan