Jays-code-collection / HMMs_Stock_Market

Contains all code related to using HMMs to predict stock market prices.
GNU General Public License v2.0
247 stars, 59 forks

Visual graph plot shows "Actual_Close" values into the future. #5

Closed: brizzbane closed this issue 3 years ago

brizzbane commented 3 years ago

Running the following:

python stock_analysis.py -n AAPL -s 2018-01-01 -e 2020-12-15 -o ./appl_test/ -p True -f 8 -m True

Results in: [plot image: AAPLresults_plot]

Graph shouldn't be showing "actual_close" for dates that haven't occurred yet, right?


I figured there had to be something crazy going on (like a bug or mistake) for the lines to match so closely. So I just ran the example myself to see what the graph would look like.

I've yet to dig into the code. I've re-read the README a few times to see if I missed something.


Something else: I have no idea what the following means (I'm a complete newb to this):

-m stands for metrics, and determines whether or not to predict for the historical data or just for the future days. If True, all of the historical data in the test set will be predicted and the Mean Squared Error will be returned.

I don't even know what to ask. Maybe my question is: what is there to predict in historical data?

Jays-code-collection commented 3 years ago

Regarding the plot, it could be a bug where the graph includes the 8 extra future values. When you open the file named "/companyname_HMM_Predictions_mse.xlsx", are the values at the bottom of the table in the future, or do they stop at the day (or the day before) you ran the code? The plot is based on those outputs, so looking into them will determine whether it's a bug. To me it looks like the plot has included 01/01/2021 on the x axis, but neither line has reached that point. Also, 8 days into the future from the 15th wouldn't reach the 1st.

Regarding the metrics part, sorry it isn't very clear. I'll explain it; let me know if you understand it and how you think it should be reworded.

The model predicts the closing price of a stock given the opening price. For historical data, the current opening price is fed in and the close price is predicted. For the future predictions (-f 8), predicted values are fed into the model one day at a time so that the close prices for the next 8 days can be created and stored. The reason we predict historical data is to get a metric for the quality of the model that cannot be obtained (in that moment) for future predictions, as there is no true data to compare the predictions to. If the MSE produced by a model trained on 2 years of data is high, something like 50.5, you'd know to ignore the future predictions generated by that model. In contrast, if the MSE is low, the future predictions are (slightly) more trustworthy.
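As a rough illustration of the metric (a minimal sketch, not the repo's own code), the Mean Squared Error over the historical test set is the average squared gap between each day's actual and predicted close. The function name `mean_squared_error` and the prices below are made up for the example; scikit-learn ships a function with the same name, but this version has no dependencies.

```python
from typing import Sequence


def mean_squared_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average of the squared differences between actual and predicted closes."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("need at least one data point")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Made-up closes for three historical days (not real AAPL data).
actual_close = [100.0, 102.0, 101.0]
predicted_close = [101.0, 100.0, 101.0]
print(mean_squared_error(actual_close, predicted_close))  # 1.6666666666666667
```

Because the errors are squared, the MSE is in squared dollars; its square root gives a typical error in dollars, so an MSE of 50.5 means the predictions are off by roughly $7.1 in the root-mean-square sense.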

brizzbane commented 3 years ago

For the command written in the first post:

python stock_analysis.py -n AAPL -s 2018-01-01 -e 2020-12-15 -o ./appl_test/ -p True -f 8 -m True

AAPL_HMM_Prediction_3.853992.xlsx shows values up until 2020-12-15. AAPL_HMM_Predictions_8_days_in_future.xlsx shows values into the future.

No file with _mse.xlsx ...

I guess I expected "Actual_Close" to be the real closing stock data. So the line should never go into the future, and it would be -f X days shorter than the prediction line.

So, with the above command, I expected that the graph for "Actual_Close" would not extend past "-e 2020-12-15".

Or is this not what the graph represents, or am I completely missing something? Do you follow what I am saying? :D


I don't know that you should re-word anything, because I don't understand it, ha. I have very little knowledge in this area. I certainly got more out of what you just said, but I am confused by:

For historical data the current opening price is fed in and the close price is predicted. For the future predictions (-f 8), predicted values are fed into the model one day at a time so that the close prices for the next 8 days can be created and stored.

Does "current opening price" mean actual, real data from past days, or no?

...I feel like I should just try to understand what the code is doing. Maybe disregard this last part, ha... perhaps I'll be able to ask a better question (or answer it myself) as I dig into what is going on.

This, however, does make me understand it a little better:

The reason we predict historical data is to get a metric for the quality of the model that cannot be obtained (in that moment) for future predictions

The main reason I feel there is "something wrong" is that the predicted line (as rendered in the graph) is so close to the actual values. Then seeing that the actual values ("Actual_Close") extend into the future really makes me think the graph is not accurate.

Jays-code-collection commented 3 years ago

Sorry, I should have been clearer: 3.853992 is the MSE score. That's what I meant when I said "/companyname_HMM_Predictions_mse.xlsx". So you're right, actual close is in fact the real closing stock data. It isn't going into the future; the plot is generated from the same data that you find in the "AAPL_HMM_Prediction_3.853992.xlsx" file, i.e. data up until 2020-12-15. So the plot is behaving as you expect: actual close represents the real close prices, and the final point on the plot is 2020-12-15.

Regarding the current opening price for future predictions, that is set by the previous day's predicted close price. For these future prices, the close price is predicted for the most recent day (in this case the 15th), then that close price is fed forward as the 16th's opening price, and the close price for the 16th is predicted (as for any other day), and so on until "-f" days are reached. Unfortunately, this isn't a perfect representation of real life, as day n's close price is not always equal to day n + 1's opening price due to out-of-hours price changes that result from futures trading. There's potential for me to include these overnight changes at some point, but it isn't currently part of the code.
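The feed-forward loop described above can be sketched as follows. This is a minimal sketch under stated assumptions: `predict_close(open_price)` is a hypothetical stand-in for the trained HMM (the real model and its inputs are not shown in this thread), and the helper name `forecast_closes` and the toy model are illustrative only.

```python
from typing import Callable, List


def forecast_closes(
    start_open: float, predict_close: Callable[[float], float], days: int
) -> List[float]:
    """Predict `days` closes, feeding each predicted close forward as the next open.

    Treating day n's close as day n + 1's open ignores overnight price gaps.
    """
    closes: List[float] = []
    open_price = start_open
    for _ in range(days):
        close_price = predict_close(open_price)
        closes.append(close_price)
        open_price = close_price  # the prediction becomes the next day's open
    return closes


def toy_model(open_price: float) -> float:
    """Hypothetical stand-in for the trained HMM: close is always $0.50 above open."""
    return open_price + 0.5


print(forecast_closes(100.0, toy_model, days=3))  # [100.5, 101.0, 101.5]
```

The `open_price = close_price` line is exactly where the overnight gap is assumed away; modelling out-of-hours moves would mean adjusting that value before the next prediction.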

To summarise, the plot is generated from the same data you see in AAPL_HMM_Prediction_3.853992.xlsx, and you are correct in thinking that actual close = real close stock data. The plot does not include future predictions; I guess it is just hard to tell visually where exactly it ends.

brizzbane commented 3 years ago

In the graph I posted, the predicted line (red) actually looks shorter than the actual close line (blue). Shouldn't the predicted line (red) extend further than the Actual_Close line (blue)?

Are you saying it does, but the graph resolution is so low that it can't be discerned?

...I realized that's probably what you meant by _mse shortly before seeing your response. Sorry!

Jays-code-collection commented 3 years ago

To me it looks like they end at the same point. I would advise checking the "AAPL_HMM_Prediction_3.853992.xlsx" file to see whether the final date has a populated entry in the predicted close column. If it does, then that is what you are seeing in the plot, as the plot is made from the same dataframe that becomes that Excel document.
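A quick way to do that check with pandas is sketched below. It assumes the date is a column named `Date` and the prediction column is called `Predicted_Close`; both names are guesses (only `Actual_Close` appears in this thread), so adjust them to the real headers in the spreadsheet. The prices are made up.

```python
import pandas as pd


def final_row_has_prediction(df: pd.DataFrame, pred_col: str) -> bool:
    """Return True if the last row of `df` has a non-missing value in `pred_col`."""
    return bool(df[pred_col].notna().iloc[-1])


# In practice you might load the spreadsheet with something like
#   df = pd.read_excel("AAPL_HMM_Prediction_3.853992.xlsx")
# (reading .xlsx needs the openpyxl package). The "Date" and
# "Predicted_Close" column names below are assumptions.
df = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2020-12-11", "2020-12-14", "2020-12-15"]),
        "Actual_Close": [100.0, 101.0, 102.0],
        "Predicted_Close": [100.5, 100.5, 101.5],
    }
)
print(df["Date"].iloc[-1].date(), final_row_has_prediction(df, "Predicted_Close"))
# 2020-12-15 True
```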

You're right in thinking that the predicted close would extend beyond the actual close if I were to include the -f predictions. They aren't included in the plot or in the "AAPL_HMM_Prediction_3.853992.xlsx" spreadsheet, as there are no real prices (at the time of generation) to compare them to. I could add them to the plot; do you think that would be useful? My thinking was that the plot isn't detailed enough to provide exact figures, so it'll be better for a user to look at the future predictions spreadsheet than to try to work out the values from the graph.

brizzbane commented 3 years ago

Speaking just as a user who checks trending repos every day: graphics are always exciting to see in a project because they show what the app does.

When I tested it out myself, I was expecting the prediction line to extend into the future; then I would anxiously wait for time to pass and see whether the real prices came close to the graph.

(Side note: I then realized I could just test this in the past, though. For example, have the program calculate -s 2018-01-01 -e 2020-12-05, extend by 8 days, and see how it performed vs. the real values of 12-05 through 12-13.)
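That past-data backtest can be sketched like this: train with `-e` set to a past date, then pull the next few real trading days from the price history and compare them with the `-f` predictions. The helper `holdout_after`, the dates, and all prices below are hypothetical, and the dates are assumed to be sorted in ascending order.

```python
from datetime import date
from typing import List, Sequence, Tuple


def holdout_after(
    dates: Sequence[date], closes: Sequence[float], end: date, horizon: int
) -> Tuple[List[date], List[float]]:
    """Return the first `horizon` real trading days (and their closes) after `end`.

    A model trained with `-e end` never saw these days, so they can be compared
    with its future predictions. Assumes `dates` is sorted in ascending order.
    """
    pairs = [(d, c) for d, c in zip(dates, closes) if d > end][:horizon]
    return [d for d, _ in pairs], [c for _, c in pairs]


# Made-up price history; weekends are absent because markets are closed.
dates = [date(2020, 12, 3), date(2020, 12, 4), date(2020, 12, 7),
         date(2020, 12, 8), date(2020, 12, 9)]
closes = [100.0, 101.0, 99.0, 100.5, 102.0]

# Hypothetical output of a run with -e 2020-12-04 -f 2.
forecast = [100.0, 100.0]

held_dates, actual = holdout_after(dates, closes, end=date(2020, 12, 4), horizon=2)
errors = [p - a for p, a in zip(forecast, actual)]
print(held_dates)  # [datetime.date(2020, 12, 7), datetime.date(2020, 12, 8)]
print(errors)      # [1.0, -0.5]
```

Taking the comparison days from the real price history, rather than counting calendar days, keeps weekends and market holidays out of the comparison.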

I get what you're saying, though, that the graph isn't good for reading off actual values.

I don't know; I just figured that since this is something that can predict future prices, the generated graph would display a line extending into the future.

Jays-code-collection commented 3 years ago

Yeah, I completely understand what you're saying. To be honest, if you check trending repos every day you probably have a better idea of what's ideal for a README than I do, so I think I'll take you up on the idea of including the extra future predictions in the plot. I had no idea this was going to be seen by as many people as it has when I first posted it haha, so I appreciate the feedback!
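One possible way to get the future predictions into the plot (a sketch, not how the repo actually does it) is to append them to the historical frame as extra rows with no `Actual_Close` value. Plotting the combined frame, e.g. with `combined.plot()` (which needs matplotlib), would then draw the actual line up to the last real date and the predicted line `-f` days beyond it. The `Predicted_Close` column name and the prices are assumptions for illustration.

```python
import pandas as pd


def with_future(history: pd.DataFrame, future: pd.DataFrame) -> pd.DataFrame:
    """Stack future predictions under the historical rows, ordered by date.

    Future rows have no Actual_Close, so pandas fills it with NaN; a line plot
    of the result stops the actual line at the last real date while the
    predicted line keeps going.
    """
    return pd.concat([history, future]).sort_index()


history = pd.DataFrame(
    {"Actual_Close": [100.0, 101.0], "Predicted_Close": [100.5, 100.8]},
    index=pd.to_datetime(["2020-12-14", "2020-12-15"]),
)
# Hypothetical -f 2 output, dated after the -e end date.
future = pd.DataFrame(
    {"Predicted_Close": [101.2, 101.6]},
    index=pd.to_datetime(["2020-12-16", "2020-12-17"]),
)

combined = with_future(history, future)
print(combined)
# combined.plot() would draw both lines (requires matplotlib).
```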

brizzbane commented 3 years ago

Congrats on your repo's success! ;)

So, I think, in summary, I was completely wrong that the graph was showing values into the future. My original thinking was that the prediction line would extend into the future (past the "Actual_Close" line), so when I saw both lines going up to the same date (and, given the graph's resolution, it wasn't really clear where they stopped), I wrongly assumed that the graph was drawing lines into the future.

If I am wrong about any of that, please let me know! Feel free to close the issue.