Jays-code-collection / HMMs_Stock_Market

Contains all code related to using HMMs to predict stock market prices.
GNU General Public License v2.0

ValueError: Input contains NaN issue from total_data = np.row_stack((previous_data_features, possible_outcome)) #10

Open Originn opened 1 year ago

Originn commented 1 year ago

Hi, many thanks for this.

When I run the code, it always fails at step 2/5 of the future prediction:

Predicting future Close prices from 2023-01-18 00:00:00 to 2023-01-22 00:00:00
 40%|██████████████████████████████████                                                   | 2/5 [00:03<00:05,  1.97s/it]

It seems that total_data contains NaN values and the score method doesn't accept them. Any idea how to fix this?

I have switched to downloading the data with yfinance, and none of the columns contain NaNs, so that is not the root of the issue. Is there something I am missing that may cause the ValueError?

Training data period is from 2018-01-02 00:00:00 to 2021-05-18 00:00:00
2023-01-18 10:04:46,223 __main__     INFO     >>> Extracting Features
2023-01-18 10:04:46,223 __main__     INFO     Features extraction Completed <<<
Predicting Close prices from 2021-05-19 00:00:00 to 2023-01-17 00:00:00
100%|█████████████████████████████████████████████████████████████████████████████████| 419/419 [09:17<00:00,  1.33s/it]
All predictions saved. The Mean Squared Error for the 419 days considered is: 6.657134688795405
Predicting future Close prices from 2023-01-18 00:00:00 to 2023-01-22 00:00:00
 40%|██████████████████████████████████                                                   | 2/5 [00:03<00:05,  1.97s/it]
Traceback (most recent call last):
  File "/home/originn/HMMs_Stock_Market/src/stock_analysis.py", line 470, in <module>
    main()
  File "/home/originn/HMMs_Stock_Market/src/stock_analysis.py", line 465, in main
    use_stock_predictor(company_name, start, end, future, metrics, plot, out_dir)
  File "/home/originn/HMMs_Stock_Market/src/stock_analysis.py", line 354, in use_stock_predictor
    future_pred_close = stock_predictor.predict_close_prices_for_future()
  File "/home/originn/HMMs_Stock_Market/src/stock_analysis.py", line 248, in predict_close_prices_for_future
    predicted_close_prices.append(self.predict_close_price_fut_days(day_index))
  File "/home/originn/HMMs_Stock_Market/src/stock_analysis.py", line 223, in predict_close_price_fut_days
    ) = self._get_most_probable_outcome(day_index)
  File "/home/originn/HMMs_Stock_Market/src/stock_analysis.py", line 149, in _get_most_probable_outcome
    outcome_score.append(self.hmm.score(total_data))
  File "/home/originn/HMMs_Stock_Market/venv/lib/python3.10/site-packages/hmmlearn/base.py", line 259, in score
    return self._score(X, lengths, compute_posteriors=False)[0]
  File "/home/originn/HMMs_Stock_Market/venv/lib/python3.10/site-packages/hmmlearn/base.py", line 272, in _score
    X = check_array(X)
  File "/home/originn/HMMs_Stock_Market/venv/lib/python3.10/site-packages/sklearn/utils/validation.py", line 919, in check_array
    _assert_all_finite(
  File "/home/originn/HMMs_Stock_Market/venv/lib/python3.10/site-packages/sklearn/utils/validation.py", line 161, in _assert_all_finite
    raise ValueError(msg_err)
ValueError: Input contains NaN.

Shouldn't self.days be incremented in def predict_close_price_fut_days(self, day_index)?
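One way to narrow this down is to find which cell of total_data is NaN just before it reaches hmm.score. The snippet below is a minimal sketch with made-up numbers, not the repository's code: report_nans and the feature names are hypothetical, and np.vstack stands in for np.row_stack (which is an alias of it).

```python
import numpy as np


def report_nans(total_data, feature_names):
    """Return (row, feature_name) pairs for every NaN cell in a 2-D array."""
    total_data = np.asarray(total_data, dtype=float)
    return [
        (int(row), feature_names[col])
        for row, col in np.argwhere(np.isnan(total_data))
    ]


# Made-up feature rows; the second row carries a NaN on purpose.
previous_data_features = np.array([[0.01, 0.02, 0.01],
                                   [np.nan, 0.03, 0.02]])
possible_outcome = np.array([0.0, 0.01, 0.01])

# Same stacking step as in the issue title, using np.vstack.
total_data = np.vstack((previous_data_features, possible_outcome))

# Hypothetical feature names, used only to label the columns.
nan_cells = report_nans(total_data, ["frac_change", "frac_high", "frac_low"])
print(nan_cells)
```

If the reported row belongs to the history rather than to the candidate outcome, the NaN probably comes from the price rows the features were computed from rather than from the downloaded data itself.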

zx2214008 commented 1 year ago

I had the same issue and fixed it. Change lines 194 to 196 as follows:

second_df = pd.DataFrame(index=future_dates, columns=["Open", "High", "Low", "Close"])

With the wrong column order, iloc cannot locate the right cell when replacing the opening price for the first future day.
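The effect can be reproduced on a toy frame. This is only a sketch under assumptions, not the code from stock_analysis.py: the dates, the price, the helper names and the "wrong" column order are all made up, and it assumes the opening price is written by column position.

```python
import numpy as np
import pandas as pd

future_dates = pd.date_range("2023-01-18", periods=3)  # made-up dates
last_close = 101.5  # made-up close price of the last known day


def seed_open_by_position(columns, open_position=0):
    """Write last_close into the first row at a fixed column position."""
    frame = pd.DataFrame(np.nan, index=future_dates, columns=columns)
    frame.iloc[0, open_position] = last_close
    return frame


def seed_open_by_label(columns):
    """Write last_close into the first row of the column named "Open"."""
    frame = pd.DataFrame(np.nan, index=future_dates, columns=columns)
    frame.iloc[0, frame.columns.get_loc("Open")] = last_close
    return frame


# Order from the fix above: position 0 really is "Open".
good = seed_open_by_position(["Open", "High", "Low", "Close"])

# A different (hypothetical) order: position 0 is "High", so "Open" stays NaN.
bad = seed_open_by_position(["High", "Low", "Open", "Close"])

# Looking the position up by name works with either order.
safe = seed_open_by_label(["High", "Low", "Open", "Close"])

print(good.iloc[0]["Open"], bad.iloc[0]["Open"], safe.iloc[0]["Open"])
```

In the second frame the value ends up in "High" and "Open" stays NaN, which would then propagate into any features computed from that row. Resolving the position with columns.get_loc("Open") avoids depending on the column order at all.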