davidusb-geek / emhass

emhass: Energy Management for Home Assistant is a Python module designed to optimize your home energy while interfacing with Home Assistant.
MIT License

"perform_backtest": "false" has no effect? #154

Closed g1za closed 6 months ago

g1za commented 8 months ago

Am I wrong, or does setting "perform_backtest" to false have no effect when running forecast-model-fit (and maybe model-tune as well)? Below is the log output when running the command with true and with false:

2024-01-24 12:18:45,634 - web_server - INFO - Setting up needed data
2024-01-24 12:18:45,652 - web_server - INFO - Retrieve hass get data method initiated...
2024-01-24 12:19:19,329 - web_server - INFO - >> Performing a machine learning forecast model fit...
2024-01-24 12:19:19,331 - web_server - INFO - Performing a forecast model fit for KNN
2024-01-24 12:19:19,338 - web_server - INFO - Training a KNeighborsRegressor model
2024-01-24 12:19:19,369 - web_server - INFO - Elapsed time for model fit: 0.030860185623168945
2024-01-24 12:19:19,403 - web_server - INFO - Prediction R2 score of fitted model on test data: 0.36641199776506206
2024-01-24 12:19:19,405 - web_server - INFO - Performing simple backtesting of fitted model
0%| | 0/26 [00:00<?, ?it/s]
23%|██▎ | 6/26 [00:00<00:00, 59.45it/s]
46%|████▌ | 12/26 [00:00<00:00, 59.73it/s]
73%|███████▎ | 19/26 [00:00<00:00, 59.89it/s]
100%|██████████| 26/26 [00:00<00:00, 63.50it/s]
100%|██████████| 26/26 [00:00<00:00, 62.10it/s]
2024-01-24 12:19:19,828 - web_server - INFO - Elapsed backtesting time: 0.4231910705566406
2024-01-24 12:19:19,829 - web_server - INFO - Backtest R2 score: 0.5533193059764132

2024-01-24 12:20:40,409 - web_server - INFO - Setting up needed data
2024-01-24 12:20:40,411 - web_server - INFO - Retrieve hass get data method initiated...
2024-01-24 12:21:14,463 - web_server - INFO - >> Performing a machine learning forecast model fit...
2024-01-24 12:21:14,464 - web_server - INFO - Performing a forecast model fit for KNN
2024-01-24 12:21:14,472 - web_server - INFO - Training a KNeighborsRegressor model
2024-01-24 12:21:14,492 - web_server - INFO - Elapsed time for model fit: 0.020431995391845703
2024-01-24 12:21:14,546 - web_server - INFO - Prediction R2 score of fitted model on test data: 0.3664094027519951
2024-01-24 12:21:14,547 - web_server - INFO - Performing simple backtesting of fitted model
0%| | 0/26 [00:00<?, ?it/s]
15%|█▌ | 4/26 [00:00<00:00, 37.01it/s]
31%|███ | 8/26 [00:00<00:00, 37.80it/s]
46%|████▌ | 12/26 [00:00<00:00, 37.46it/s]
62%|██████▏ | 16/26 [00:00<00:00, 37.52it/s]
77%|███████▋ | 20/26 [00:00<00:00, 37.85it/s]
92%|█████████▏| 24/26 [00:00<00:00, 38.10it/s]
100%|██████████| 26/26 [00:00<00:00, 39.11it/s]
2024-01-24 12:21:15,216 - web_server - INFO - Elapsed backtesting time: 0.6685791015625
2024-01-24 12:21:15,216 - web_server - INFO - Backtest R2 score: 0.5533193059764132

g1za commented 8 months ago

I tried both "false" and "False", but it doesn't make any difference.

davidusb-geek commented 8 months ago

This is strange. Thanks for reporting it, I will look into it.

davidusb-geek commented 8 months ago

Hi, could you please share the commands that you used to obtain these logs?

g1za commented 8 months ago

Hello, I just copied them from the log of the add-on, following command execution. I have the log set to DEBUG mode.

davidusb-geek commented 8 months ago

Yes, OK, but how are you setting the perform_backtest=True or perform_backtest=False options?

g1za commented 8 months ago

Ah, apologies... I interpreted your request too strictly :P

I didn't, as I really do not understand how to copy text from the HA terminal, but I can share a couple of screenshots of the commands.

Screenshot 2024-02-03 alle 12 24 17

Screenshot 2024-02-03 alle 12 24 09

Screenshot 2024-02-03 alle 12 24 25

g1za commented 8 months ago

Thanks OCR :)

curl -i -H "Content-Type:application/json" -X POST -d '{"days_to_retrieve": 28, "model_type": "KNN", "var_model": "sensor.consumption_filtered_w", "sklearn_model": "KNeighborsRegressor", "num_lags": 48, "split_date_delta": "48h", "perform_backtest": "true"}' http://localhost:5000/action/forecast-model-fit

curl -i -H "Content-Type:application/json" -X POST -d '{"days_to_retrieve": 28, "model_type": "KNN", "var_model": "sensor.consumption_filtered_w", "sklearn_model": "KNeighborsRegressor", "num_lags": 48, "split_date_delta": "48h", "perform_backtest": "false"}' http://localhost:5000/action/forecast-model-fit

curl -i -H "Content-Type:application/json" -X POST -d '{"days_to_retrieve": 28, "model_type": "KNN", "var_model": "sensor.consumption_filtered_w", "sklearn_model": "KNeighborsRegressor", "num_lags": 48, "split_date_delta": "48h", "perform_backtest": "False"}' http://localhost:5000/action/forecast-model-fit
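For comparison, the same request can be assembled in Python, which makes the type of `perform_backtest` explicit. This is a minimal sketch using only the standard library: `build_fit_request` is a hypothetical helper, the URL and keys are copied from the commands above, and whether the server also accepts a native JSON boolean (`false` without quotes) is not confirmed anywhere in this thread.

```python
import json
import urllib.request

# Endpoint copied from the curl commands above.
URL = "http://localhost:5000/action/forecast-model-fit"


def build_fit_request(perform_backtest):
    """Build (but do not send) the POST request for forecast-model-fit.

    Hypothetical helper for illustration; the payload keys mirror the
    curl commands in this thread.
    """
    payload = {
        "days_to_retrieve": 28,
        "model_type": "KNN",
        "var_model": "sensor.consumption_filtered_w",
        "sklearn_model": "KNeighborsRegressor",
        "num_lags": 48,
        "split_date_delta": "48h",
        "perform_backtest": perform_backtest,
    }
    return urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Same value as the third curl command: the string "False", not a boolean.
req = build_fit_request("False")
# urllib.request.urlopen(req) would send it; omitted so the sketch
# runs without a server listening on localhost:5000.
```

Inspecting `req.data` before sending shows exactly what the server receives, which helps rule out quoting problems on the client side.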

g1za commented 8 months ago

I'm sorry, but the behaviour is unchanged (I'm running add-on v.0.7.4). I still see the same behaviour reported by David: https://github.com/davidusb-geek/emhass/pull/174#issuecomment-1925370505

2024-02-04 20:44:28,289 - web_server - INFO - Setting up needed data
2024-02-04 20:44:28,290 - web_server - INFO - Retrieve hass get data method initiated...
2024-02-04 20:45:05,242 - web_server - INFO - >> Performing a machine learning forecast model fit...
2024-02-04 20:45:05,243 - web_server - INFO - Performing a forecast model fit for KNN
2024-02-04 20:45:05,251 - web_server - INFO - Training a KNeighborsRegressor model
2024-02-04 20:45:05,282 - web_server - INFO - Elapsed time for model fit: 0.030888795852661133
2024-02-04 20:45:05,317 - web_server - INFO - Prediction R2 score of fitted model on test data: 0.415113443633924
2024-02-04 20:45:05,318 - web_server - INFO - Performing simple backtesting of fitted model
0%| | 0/26 [00:00<?, ?it/s]
23%|██▎ | 6/26 [00:00<00:00, 58.84it/s]
46%|████▌ | 12/26 [00:00<00:00, 59.23it/s]
69%|██████▉ | 18/26 [00:00<00:00, 59.28it/s]
92%|█████████▏| 24/26 [00:00<00:00, 57.44it/s]
100%|██████████| 26/26 [00:00<00:00, 60.06it/s]
2024-02-04 20:45:05,759 - web_server - INFO - Elapsed backtesting time: 0.440044641494751
2024-02-04 20:45:05,759 - web_server - INFO - Backtest R2 score: 0.5544675556498404

2024-02-04 20:45:46,420 - web_server - INFO - Setting up needed data
2024-02-04 20:45:46,422 - web_server - INFO - Retrieve hass get data method initiated...
2024-02-04 20:46:20,988 - web_server - INFO - >> Performing a machine learning forecast model fit...
2024-02-04 20:46:20,990 - web_server - INFO - Performing a forecast model fit for KNN
2024-02-04 20:46:20,997 - web_server - INFO - Training a KNeighborsRegressor model
2024-02-04 20:46:21,017 - web_server - INFO - Elapsed time for model fit: 0.019315719604492188
2024-02-04 20:46:21,069 - web_server - INFO - Prediction R2 score of fitted model on test data: 0.41492724816175786
2024-02-04 20:46:21,071 - web_server - INFO - Performing simple backtesting of fitted model
0%| | 0/26 [00:00<?, ?it/s]
15%|█▌ | 4/26 [00:00<00:00, 33.10it/s]
31%|███ | 8/26 [00:00<00:00, 35.61it/s]
46%|████▌ | 12/26 [00:00<00:00, 35.73it/s]
62%|██████▏ | 16/26 [00:00<00:00, 36.22it/s]
77%|███████▋ | 20/26 [00:00<00:00, 36.64it/s]
92%|█████████▏| 24/26 [00:00<00:00, 37.10it/s]
100%|██████████| 26/26 [00:00<00:00, 37.82it/s]
2024-02-04 20:46:21,762 - web_server - INFO - Elapsed backtesting time: 0.6911623477935791
2024-02-04 20:46:21,762 - web_server - INFO - Backtest R2 score: 0.5544675556498404

Screenshot 2024-02-04 alle 20 50 45

g1za commented 8 months ago

Ah, never mind. I checked the latest commits and I see you reverted exactly these bool options, so I think this is not unexpected.

davidusb-geek commented 8 months ago

I'll take a look, but as far as I know I didn't revert the bool treatment.

davidusb-geek commented 8 months ago

The bool treatment is right there: https://github.com/davidusb-geek/emhass/blob/809336e48c2dcc7467af38439ef7c5b0db3f25e4/src/emhass/utils.py#L301
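As background on why string-valued flags need explicit treatment at all: in Python any non-empty string is truthy, so `bool("false")` evaluates to `True`. The sketch below shows one way such a value could be normalized. `parse_bool` is a hypothetical helper written for illustration; it is not the code at the linked line in `utils.py`.

```python
def parse_bool(value):
    """Interpret common spellings of a boolean flag.

    Hypothetical helper for illustration only; not EMHASS code.
    """
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        text = value.strip().lower()
        if text in ("true", "1", "yes"):
            return True
        if text in ("false", "0", "no"):
            return False
    raise ValueError(f"cannot interpret {value!r} as a boolean")


# Naive truthiness treats every non-empty string as True, so a flag
# passed as "false" or "False" would still count as enabled.
naive = bool("false")

# Explicit parsing maps both spellings to False.
parsed = parse_bool("False")
```

Raising on unrecognized input, rather than silently defaulting, makes a mistyped flag visible in the logs instead of quietly enabling or disabling a step.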

g1za commented 8 months ago

OK, so I tried rebooting the system, but the command

curl -i -H "Content-Type:application/json" -X POST -d '{"days_to_retrieve": 28, "model_type": "KNN", "var_model": "sensor.consumption_filtered_w", "sklearn_model": "KNeighborsRegressor", "num_lags": 48, "split_date_delta": "48h", "perform_backtest": "False"}' http://localhost:5000/action/forecast-model-fit

Screenshot 2024-02-04 alle 21 20 39

still results in

2024-02-04 21:17:19,339 - web_server - INFO - Setting up needed data
2024-02-04 21:17:19,341 - web_server - INFO - Retrieve hass get data method initiated...
2024-02-04 21:17:59,834 - web_server - INFO - >> Performing a machine learning forecast model fit...
2024-02-04 21:17:59,835 - web_server - INFO - Performing a forecast model fit for KNN
2024-02-04 21:17:59,843 - web_server - INFO - Training a KNeighborsRegressor model
2024-02-04 21:17:59,878 - web_server - INFO - Elapsed time for model fit: 0.03497457504272461
2024-02-04 21:17:59,912 - web_server - INFO - Prediction R2 score of fitted model on test data: 0.40503499845626323
2024-02-04 21:17:59,914 - web_server - INFO - Performing simple backtesting of fitted model
0%| | 0/26 [00:00<?, ?it/s]
23%|██▎ | 6/26 [00:00<00:00, 59.31it/s]
46%|████▌ | 12/26 [00:00<00:00, 56.69it/s]
73%|███████▎ | 19/26 [00:00<00:00, 58.39it/s]
96%|█████████▌| 25/26 [00:00<00:00, 57.98it/s]
100%|██████████| 26/26 [00:00<00:00, 59.94it/s]
2024-02-04 21:18:00,353 - web_server - INFO - Elapsed backtesting time: 0.4389004707336426
2024-02-04 21:18:00,353 - web_server - INFO - Backtest R2 score: 0.5591832851357621

(for me, at least)

davidusb-geek commented 8 months ago

Yes, you are right. I've just tested and it is wrong again, but I've found why: I did revert some of the fixes inside the MLForecaster code. Fixing it now and testing...

g1za commented 8 months ago

Fixed with v.0.6.4. Thanks!

g1za commented 7 months ago

I previously tested with model-fit, but is it maybe still not fixed with model-tune?

This command

curl -i -H "Content-Type:application/json" -X POST -d '{"days_to_retrieve": 28, "model_type": "KNN", "var_model": "sensor.consumption_filtered_w", "sklearn_model": "KNeighborsRegressor", "num_lags": 48, "split_date_delta": "48h", "perform_backtest": "False"}' http://localhost:5000/action/forecast-model-tune

results in

2024-02-19 01:14:16,439 - web_server - INFO - Setting up needed data
2024-02-19 01:14:16,441 - web_server - INFO - Retrieve hass get data method initiated...
2024-02-19 01:14:56,605 - web_server - INFO -  >> Performing a machine learning forecast model tune...
2024-02-19 01:14:56,606 - web_server - INFO - Bayesian hyperparameter optimization with backtesting
`Forecaster` refitted using the best-found lags and parameters, and the whole data set: 
  Lags: [ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72] 
  Parameters: {'n_neighbors': 10, 'leaf_size': 21, 'weights': 'distance'}
  Backtesting metric: 0.112468066613576

Number of models compared: 70,
         10 bayesian search in each lag configuration.

lags grid:   0%|          | 0/7 [00:00<?, ?it/s]
lags grid:  14%|█▍        | 1/7 [00:01<00:08,  1.46s/it]
lags grid:  29%|██▊       | 2/7 [00:02<00:05,  1.02s/it]
lags grid:  43%|████▎     | 3/7 [00:02<00:03,  1.16it/s]
lags grid:  57%|█████▋    | 4/7 [00:03<00:02,  1.20it/s]
lags grid:  71%|███████▏  | 5/7 [00:04<00:01,  1.19it/s]
lags grid:  86%|████████▌ | 6/7 [00:05<00:00,  1.19it/s]
lags grid: 100%|██████████| 7/7 [00:06<00:00,  1.19it/s]
lags grid: 100%|██████████| 7/7 [00:06<00:00,  1.14it/s]
2024-02-19 01:15:02,839 - web_server - INFO - Elapsed time: 6.232482671737671
2024-02-19 01:15:02,894 - web_server - INFO - R2 score for optimized prediction in train period: -0.11664029412833621
2024-02-19 01:15:02,896 - web_server - INFO - R2 score for optimized prediction in test period: 0.4610334716090674
2024-02-19 01:15:02,896 - web_server - INFO - Number of optimal lags obtained: 72

Based on the log above, what do you think? Thanks!

davidusb-geek commented 6 months ago

Hi. Sorry for the very late answer. This is solved: the perform_backtest control parameter only makes sense for model-fit. model-tune always performs a backtest because it is needed for hyperparameter optimization. Closing...
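The distinction can be pictured with a small sketch. This is illustrative only: `model_fit`, `model_tune`, and the step names they return are hypothetical and do not reproduce the EMHASS implementation.

```python
def model_fit(perform_backtest):
    """Fit once; backtesting is an optional extra evaluation step."""
    steps = ["fit"]
    if perform_backtest:
        steps.append("backtest")
    return steps


def model_tune(perform_backtest=None):
    """Tune hyperparameters; the flag is accepted but ignored.

    Each candidate configuration is scored by backtesting, so the
    backtest cannot be switched off without disabling the search.
    """
    return ["hyperparameter_search_with_backtest", "refit"]


fit_without = model_fit(perform_backtest=False)
tune_without = model_tune(perform_backtest=False)
```

Under this picture, passing `"perform_backtest": "False"` to model-tune and still seeing "Bayesian hyperparameter optimization with backtesting" in the log is the expected outcome.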