Closed gieljnssns closed 2 months ago
Attention: Patch coverage is 71.63995% with 230 lines in your changes missing coverage. Please review.
Project coverage is 80.52%. Comparing base (e192d97) to head (e154380). Report is 50 commits behind head on master.
:exclamation: Current head e154380 differs from pull request most recent head f8b43aa. Consider uploading reports for the commit f8b43aa to get more accurate results.
Files | Patch % | Lines |
---|---|---|
src/emhass/command_line.py | 61.69% | 95 Missing :warning: |
src/emhass/machine_learning_regressor.py | 21.10% | 86 Missing :warning: |
src/emhass/utils.py | 90.47% | 36 Missing :warning: |
src/emhass/retrieve_hass.py | 82.89% | 13 Missing :warning: |
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Hi, could you please explain your use case here a little bit more? What are you using these predictions for? Also, these are not time series predictions like we do in the ML forecaster, these are just regressions. Wouldn't it be better to include this as another method in the ML forecast class? What about passing the data directly from sensors in Home Assistant?
This needs some unit tests, otherwise we don't know if it's breaking something.
I have an automation in HA that stores some daily data in a csv file
alias: "Heating csv"
id: 157b1d57-73d9-4f39-82c6-13ce0cf4288a
trigger:
  - platform: time
    at: "23:59:32"
action:
  - service: notify.prediction
    data:
      message: >
        {% set dd = states('sensor.degree_day_daily') |float %}
        {% set inside = states('sensor.gemiddelde_dagtemperatuur_binnen') |float %}
        {% set outside = states('sensor.gemiddelde_dagtemperatuur_buiten') |float %}
        {% set hour = states('sensor.branduren_warmtepomp_vandaag') |float | round(2) %}
        {% set kwhdd = states('sensor.kwh_per_degree_day_daily') |float %}
        {% set hourdd = states('sensor.uur_per_degree_day_daily') |float | round(2) %}
        {% set solar_total = states('sensor.opbrengst_kwh') |float %}
        {% set solar_total_yesterday = states('sensor.solar_csv_2') |float %}
        {% set solar = (states('sensor.opbrengst_kwh') |float - solar_total_yesterday) | round(3) %}
        {% set verwarming_total = states('sensor.warmtepomp_kwh') |float %}
        {% set verwarming_total_yesterday = states('sensor.verwarming_csv') |float %}
        {% set verwarming = (states('sensor.warmtepomp_kwh') |float - verwarming_total_yesterday) | round(3) %}
        {% set verbruik_total = states('sensor.verbruik_kwh') |float %}
        {% set verbruik_total_yesterday = states('sensor.verbruik_csv') |float %}
        {% set verbruik = (states('sensor.verbruik_kwh') |float - verbruik_total_yesterday) | round(3) %}
        {% set verbruik_zonder_verwarming = (verbruik - verwarming) | round(3) %}
        {% set time = now() %}
        {{time}},{{dd}},{{solar}},{{verbruik_zonder_verwarming}},{{hourdd}},{{inside}},{{outside}},{{hour}},{{kwhdd}},{{solar_total}},{{verwarming_total}},{{verwarming}},{{verbruik_total}},{{verbruik}}
where hour is the number of hours my heating has been on, solar is the amount of solar energy produced, and dd are the degree days of that day.
I'm trying to get as much data as I can.
I know the solar for the next day (Solcast) and I can calculate the degree days for the next day (based on temperature predictions).
Then I want to predict the number of hours my heating should be on the next day, so I can set the def_total_hours for my heating.
I hope that when I have data over one year the predictions of hour will be good enough to eliminate my thermostat.
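The two-step idea described above can be sketched with scikit-learn (already a dependency of emhass). The column names mirror the CSV described above (dd, solar, hour), but the data and coefficients here are synthetic, made up purely for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the daily CSV rows; in practice this would be
# pd.read_csv("prediction.csv")
rng = np.random.default_rng(0)
n = 120  # roughly four months of daily rows
df = pd.DataFrame({
    "dd": rng.uniform(0, 20, n),      # degree days
    "solar": rng.uniform(0, 30, n),   # daily solar production in kWh
})
# Invented relationship: more degree days -> more heating hours, solar helps a bit
df["hour"] = 0.4 * df["dd"] - 0.05 * df["solar"] + rng.normal(0, 0.5, n)

model = LinearRegression()
model.fit(df[["dd", "solar"]], df["hour"])

# Predict tomorrow's heating hours from forecasted degree days and solar
tomorrow = pd.DataFrame({"dd": [12.8], "solar": [4.8]})
predicted_hours = float(model.predict(tomorrow)[0])
```

The resulting predicted_hours would then be what gets written into def_total_hours for the optimization.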
Ok, I understand better now. You have two regressors to train: a first regression to output your degree days using an available forecast of your local temperature, then a second regressor using these degree days along with your available solar forecast from Solcast to output the needed number of hours for your heating the next day. Is that it? This seems interesting. A nice feature.
Please consider adding a unit test in the test folder to test this new class, otherwise we don't know if it's breaking something. This will be necessary to merge this into the final code.
Like I said, I would make this more generic. Your use case is with CSV files (which I personally like), but that may not be the case for most people. I think that this should support other types of data input, like directly specifying the names of sensors in Home Assistant and then retrieving the data directly, like we do for the energy optimization.
If this is to be more generic then the name of the new class can be something like MLPredictor and machine_learning_predictor.py.
What do you think?
This is to keep coherence with the rest of the code.
Maybe you can finish what you started (including the unittest) and then after the merge I can refactor the names.
Thanks again for this feature
I will work on this hopefully next week again...
Of course, keep this up, it is a very nice new feature.
Please consider adding a unit test in the test folder to test this new class, otherwise we don't know if it's breaking something.
I do not have any experience with unittest, I will try to figure it out.
I'm also having an issue with pytest
vscode ➜ /workspaces/emhass (master) $ pytest
====================================================================================== test session starts =======================================================================================
platform linux -- Python 3.11.4, pytest-7.3.1, pluggy-1.0.0
rootdir: /workspaces/emhass
plugins: requests-mock-1.11.0
collected 0 items / 6 errors
============================================================================================= ERRORS =============================================================================================
_______________________________________________________________________ ERROR collecting tests/test_command_line_utils.py ________________________________________________________________________
ImportError while importing test module '/workspaces/emhass/tests/test_command_line_utils.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/local/lib/python3.11/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/test_command_line_utils.py:9: in <module>
from emhass.command_line import set_input_data_dict
E ModuleNotFoundError: No module named 'emhass'
____________________________________________________________________________ ERROR collecting tests/test_forecast.py _____________________________________________________________________________
ImportError while importing test module '/workspaces/emhass/tests/test_forecast.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/local/lib/python3.11/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/test_forecast.py:11: in <module>
from emhass.retrieve_hass import retrieve_hass
E ModuleNotFoundError: No module named 'emhass'
___________________________________________________________________ ERROR collecting tests/test_machine_learning_forecaster.py ___________________________________________________________________
ImportError while importing test module '/workspaces/emhass/tests/test_machine_learning_forecaster.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/local/lib/python3.11/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/test_machine_learning_forecaster.py:16: in <module>
from emhass.command_line import set_input_data_dict
E ModuleNotFoundError: No module named 'emhass'
__________________________________________________________________________ ERROR collecting tests/test_optimization.py ___________________________________________________________________________
ImportError while importing test module '/workspaces/emhass/tests/test_optimization.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/local/lib/python3.11/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/test_optimization.py:11: in <module>
from emhass.retrieve_hass import retrieve_hass
E ModuleNotFoundError: No module named 'emhass'
__________________________________________________________________________ ERROR collecting tests/test_retrieve_hass.py __________________________________________________________________________
ImportError while importing test module '/workspaces/emhass/tests/test_retrieve_hass.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/local/lib/python3.11/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/test_retrieve_hass.py:12: in <module>
from emhass.retrieve_hass import retrieve_hass
E ModuleNotFoundError: No module named 'emhass'
______________________________________________________________________________ ERROR collecting tests/test_utils.py ______________________________________________________________________________
ImportError while importing test module '/workspaces/emhass/tests/test_utils.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/local/lib/python3.11/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/test_utils.py:8: in <module>
from emhass import utils
E ModuleNotFoundError: No module named 'emhass'
==================================================================================== short test summary info =====================================================================================
ERROR tests/test_command_line_utils.py
ERROR tests/test_forecast.py
ERROR tests/test_machine_learning_forecaster.py
ERROR tests/test_optimization.py
ERROR tests/test_retrieve_hass.py
ERROR tests/test_utils.py
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 6 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
======================================================================================= 6 errors in 0.74s ========================================================================================
vscode ➜ /workspaces/emhass (master) $
How can i solve this?
Please consider adding a unit test in the test folder to test this new class, otherwise we don't know if it's breaking something.
I do not have any experience with unittest, I will try to figure it out
Follow the same procedure as this: https://github.com/davidusb-geek/emhass/blob/master/tests/test_machine_learning_forecaster.py
I'm also having an issue with
pytest
How can i solve this?
What's your dev environment? You need to install emhass as an editable package in your environment.
python -m pip install -e .
What's your dev environment?
I re-open in devcontainer (vscode)
I changed these lines in web_server:
CONFIG_PATH = os.getenv("CONFIG_PATH", default="/app/config_emhass.yaml")
to CONFIG_PATH = os.getenv("CONFIG_PATH", default="/workspaces/emhass/app/config_emhass.yaml")
DATA_PATH = os.getenv("DATA_PATH", default="/app/data/")
to DATA_PATH = os.getenv("DATA_PATH", default="/workspaces/emhass/app/data/")
and with open(os.getenv('SECRETS_PATH', default='/app/secrets_emhass.yaml'), 'r') as file:
to with open(os.getenv('SECRETS_PATH', default='/workspaces/emhass/app/secrets_emhass.yaml'), 'r') as file:
And then cd src && python -m emhass.web_server
You may need to rebase your branch based on the latest code from master.
I have seen it. Is there a better way to develop on emhass than the way I described above?
I'm not using codespaces. I do everything locally on my PC. To do it locally on a Linux machine do this:
python -m venv .venv
(here the virtual environment is called .venv)
source .venv/bin/activate
(if using Windows instead this would be .venv\Scripts\activate.bat)
Then git clone the EMHASS repository, and from within the repo, with the virtual env activated, install the emhass package in editable mode with: python -m pip install -e .
Hope it helps
EDIT: I've actually just tested this same procedure inside the codespaces and it works perfectly
I have seen it. Is there a better way to develop on emhass than the way I described above?
My method of testing is slightly different (probably more complicated) from @davidusb-geek.
My workflow (on VS Code):
1. Open the (emhass) folder in VS Code.
2. Ctrl+Shift+P > Tasks: Run Task > EMHASS Install. This has been set up in the tasks.json file. I re-run this every time I have made a change to emhass, before run & debug.
3. Run and Debug tab (Ctrl+Shift+D) > EMHASS run Addon. This has been set up in the launch.json. You will need to modify the url (http://IPHERE:PORT/) and key (PLACEKEYHERE) before running to match your HA environment.
4. Testing tab on the left hand side.
Keep in mind there are a few changes I had to make in order for this method to work.
You can view these changes here: https://github.com/davidusb-geek/emhass/compare/master...GeoDerp:emhass:testing_eviroment In theory this could be merged into master. @davidusb-geek if you would like that, let me know.
If you would like to open up port 5000 so you can test on remote devices you can also do this: https://github.com/davidusb-geek/emhass/commit/0370cab8d10ae6c6b4c14e76737e1dd6bc363183 (just un-comment). Do this at your own discretion.
Need to try out @davidusb-geek's method at some point.
If you like to try out another alternative, have a look at this: https://github.com/davidusb-geek/emhass/pull/182
@davidusb-geek I did some changes, still no tests or documentation. But at first maybe you can have a look at it and give me some comments...
Good job on keeping your pull request up to date.
These are the new rest commands
fit_heating_hours:
  url: http://localhost:5001/action/csv-model-fit
  method: POST
  content_type: "application/json"
  payload: >-
    {
      "csv_file": "prediction.csv",
      "independent_variables": ["degreeday", "solar"],
      "dependent_variable": "hours",
      "model_type": "heating_dd",
      "timestamp": "timestamp",
      "date_features": ["month", "day_of_week"]
    }
predict_heating_hours:
  url: http://localhost:5001/action/csv-model-predict
  method: POST
  content_type: "application/json"
  payload: >-
    {
      "csv_predict_entity_id": "sensor.predicted_hours",
      "csv_predict_unit_of_measurement": "h",
      "csv_predict_friendly_name": "Predicted hours",
      "new_values": [8.2, 7.23, 2, 6],
      "model_type": "heating_dd"
    }
If you have a column in your csv file that contains a timestamp, you can pass that column name. And if you have a timestamp you can use date_features: by passing the ones you care about, they are taken into account by the model with the fit action. The possibilities are year, month, day_of_week, day_of_year, day and hour.
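A sketch of how such date features could be derived from the timestamp column with pandas; the feature names mirror the list above, but the actual implementation in the PR may differ:

```python
import pandas as pd

# Example rows in the shape the daily automation would produce
df = pd.DataFrame({"timestamp": ["2023-11-06 23:59:32", "2023-11-07 23:59:32"]})
ts = pd.to_datetime(df["timestamp"])

# Only the requested features are added as extra columns for the fit
date_features = ["month", "day_of_week"]
if "year" in date_features:
    df["year"] = ts.dt.year
if "month" in date_features:
    df["month"] = ts.dt.month
if "day_of_week" in date_features:
    df["day_of_week"] = ts.dt.dayofweek  # Monday = 0
if "day_of_year" in date_features:
    df["day_of_year"] = ts.dt.dayofyear
if "day" in date_features:
    df["day"] = ts.dt.day
if "hour" in date_features:
    df["hour"] = ts.dt.hour
```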
@davidusb-geek Do you have any feedback on the changes I have made?
Hi yes of course sorry, here are some comments.
It is a nice feature. Here are some comments. This class should work for multiple types of input data, not only CSV files. The main workflow should be retrieving the data directly from HA using the same methods as emhass does for the optimization. (we can make this after merging your code with some later refactoring)
Then the name of the class can be changed to something like MLRegressor
. That will be consistent with the MLForecaster
class.
Then there are the models. I only see linear regression, but now that you have put together a pipeline you can go ahead and add a list of different ML models with their parameters and try to find the best one. You can add lasso, random forest, etc.
The docstring of the main class is confusing in the example for the dependent variable, hours? Also, it is typical in data science to name the dependent variable the target and the independent variables the features.
We may make use of the more efficient Bayesian optimization already available within emhass to optimize the hyperparameters. But we can see this later; GridSearchCV is a very good start.
Here is a code snippet from ChatGPT for multiple models, so it needs testing ;-). Store the results and pick the best model with the lowest error:
from sklearn.ensemble import AdaBoostRegressor, GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

regression_methods = [
    ('Linear Regression', LinearRegression(), {}),
    ('Ridge Regression', Ridge(), {'ridge__alpha': [0.1, 1.0, 10.0]}),
    ('Lasso Regression', Lasso(), {'lasso__alpha': [0.1, 1.0, 10.0]}),
    ('Random Forest Regression', RandomForestRegressor(), {'randomforestregressor__n_estimators': [50, 100, 200]}),
    ('Gradient Boosting Regression', GradientBoostingRegressor(), {
        'gradientboostingregressor__n_estimators': [50, 100, 200],
        'gradientboostingregressor__learning_rate': [0.01, 0.1, 0.2]
    }),
    ('AdaBoost Regression', AdaBoostRegressor(), {
        'adaboostregressor__n_estimators': [50, 100, 200],
        'adaboostregressor__learning_rate': [0.01, 0.1, 0.2]
    })
]
results = {}
for name, model, param_grid in regression_methods:
    # make_pipeline names each step after its class (e.g. 'ridge'),
    # which matches the keys used in the parameter grids above
    pipeline = make_pipeline(StandardScaler(), model)
    # Use GridSearchCV to find the best hyperparameters for each model
    grid_search = GridSearchCV(pipeline, param_grid, scoring='neg_mean_squared_error', cv=5)
    grid_search.fit(X_train, y_train)
    # Get the best model and compute its mean squared error on the test set
    best_model = grid_search.best_estimator_
    predictions = best_model.predict(X_test)
    results[name] = (mean_squared_error(y_test, predictions), best_model)
# Pick the model with the lowest test error
best_name = min(results, key=lambda k: results[k][0])
Hi @gieljnssns. Have you tested this? Is it giving you good regression results for your test csv data? Good metrics, r2? Please keep your branch updated with master; currently there are some conflicts.
I haven't had the time to test this. Is it good enough to keep this PR for csv files only and add support for HA sensors later?
I haven't had the time to test this. Is it good enough to keep this PR for csv files only and add support for HA sensors later?
Yes we could merge with just csv and refactor later for HA sensors.
I don't know if it is good enough, that's why I was asking if it is giving you good regression results with your test data.
@GeoDerp I still haven't had the time to work further on this. But with one of the last changes the emhass files are stored in the emhass container. I always let home assistant save my csv file in the 'share' folder because emhass can read files there, is this still possible?
@GeoDerp I still haven't had the time to work further on this. But with one of the last changes the emhass files are stored in the emhass container. I always let home assistant save my csv file in the 'share' folder because emhass can read files there, is this still possible?
It's possible that we could set up a parameter that allows us to specify the data path, inside of options.json.
E.g. set up a parameter, then check if that parameter exists. If so, replace DATA_PATH.
If I have some energy and time I'll have a look at this tonight.
Good idea to avoid a breaking change for people already using the share folder as @gieljnssns does.
@davidusb-geek I did some of the changes you've requested. Do you have more comments before I start on tests and documentation?
fit_heating_hours:
  url: http://127.0.0.1:5000/action/regressor-model-fit
  method: POST
  content_type: "application/json"
  payload: >-
    {
      "csv_file": "prediction.csv",
      "features": ["degreeday", "solar"],
      "target": "hours",
      "regression_model": "RandomForestRegression",
      "model_type": "heating_dd",
      "timestamp": "timestamp",
      "date_features": ["month", "day_of_week"]
    }
predict_heating_hours:
  url: http://localhost:5001/action/regressor-model-predict
  method: POST
  content_type: "application/json"
  payload: >-
    {
      "mlr_predict_entity_id": "sensor.predicted_hours",
      "mlr_predict_unit_of_measurement": "h",
      "mlr_predict_friendly_name": "Predicted hours",
      "new_values": [8.2, 7.23, 2, 6],
      "model_type": "heating_dd"
    }
It seems very nice now. Some comments below:
First important question: does it work? I mean for your test data do you have good results? What are the r2 metrics results?
Use a pipeline as in the code snippet that I posted here before, using a scaling method like this:
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    (name, model)
])
Very often scaling will help improve the performance of model regression.
After this you will be good to go for unit testing. Follow the same procedure as in emhass/tests/test_machine_learning_forecaster.py
First important question: does it work? I mean for your test data do you have good results? What are the r2 metrics results?
I have data for 129 days. These are my test results:
http://127.0.0.1:5000/action/regressor-model-fit:
csv_file: "prediction.csv"
features: ["dd", "solar"]
target: "hour"
regression_model: ""
model_type: "heating_dd"
timestamp: "timestamp"
date_features: ["month", "day_of_week"]
http://127.0.0.1:5000/action/regressor-model-predict:
mlr_predict_entity_id: "sensor.voorspelde_uren_test"
mlr_predict_unit_of_measurement: "h"
mlr_predict_friendly_name: "Voorspelde uren"
new_values: [12.79, 4.766, 1, 2]
model_type: "heating_dd"
Should be: 3.73
LinearRegression:
elapsed_time: 6.190458059310913
r2_score: 0.36003284921146106
prediction: 4.522378116710202
RidgeRegression:
elapsed_time: 0.5306239128112793
r2_score: 0.3870962809623618
prediction: 4.453366967305042
LassoRegression:
elapsed_time: 0.06030106544494629
r2_score: 0.448217948859876
prediction: 4.373201692352232
RandomForestRegression:
elapsed_time: 0.49344754219055176
r2_score: 0.22409991314413769
prediction: 3.631450000000007
GradientBoostingRegression:
elapsed_time: 0.3390817642211914
r2_score: 0.34719210710129045
prediction: 3.7482900730947586
AdaBoostRegression:
elapsed_time: 0.8152174949645996
r2_score: 0.40009590862339695
prediction: 3.5960526315789467
Use a pipeline as in the code snippet that I posted here before, using a scaling method like this:
With the use of Pipeline() I always run into
ValueError: Invalid parameter 'adaboostregressor' for estimator Pipeline(steps=[('scaler', StandardScaler()), ('name', AdaBoostRegressor())]). Valid parameters are: ['memory', 'steps', 'verbose'].
But in my PR I use
self.model = make_pipeline(StandardScaler(), base_model)
That goes well
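For reference, the naming mismatch that triggers that ValueError can be seen by comparing the two constructors directly. With Pipeline the step is named by the string you supply (here literally 'name'), so grid keys like 'adaboostregressor__n_estimators' don't resolve; make_pipeline auto-generates the lowercase class name that those keys expect:

```python
from sklearn.ensemble import AdaBoostRegressor
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import StandardScaler

# Explicit naming: the second step is literally called 'name'
explicit = Pipeline([("scaler", StandardScaler()), ("name", AdaBoostRegressor())])

# Automatic naming: steps are named after their lowercased class names
auto = make_pipeline(StandardScaler(), AdaBoostRegressor())

print([step for step, _ in explicit.steps])  # ['scaler', 'name']
print([step for step, _ in auto.steps])      # ['standardscaler', 'adaboostregressor']
```

So with make_pipeline, GridSearchCV parameters addressed as 'adaboostregressor__...' are valid, which is why the PR's version goes well.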
@gieljnssns, happy to help you out with the path changes that just merged. May be good to see @davidusb-geek 's response to testing that PR on the weekend first however. (just in case it gets reverted)
Note. This change will override that github-advanced-security warning.
@gieljnssns, happy to help you out with the path changes that just merged. May be good to see @davidusb-geek 's response to testing that PR on the weekend first however. (just in case it gets reverted)
Hi, I merged #247 yesterday and unit tests are passing correctly, so everything looks good to me.
How can I fix the CodeQL error?
How can I fix the CodeQL error?
Yes, these have been hanging for some time now. We need to fix them. They come from using the eval function directly.
I haven't come up with a solution to this; ideas are welcome.
@davidusb-geek Are you open to use ruff as the standard formatter?
@davidusb-geek Are you open to use ruff as the standard formatter?
What is ruff?
I propose a solution otherwise: replace those eval calls with ast.literal_eval:
import ast
user_input = input("Enter a Python expression: ")
result = ast.literal_eval(user_input)
Needs testing.
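A quick demonstration of why ast.literal_eval addresses the CodeQL finding (this is standard library behavior, not emhass code): it only accepts Python literals such as numbers, strings, lists and dicts, and raises ValueError on anything executable.

```python
import ast

# Literals, like the payload lists passed to the rest commands, parse fine
values = ast.literal_eval("[8.2, 7.23, 2, 6]")
print(values)  # [8.2, 7.23, 2, 6]

# Anything with a function call is rejected instead of being executed
try:
    ast.literal_eval("__import__('os').system('id')")
except ValueError:
    print("rejected non-literal input")
```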
What is ruff?
https://github.com/astral-sh/ruff
This is also the default formatter in Home Assistant
Hey @gieljnssns your PR file change seems to be a little weird (showing file changes to all master merged commits) I experienced this myself yesterday. (just tested the result with https://github.com/davidusb-geek/emhass/pull/259)
Could you see if this changes anything:
git remote add david git@github.com:davidusb-geek/emhass.git
git pull --tags david master
Then a git push
Feel free to do this on a test bed first to make sure you don't mess anything up. Amazing Job by the way. Will validate the path changes as well as do the pipeline (standalone & addon) tests on this code later tonight hopefully.
Hey @gieljnssns your PR file change seems to be a little weird (showing file changes to all master merged commits) I experienced this myself yesterday. (just tested the result with #259)
I was about to comment on this myself when I saw the number of files changed = 49!
https://github.com/astral-sh/ruff
This is also the default formatter in Home Assistant
Got it, yes open to anything that will make this better.
to make sure you don't mess anything up.
I think this already happened.
vscode ➜ /workspaces/emhass (master) $ git remote add david git@github.com:davidusb-geek/emhass.git
vscode ➜ /workspaces/emhass (master) $ git pull --tags david master
Warning: Permanently added the ECDSA host key for IP address '140.82.121.4' to the list of known hosts.
git@github.com: Permission denied (publickey).
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
Hey @gieljnssns your PR file change seems to be a little weird (showing file changes to all master merged commits) I experienced this myself yesterday. (just tested the result with #259)
I was about to comment on this myself when I saw the number of files changed = 49!
Strange
vscode ➜ /workspaces/emhass (master) $ git remote add david git@github.com:davidusb-geek/emhass.git vscode ➜ /workspaces/emhass (master) $ git pull --tags david master Warning: Permanently added the ECDSA host key for IP address '140.82.121.4' to the list of known hosts. git@github.com: Permission denied (publickey). fatal: Could not read from remote repository. Please make sure you have the correct access rights and the repository exists.
Sorry, that's because it's GitHub SSH and not HTTPS. Change
git remote add david git@github.com:davidusb-geek/emhass.git
to
git remote add david https://github.com/davidusb-geek/emhass.git
Hey @gieljnssns your PR file change seems to be a little weird (showing file changes to all master merged commits) I experienced this myself yesterday. (just tested the result with #259)
I was about to comment on this myself when I saw the number of files changed = 49!
Strange
It might be a new GitHub glitch. 🤷♂️
The story behind this pull request: I keep a CSV file in which I store data from which I want to predict the number of heating hours. I first tried to do this via a custom_component for Home Assistant, but apparently it is not possible to install scikit-learn. Since the result of my prediction is to be used in emhass and the necessary dependencies are already installed in emhass, I decided to go this way.
This pull request contains a new method csv-predict with new parameters; here is an example of a rest command in Home Assistant.
If you are open to accepting this pull request, I will also take the time to write some documentation. And if necessary, I would also like to try writing some tests.
Here is also a used CSV file. prediction.csv