Closed bdzyubak closed 2 months ago
Used scikit-learn's `GridSearchCV`, which is natively integrated with MLflow, to run a tiered search over hyperparameters.
Predictions are substantially improved with respect to the baseline. Closing for now until a specific RMSE (or other metric) target is defined for this project.
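For reference, a tiered search can be sketched roughly as below. This is only an illustration, not the code from this repo: `tiered_grid_search`, the synthetic data, and the grid values are made up, and scikit-learn's `GradientBoostingRegressor` stands in for the actual model (the logged `min_child_weight` parameter suggests a different booster). Calling `mlflow.sklearn.autolog()` before `fit` should be enough to get the runs logged, assuming MLflow is installed.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit


def tiered_grid_search(X, y, tiers, cv=None):
    """Run one GridSearchCV per tier, carrying the best values forward.

    Hypothetical helper: each tier tunes a few parameters while the
    winners from earlier tiers stay fixed, which keeps the grid small.
    """
    best_params = {}
    search = None
    for grid in tiers:
        # Stand-in estimator; swap in the real model here.
        estimator = GradientBoostingRegressor(random_state=0, **best_params)
        search = GridSearchCV(
            estimator,
            param_grid=grid,
            scoring="neg_root_mean_squared_error",
            cv=cv,
        )
        search.fit(X, y)
        best_params.update(search.best_params_)
    return best_params, search


# Small synthetic regression problem, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

tiers = [
    {"n_estimators": [50, 100], "learning_rate": [0.05, 0.1]},  # tier 1
    {"max_depth": [2, 3], "subsample": [0.8, 1.0]},             # tier 2
]
# TimeSeriesSplit keeps validation folds after the training folds in time.
best_params, last_search = tiered_grid_search(
    X, y, tiers, cv=TimeSeriesSplit(n_splits=3)
)
print(best_params)
```

The trade-off is that a tiered search is much cheaper than a full grid but can miss interactions between parameters tuned in different tiers.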
The following metrics have been added. Mean Error and Mean Percent Error are probably the most useful, as the company cares about total consumption over a period of time. In these terms, prediction within 2.4% is pretty good.

| Metric | train | val |
| --- | --- | --- |
| Root Mean Squared Error | 562 MW | 1633 MW |
| Mean Absolute Error | 354 MW | 1174 MW |
| Mean Absolute Percent Error | 2.2% | 8.0% |
| Mean Error | 15.0 MW | -423.0 MW |
| Mean Percent Error | -0.2% | -3.8% |
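A minimal sketch of how these five metrics can be computed, in plain Python. The function name `regression_metrics` and the sample numbers are hypothetical, and the sign convention (error = predicted minus actual) is an assumption; the project code may define it the other way around.

```python
import math


def regression_metrics(actual, predicted):
    """Return RMSE, MAE, MAPE, Mean Error and Mean Percent Error.

    Assumed convention: error = predicted - actual, so a negative
    Mean Error means the model under-predicts on average.
    Actual values must be non-zero for the percent metrics.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - a for a, p in zip(actual, predicted)]
    n = len(errors)
    return {
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
        "mae": sum(abs(e) for e in errors) / n,
        "mape": 100 * sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n,
        # Signed metrics: over- and under-predictions cancel out.
        "mean_error": sum(errors) / n,
        "mean_percent_error": 100 * sum(e / a for e, a in zip(errors, actual)) / n,
    }


# Made-up loads in MW, for illustration only.
actual = [10000, 12000, 15000]
predicted = [10500, 11400, 15000]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

Because the signed metrics let errors cancel, they track the bias in total consumption over a period, while RMSE and MAE track per-sample accuracy.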
The model still shows overfitting: the train RMSE of 425 MW is far below the validation RMSE of 1819 MW, so further optimization would be useful. The best-performing model is logged in MLflow as: `aba1c2cb5d0f4ab6876236520dcf2706/best_estimator`
It has the following parameters:

- best_learning_rate: 0.01
- best_max_depth: 20
- best_min_child_weight: 7
- best_n_estimators: 300
- best_subsample: 1
- scoring: neg_root_mean_squared_error
The initial model had trouble predicting extremes.

`projects\MachineLearning\energy_use_time_series_forecasting\time_series_forecasting_energy_use.py`

![image](https://github.com/bdzyubak/torch-control/assets/37943739/8b0c3215-86d6-4ef0-8191-689a23e9a481)
1. Improve prediction at extremes.
2. Add other metrics for better human readability. No specific RMSE target exists at this time, so just report results.
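One possible way to track progress on the first item is to report RMSE separately for the lowest, middle, and highest load bands. The sketch below is only a suggestion: `rmse_by_load_band`, the 10%/90% cut-offs, and the sample data are all assumptions and not part of the project.

```python
import math


def rmse_by_load_band(actual, predicted, lower_q=0.1, upper_q=0.9):
    """Split samples into low/mid/high bands by actual load, then
    compute RMSE per band (None if a band is empty).

    Cut-offs are taken from the sorted actual values using a simple
    index-based quantile, which is enough for a diagnostic.
    """
    ordered = sorted(actual)
    n = len(ordered)
    low_cut = ordered[int(lower_q * (n - 1))]
    high_cut = ordered[int(upper_q * (n - 1))]

    squared = {"low": [], "mid": [], "high": []}
    for a, p in zip(actual, predicted):
        if a <= low_cut:
            band = "low"
        elif a >= high_cut:
            band = "high"
        else:
            band = "mid"
        squared[band].append((p - a) ** 2)

    return {
        band: math.sqrt(sum(vals) / len(vals)) if vals else None
        for band, vals in squared.items()
    }


# Made-up example: predictions shrink toward the mean load of 5500 MW,
# mimicking a model that under-reacts at the extremes.
actual = [1000 * i for i in range(1, 11)]
predicted = [5500 + 0.8 * (a - 5500) for a in actual]
bands = rmse_by_load_band(actual, predicted)
print(bands)
```

If the low and high bands show a much larger RMSE than the middle band, the model is still struggling at the extremes even when the overall RMSE looks acceptable.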