buds-lab / building-prediction-benchmarking

An array of open source ML models applied to long-term hourly energy prediction for institutional buildings
http://www.budslab.org/
MIT License

Use the scikit-learn function `sklearn.model_selection.TimeSeriesSplit` #3

Closed cmiller8 closed 5 years ago

cmiller8 commented 5 years ago

Use `sklearn.model_selection.TimeSeriesSplit` to segment the data into training and testing sets for cross-validation.

https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.TimeSeriesSplit.html#sklearn.model_selection.TimeSeriesSplit

This will live in its own notebook and will be a fork of the previous scenario, with 9 months of training and 3 months of testing.
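As a rough sketch of that setup, assuming the hourly readings sit in a DataFrame with a DatetimeIndex (the file name and split logic here are illustrative, not the repo's notebook):

```python
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit

# Hourly energy readings indexed by timestamp (the file name is a placeholder)
df = pd.read_csv("building_hourly.csv", index_col=0, parse_dates=True)

# Work at month granularity so every fold holds whole calendar months
months = df.index.to_period("M").unique()

# With 12 months of data, n_splits=3 makes the final fold the
# 9-months-training / 3-months-testing scenario described above
tscv = TimeSeriesSplit(n_splits=3)
for train_idx, test_idx in tscv.split(months):
    train = df[df.index.to_period("M").isin(months[train_idx])]
    test = df[df.index.to_period("M").isin(months[test_idx])]
    print(f"train: {len(train_idx)} months, test: {len(test_idx)} months")
```

Splitting over the unique months rather than the raw hourly rows keeps each fold aligned to whole months, which matches the 9/3 scenario.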

cmiller8 commented 5 years ago

I updated the cross-validation to take the months in the order they exist in the individual building data set, i.e., if the building data starts in November, then the months run 11, 12, 1, ...

Let's set n=3 for the cross-validation and create a metrics output file for each of the three models.

The final visualization will have four sets of metrics: one for the original train/test scheme (every fourth month) and one for each of the three cross-validation steps (3/3, 6/3, and 9/3 months of training/testing).
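A hedged sketch of that loop, using synthetic data and stand-in models (the repo's actual three models, metrics, and file naming may differ):

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor

# Placeholder data: one year of hourly features and an energy target
rng = np.random.default_rng(0)
X = rng.normal(size=(8760, 5))
y = rng.normal(size=8760)

# Stand-in models; substitute the three models benchmarked in the repo
models = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=50),
    "knn": KNeighborsRegressor(),
}

# n=3 gives the 3/3, 6/3, and 9/3 train/test steps on a 12-month set
tscv = TimeSeriesSplit(n_splits=3)
for fold, (train_idx, test_idx) in enumerate(tscv.split(X), start=1):
    rows = []
    for name, model in models.items():
        model.fit(X[train_idx], y[train_idx])
        pred = model.predict(X[test_idx])
        rows.append({
            "model": name,
            "mae": mean_absolute_error(y[test_idx], pred),
            "r2": r2_score(y[test_idx], pred),
        })
    # One metrics output file per cross-validation step
    pd.DataFrame(rows).to_csv(f"metrics_cv_step_{fold}.csv", index=False)
```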

cmiller8 commented 5 years ago

@talantbekov123 - to be clear, can you set n=3 for the cross-validation and create separate metrics output files for each of those runs?

talantbekov123 commented 5 years ago

@cmiller8 this file (https://github.com/buds-lab/building-prediction-benchmarking/blob/master/cross-validation/generate-metrics-cross-validation-steps.ipynb) creates separate metrics output files for each of those runs and stores the produced output here (https://github.com/buds-lab/building-prediction-benchmarking/tree/master/cross-validation/results-timeseries).

talantbekov123 commented 5 years ago

Should I create a visualization for this, like the one you made in the visualizations folder?

cmiller8 commented 5 years ago

We can work on the visualization later -- focus on getting the annual schedules as an input first.
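For reference, the leanest version of such a schedule input is calendar features derived from the timestamp. This is a hypothetical sketch; the actual annual schedules for institutional buildings (e.g. academic calendars and holidays) would be richer:

```python
import pandas as pd

# One year of hourly timestamps (placeholder range)
idx = pd.date_range("2017-01-01", periods=8760, freq="h")
schedule = pd.DataFrame(index=idx)

# Minimal calendar features standing in for an annual schedule;
# a real schedule input would also encode holidays, breaks, and term dates
schedule["hour"] = idx.hour
schedule["dayofweek"] = idx.dayofweek
schedule["month"] = idx.month
print(schedule.head())
```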

cmiller8 commented 5 years ago

Implemented and tested in the master notebook.