timchatterton opened this issue 5 years ago
These were the R-squared values: 0.6-0.7 for gas and electric, MUCH MUCH lower on the cars (this was structural factors, not social ones, but I will discuss with Sally and Jillian where they got to).

| Electricity | R2 |
|---|---|
| Rooms | 0.4316 |
| % Gas Heating | 0.6644 |
| %Elec Heating | 0.6928 |
| %Flats | 0.7029 |
| Built 1930s | 0.7064 |
| Gas | |
| Rooms | 0.5146 |
| Flats | 0.5721 |
| 1930-39 | 0.5948 |
| 1900-18 | 0.6064 |
| % Gas Heating | 0.6167 |
| Car (All HH) | |
| Pop Density | 0.06699 |
| Cars per HH | 0.1155 |
| %HH without cars | 0.118 |
| PT time to Town Centre | 0.1179 |
| % Active to Work | 0.1191 |
| Cars (HH with Cars) | |
| Pop Density | 0.02138 |
| %HH without cars | 0.02189 |
| % Cycle to Work | 0.02194 |
| Distance to Work | 0.0226 |
| Cars per HH (with Cars) | 0.02932 |
Interesting stuff @timchatterton, many thanks for sharing these results. More discussion to follow no doubt.
@timchatterton Looking at the code at https://github.com/creds2/Excess-Data-Exploration/blob/master/Tim/RScripts/Modelling/Modelling%20main%20factors.R it seems your car models are predicting gas consumption!
I've committed a small fix, but I can't reproduce your plots because the plotting code is missing.
Also, can you explain how you chose your variables? E.g. why %cycling rather than %driving to work?
Hi - I clearly hadn't saved the right version of the script to GitHub. The gas issue was spotted quite quickly and sorted out, and the plots were added to the bottom of the code. I believe this version is now updated.
The variables were taken from the top 5 most important (structural) ones according to the XGBoost models.
Hi @timchatterton, I was hoping to talk to you at the meeting, but I was off sick. Fortunately I'm much better now. I wanted to draw your attention to some modelling experiments at https://github.com/creds2/Excess-Data-Exploration/blob/master/Modeling_Summary.md where I was able to get much better results for driving, and comparable results for gas and electric. My approach was to take the single most important variable, find what correlated with the residuals, and repeat.
It gives me a slightly different selection of variables, but you can see the "logic" is similar in both your results and mine. The driving result is very strongly correlated (R-squared of 0.85), but I'm getting some S-curved results, which suggests I'm not handling the non-linearity correctly. Any suggestions?
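For anyone following along, here is a minimal sketch of the "most important variable, then whatever correlates with the residuals, repeat" idea. It is written in plain Python rather than the R used in the repository, and it is not the code behind Modeling_Summary.md. The function names, the toy data, and the choice to refit all selected variables at every round are assumptions made for illustration.

```python
from math import sqrt

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_linear(columns, y):
    """Ordinary least squares of y on the columns plus an intercept; returns fitted values."""
    n = len(y)
    design = [[1.0] * n] + [list(col) for col in columns]
    k = len(design)
    xtx = [[sum(design[i][r] * design[j][r] for r in range(n)) for j in range(k)]
           for i in range(k)]
    xty = [sum(design[i][r] * y[r] for r in range(n)) for i in range(k)]
    beta = solve(xtx, xty)
    return [sum(beta[i] * design[i][r] for i in range(k)) for r in range(n)]

def r_squared(y, fitted):
    """Share of the variance in y explained by the fitted values."""
    y_mean = sum(y) / len(y)
    ss_res = sum((a - f) ** 2 for a, f in zip(y, fitted))
    ss_tot = sum((a - y_mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

def correlation(u, v):
    """Pearson correlation; returns 0.0 if either input has no variance."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    if var_u == 0 or var_v == 0:
        return 0.0
    return cov / sqrt(var_u * var_v)

def select_by_residuals(predictors, y, first, n_vars):
    """Start from `first`, then repeatedly add the predictor most correlated
    (in absolute value) with the current residuals. Returns a list of
    (variable added, R-squared of the model so far)."""
    chosen, history = [first], []
    while True:
        fitted = fit_linear([predictors[name] for name in chosen], y)
        history.append((chosen[-1], r_squared(y, fitted)))
        remaining = [name for name in predictors if name not in chosen]
        if len(chosen) >= n_vars or not remaining:
            return history
        residuals = [a - f for a, f in zip(y, fitted)]
        chosen.append(max(remaining,
                          key=lambda name: abs(correlation(predictors[name], residuals))))

# Hypothetical toy data: y depends on "a" and "b" only; "c" is a distractor.
predictors = {
    "a": [1, 2, 3, 4, 5, 6],
    "b": [1, 0, 1, 0, 1, 0],
    "c": [0, 0, 1, 1, 0, 0],
}
y = [3 * a + 2 * b for a, b in zip(predictors["a"], predictors["b"])]
history = select_by_residuals(predictors, y, first="a", n_vars=2)
```

On the toy data the second variable picked is "b", because "c" is uncorrelated with what is left after fitting "a". Refitting every round keeps the coefficients jointly estimated; a stagewise variant that only regresses the residuals on the new variable would be a different design choice.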
OK - I have got the plots that I sort of wanted! Basically, for this version I have created a linear model from the top 5 structural factors for each of gas, electricity, car energy (averaged over all households) and car energy (only HH with cars).
I have plotted modelled vs measured and added a 1:1 line - anything to the right of the line could be excess? I have also coloured in red the points where measured is >25% above modelled.
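The red-point rule can be written down in a couple of lines. This is a Python sketch, not the R plotting code in the repository. The function name, the toy numbers, and the assumption that modelled values are positive are all illustrative.

```python
def flag_excess(measured, modelled, threshold=0.25):
    """Return True for each area where measured exceeds modelled by more
    than `threshold` (a fraction, so 0.25 means 25%).
    Assumes modelled values are positive."""
    return [m > (1.0 + threshold) * p for m, p in zip(measured, modelled)]

# Hypothetical values, one entry per area.
measured = [100.0, 130.0, 90.0, 200.0]
modelled = [100.0, 100.0, 100.0, 150.0]
flags = flag_excess(measured, modelled)
```

Here the second and fourth areas are flagged (130 > 125 and 200 > 187.5). Points sitting exactly on the 1:1 line, or below it, are never flagged.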
Basically you get a good linear model for gas and electric - it doesn't work so well for the MOT data!
Any thoughts welcome.