California-Data-Collaborative / OWRS-Analysis

Analysis of water rates collected in the OWRS format.

Simulate changes in demand #9

Open christophertull opened 6 years ago

christophertull commented 6 years ago

This one is lower priority, but it would be great to be able to accurately simulate changes in demand, e.g. from water conservation measures, and look at the impact on revenue. We should be able to do this using the supplier report.

One approach would be to generate a synthetic population of households with representative water use based on the population and GPCD from the supplier report. Could use some sort of Bayesian or Monte Carlo method with prior assumptions for the distributions. Then imagine that all customers reduce water use by 25%... etc.

patwater commented 6 years ago

Could you elaborate a bit on what you're thinking here? Note the supplier report is very high level / aggregate, though we may be able to do some Steve Piper style analysis.

christophertull commented 6 years ago

Yeah sooo, basics of what I am imagining is, for each utility, try and reverse-engineer (dis-aggregate) a synthetic population of households that matches the aggregate patterns.

Could set this up as a Monte Carlo problem similar to below:

per_capita_usage ~ lognormal(GPCD, sigma^2)

Each person in our synthetic pop. has water use that is distributed lognormally with mean GPCD = the gpcd from the supplier report. Variance could be guesstimated from CaDC data or something.

From this point we could simulate person-level water use by randomly sampling from the above distribution N times, where N is the population provided in the supplier report. We could sum this usage and compare to aggregate production in supplier report, keeping in mind discrepancy between end use and production. Just getting within ~10% should be good enough.
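The sampling step above could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the `GPCD`, `POPULATION`, and `SIGMA` values are made up, and `sigma` is interpreted here as the standard deviation of log use (an assumption). Note that a lognormal's mean is `exp(mu + sigma^2 / 2)`, so the location parameter has to be shifted for the simulated mean to equal the GPCD.

```python
import math
import random

def simulate_per_capita_use(gpcd, sigma, population, seed=0):
    """Draw one daily use value (gallons) per person from a lognormal with mean gpcd."""
    # For a lognormal, E[X] = exp(mu + sigma^2 / 2); solve for mu so the mean equals gpcd.
    mu = math.log(gpcd) - sigma ** 2 / 2
    rng = random.Random(seed)
    return [rng.lognormvariate(mu, sigma) for _ in range(population)]

# Hypothetical inputs: GPCD and population would come from the supplier report.
GPCD = 100.0
POPULATION = 50_000
SIGMA = 0.5  # assumed log-scale spread; would need to be estimated from real data

use = simulate_per_capita_use(GPCD, SIGMA, POPULATION)
simulated_total = sum(use)
reported_total = GPCD * POPULATION  # stand-in for the reported aggregate use
relative_error = abs(simulated_total - reported_total) / reported_total
print(f"relative error vs. aggregate: {relative_error:.3%}")
```

In practice the comparison would be against reported production rather than `GPCD * POPULATION`, which is where the end use vs. production discrepancy comes in.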

This gives us person-level simulated use, but for rate modeling we need household-level simulated use. Can approximate this by estimating household size:

hhsize ~ Poisson(MEAN_SIZE)

So, for each household in our synthetic population we assume the hhsize is Poisson distributed with mean MEAN_SIZE. Could imagine pulling MEAN_SIZE from the census or just arbitrarily assuming 3.

Now we can go through and randomly generate hhsizes and assign simulated people to these households until all people have been assigned to a household. This gives us a synthetic population of households where the DISTRIBUTION of simulated HH-level water use will hopefully match the distribution of actual HH-level water use.
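A rough sketch of the assignment step, using only the standard library. The helper names `sample_poisson` and `build_households` are hypothetical, and the inputs are made up. One detail the notation above glosses over: a Poisson draw can be 0, so this sketch redraws zeros, which pushes the realized mean household size slightly above `mean_size`.

```python
import math
import random

def sample_poisson(rng, mean):
    """Knuth's method for a Poisson draw; adequate for small means like household size."""
    threshold = math.exp(-mean)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def build_households(person_use, mean_size, seed=0):
    """Group person-level use into households; return total use per household."""
    rng = random.Random(seed)
    households = []
    i = 0
    while i < len(person_use):
        size = sample_poisson(rng, mean_size)
        if size == 0:
            continue  # redraw: an empty household has no water use
        members = person_use[i:i + size]  # the last household may be cut short
        households.append(sum(members))
        i += size
    return households

# Made-up person-level use: lognormal with mean 100 gallons per day.
rng = random.Random(1)
mu = math.log(100.0) - 0.5 ** 2 / 2
person_use = [rng.lognormvariate(mu, 0.5) for _ in range(10_000)]

household_use = build_households(person_use, mean_size=3)
print(len(household_use), sum(household_use) / len(household_use))
```

Every simulated person lands in exactly one household, so total use is preserved; only the distribution across households changes.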

Why is this complicated setup needed to analyze the impact of demand changes on revenue?

  1. Any sort of tiered rates respond non-linearly to increasing water use. Non-linearity means that assigning mean usage to each household does not result in accurate aggregate revenue forecasts. We need to capture the distribution of HH-level use.

  2. This distribution will obviously be different in different utilities, e.g. bay area vs. inland empire. Incorporating GPCD information into the simulation framework allows us to hopefully capture these differences.
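To make point 1 concrete, here is a small worked example with a made-up increasing block rate. The tier breakpoints, prices, and household values are all hypothetical. Because the bill is a convex function of use under increasing block rates, billing every household at the mean understates total revenue.

```python
def tiered_bill(use, tier_starts, tier_prices):
    """Volumetric charge under an increasing block rate.

    tier_starts[i] is the usage at which tier i begins; tier_prices[i] is its unit price.
    """
    bill = 0.0
    for i, (start, price) in enumerate(zip(tier_starts, tier_prices)):
        end = tier_starts[i + 1] if i + 1 < len(tier_starts) else float("inf")
        if use > start:
            bill += (min(use, end) - start) * price
    return bill

# Hypothetical rate structure; units are CCF per billing period.
TIER_STARTS = [0, 10, 20]
TIER_PRICES = [2.0, 4.0, 8.0]

household_use = [4, 6, 8, 12, 40]  # made-up values with a mean of 14
mean_use = sum(household_use) / len(household_use)

revenue_from_distribution = sum(
    tiered_bill(u, TIER_STARTS, TIER_PRICES) for u in household_use
)
revenue_from_mean = len(household_use) * tiered_bill(mean_use, TIER_STARTS, TIER_PRICES)

print(revenue_from_distribution, revenue_from_mean)  # 284.0 180.0
```

The single high-use household pays mostly third-tier prices, which the mean-based estimate misses entirely.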

disclaimer

This sort of approach would work best for standard IBRs, which are about 70% of the rates we have so far. This approach is overkill for uniform rates because price is linear in usage. Budget-based rates require a more sophisticated approach because of the need to capture correlations between ET/LA/hhsize and usage (I also have ideas but they are not as fleshed out. See here for some ideas on how one could proceed).

patwater commented 6 years ago

Re 2, "distribution will obviously be different in different utilities": the GPCD only captures the mean, not the difference in shape or spread of the distribution. I suppose we could use SCUBA to inform that, though, as a blend between private and public data sources.

christophertull commented 6 years ago

Yeah, exactly. This sort of approach would mainly be useful for simulating changes in demand and the effect on revenue.
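As a toy illustration of that use case, the sketch below applies a uniform 25% cut to a made-up set of households and compares volumetric revenue before and after. The tier structure and usage values are hypothetical, and fixed charges are ignored.

```python
def tiered_bill(use, tiers):
    """tiers: ascending list of (upper_limit, unit_price); the last limit is infinity."""
    bill, lower = 0.0, 0.0
    for upper, price in tiers:
        if use <= lower:
            break
        bill += (min(use, upper) - lower) * price
        lower = upper
    return bill

def revenue(household_use, tiers):
    """Total volumetric revenue across all households."""
    return sum(tiered_bill(u, tiers) for u in household_use)

# Hypothetical increasing block rate; units are CCF per billing period.
TIERS = [(10, 2.0), (20, 4.0), (float("inf"), 8.0)]

baseline_use = [4, 8, 12, 16, 40]              # made-up household use
reduced_use = [0.75 * u for u in baseline_use]  # everyone cuts use by 25%

baseline = revenue(baseline_use, TIERS)
reduced = revenue(reduced_use, TIERS)
revenue_change = (reduced - baseline) / baseline

print(baseline, reduced, f"{revenue_change:.1%}")  # 316.0 204.0 -35.4%
```

In this example a 25% drop in use cuts volumetric revenue by about 35%, because the reduction comes disproportionately out of the most expensive tiers.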

In theory one could expand on this approach by making the simulated households spatially explicit and capturing inter-dependencies between income, hhsize, water use, ET, LA, etc. (e.g. using something like a Bayesian network trained on SCUBA data). This would expand the utility of the model a lot and allow it to tie in with related efforts like UCLA ET/climate modeling.