California-Data-Collaborative / CA-Stormwater-Data-Challenge

Visualize potential dry-weather runoff contributing areas to identify prioritization for areas to target

Using Inefficiency Data #6

Closed. amandaaprahamian closed this issue 7 years ago.

amandaaprahamian commented 7 years ago

Hi all!

Thank you to @monobina for putting together the water inefficiency data. Now the fun part: finding meaning for the public within the pile of numbers. Let's break it down and figure out how we can give the public a sentence of data instead of CSVs of data:

In the TotalInefficiency.csv:

[Table excerpt from TotalInefficiency.csv (column: geoid10)]

Too High Numbers:

Picturing 9,657 gallons of water per month is a little difficult, though. So maybe we use the data above only for the highest- and lowest-usage months, and use WeightedAverageEfficiency to express efficiency as a proportion instead of in gallons:

In the WeightedAverageEfficiency.csv:

End Product:

With both these CSVs, I see us getting an end product like this: "Your neighborhood typically uses 97% of its water budget. The biggest usages in the 2015-2016 water year were in August and January. The lowest usages were in September and February."
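To make the end-product sentence concrete, here is a minimal sketch of how it could be generated from monthly block totals. It assumes (this is not confirmed in the thread) that WeightedAverageEfficiency is a budget-weighted average of use/budget, which reduces to total use divided by total budget. The month values and the helper names `weighted_efficiency`, `extreme_months`, and `summary_sentence` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical monthly totals for one census block: month -> (use_ccf, budget_ccf).
# Made-up numbers, chosen only to illustrate the calculation.
EXAMPLE: Dict[str, Tuple[float, float]] = {
    "July": (100.0, 110.0),
    "August": (150.0, 140.0),
    "September": (80.0, 90.0),
    "January": (130.0, 125.0),
    "February": (70.0, 80.0),
}


def weighted_efficiency(totals: Dict[str, Tuple[float, float]]) -> float:
    """Budget-weighted average of use/budget (assumed weighting), i.e. total use / total budget."""
    total_use = sum(use for use, _ in totals.values())
    total_budget = sum(budget for _, budget in totals.values())
    return total_use / total_budget


def extreme_months(totals: Dict[str, Tuple[float, float]], n: int = 2) -> Tuple[List[str], List[str]]:
    """Return (n highest-usage months, n lowest-usage months), ranked by raw use."""
    ranked = sorted(totals, key=lambda month: totals[month][0], reverse=True)
    return ranked[:n], ranked[-n:]


def summary_sentence(totals: Dict[str, Tuple[float, float]]) -> str:
    """Fill the end-product template from the monthly totals."""
    pct = round(100 * weighted_efficiency(totals))
    high, low = extreme_months(totals)
    return (
        f"Your neighborhood typically uses {pct}% of its water budget. "
        f"The biggest usages were in {high[0]} and {high[1]}. "
        f"The lowest usages were in {low[0]} and {low[1]}."
    )


print(summary_sentence(EXAMPLE))
```

With these made-up numbers the block used 530 of 545 ccf budgeted, which rounds to 97%.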

@datwater @monobina Please correct me if I'm pulling incorrect meaning from the data. What are some other sentences we could pull from this data too/instead?

Thank you!

amandaaprahamian commented 7 years ago

Using only Tiers 3, 4, and 5.

(if we’re just using the TotalInefficiency.csv for highest and lowest usages, this comment can be ignored)

@monobina explained that there are 5 “Tiers”. Each individual is assigned to a Tier based on their water inefficiency (water use/budget). For example, if you’re below 70% (not inefficient), you’re in Tier 1 for that month. If you’re between 70% and 100% (not inefficient), you’re in Tier 2. If you’re between 100% and 115% (inefficient, because you used more than your budget), you’re in Tier 3, and so on.
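As a rough illustration of the Tier rule as described here, the sketch below classifies one customer-month by its use/budget ratio. Only the Tier 1-3 cut-offs appear in this thread; the boundary between Tier 4 and Tier 5 is not given, so anything above 115% is reported as "4-5". Treating exactly 115% as Tier 3 is an assumption, and `tier_for_ratio` is a hypothetical name.

```python
def tier_for_ratio(use: float, budget: float) -> str:
    """Classify one customer-month by use/budget, following the thresholds described above.

    The Tier 4/Tier 5 boundary is not stated, so ratios above 1.15 return "4-5".
    """
    if budget <= 0:
        raise ValueError("budget must be positive")
    ratio = use / budget
    if ratio < 0.70:
        return "1"  # efficient
    if ratio < 1.00:
        return "2"  # efficient, still within budget
    if ratio <= 1.15:
        return "3"  # over budget (assumed inclusive at 115%)
    return "4-5"  # over budget; exact split not given in the thread


for use in (50, 90, 110, 130):
    print(use, tier_for_ratio(use, 100))
```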

In order to calculate the data for inefficient use (use that was above the budget), @monobina only used Tier 3-5 individuals in her calculations. Sometimes there were no Tier 3-5 individuals in a month because everyone used < 100% of their budget (in the original issue above, geoid 60590320022003 doesn’t have a 7th month in 2015 for this reason).
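The sketch below shows why a month can go missing when only Tier 3-5 customers are counted. It assumes "inefficient use" means each over-budget customer's use minus budget (the thread does not spell this out), and the records and the name `over_budget_by_month` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical customer-month records: (geoid, "YYYY-MM", use_ccf, budget_ccf).
Record = Tuple[str, str, float, float]


def over_budget_by_month(records: List[Record]) -> Dict[Tuple[str, str], float]:
    """Sum ccf above budget per (geoid, month), counting only customers whose use
    exceeded budget (Tiers 3-5). A month where nobody exceeded budget gets no key."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for geoid, month, use, budget in records:
        if use > budget:
            totals[(geoid, month)] += use - budget
    return dict(totals)


records: List[Record] = [
    ("60590320022003", "2015-07", 8.0, 10.0),   # under budget
    ("60590320022003", "2015-07", 5.0, 9.0),    # under budget
    ("60590320022003", "2015-08", 14.0, 10.0),  # 4 ccf over budget
    ("60590320022003", "2015-08", 9.0, 10.0),   # under budget
]
print(over_budget_by_month(records))  # July 2015 has no entry
```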

If we use this data to average and find “Last water year (July 2015-2016), your neighborhood used an average of 7,345 gallons of water above what it needed each month”, do we need to add Tiers 1 & 2 back in to get a correct net inefficiency?

patwater commented 7 years ago

No (unless I am misunderstanding). Tier 1 reflects a customer's efficient indoor needs and tier 2 reflects a customer's efficient outdoor needs so usage above that (Tiers 3-5) is inefficient. See here for a nice graphic: https://www.mnwd.com/understandingwaterbudget/

amandaaprahamian commented 7 years ago

So it looks like Tier 1 and Tier 2 are customers that remained within their water budgets for the month. Tier 3, 4, and 5 are customers that exceeded their water budgets for the month.

How, if at all, do we want to calculate "Last water year (July 2015-2016), your neighborhood used an average of X,XXX gallons of water above what it needed each month"?

[Table excerpt (column: geoid10)]

For example, in geoid 60590320022003, all customers remained within their water budgets in 07/2015. I made up a number saying the whole census block was 10 ccf under its budget for that month (entered as "-10"; grayed row). Averaging the inefficient use column with and without this Tier 1&2 data gives different outcomes:
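A small worked example of the two averages, using made-up monthly values in place of the table: the Tier 3-5-only column has no July 2015 row, and the second version adds a hypothetical Tier 1&2 row of -10 ccf for July, as in the grayed row.

```python
from statistics import mean

# Tier 3-5 monthly over-budget totals (ccf) for one block. July 2015 is absent
# because no customer exceeded budget. Values are made up for illustration.
tier_3_to_5 = {"2015-08": 12.0, "2015-09": 6.0, "2015-10": 9.0}

# The same column plus a made-up Tier 1&2 row: the block was 10 ccf under budget in July.
with_tier_1_2 = {**tier_3_to_5, "2015-07": -10.0}

avg_without = mean(tier_3_to_5.values())  # (12 + 6 + 9) / 3
avg_with = mean(with_tier_1_2.values())   # (12 + 6 + 9 - 10) / 4

print(f"Tier 3-5 only:   {avg_without:.2f} ccf/month")  # 9.00
print(f"With Tier 1 & 2: {avg_with:.2f} ccf/month")     # 4.25
```

Strictly, including Tier 1&2 customers would also change the months that already have values, since their under-budget amounts would offset the overage; this sketch only reproduces the single added row.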

Which would we rather do?

Including Tier 1&2 customer data would 1) give us data for missing months and 2) lower overall inefficient use numbers, because those numbers would also incorporate below-budget individuals.

Again, we may not want to include this metric and only include the weighted proportion of water use:budget. @monobina does the WeightedAverageEfficiency.csv incorporate data from all 5 Tiers? Or just 3-5?

monobina commented 7 years ago

@amandaaprahamian thanks for opening this issue, and great points! My thinking is that for the total usage over budget, it might be misleading to include Tier 1 and Tier 2, since those are still efficient, within-budget usage. I can see that the total absolute value might look too large to customers, which is why you recommend using an average, @amandaaprahamian. If I take an average of inefficient usage in a month, I would still average only Tier 3, 4, and 5 usage, so we can say that on average so many ccf or gallons are used over budget. It might still be useful to keep the total value (the sum of Tiers 3, 4, and 5 by census block), as it might be more reflective of the potential for water runoff. Open to more thoughts on this from the rest of the team @datwater @christophertull @patwater

monobina commented 7 years ago

@amandaaprahamian I like the end product description for the users. Maybe also add that the total/average usage exceeding budget was X gallons in the past year or so?

patwater commented 7 years ago

@monobina I agree, and I think the average amount over budget could be easier to grasp: 'neighbors in your community use 10,000 (or whatever) gallons per month more than is efficient' is more relatable than acre-feet or a bazillion gallons. It might also be worth translating that amount over budget into how many hours of outdoor watering, or something else tangible.

datwater commented 7 years ago

I agree; we can take the total water used in Tiers 3, 4, and 5, divide it by the number of households (water meters), and provide the monthly average of inefficient usage. I like the average efficiency too, but I'm a bit of a data geek, so it might not be the best data to show the public. It might be a good tool for OC Stormwater, though, for the utility portal.
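A minimal sketch of the per-household monthly average proposed here, under stated assumptions: "water used in Tiers 3, 4, and 5" is taken to mean use above budget, the denominator defaults to every meter reporting in that month (the alternative, only over-budget meters, is behind a flag), and 1 ccf is about 748.05 US gallons. The records and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Record = Tuple[str, str, float, float]  # (meter_id, "YYYY-MM", use_ccf, budget_ccf)
GALLONS_PER_CCF = 748.05  # 1 ccf = 100 cubic feet, about 748.05 US gallons


def monthly_avg_over_budget_gal(records: List[Record],
                                only_over_budget_meters: bool = False) -> Dict[str, float]:
    """Average gallons above budget per meter, for each month."""
    over: Dict[str, float] = defaultdict(float)
    meters: Dict[str, Set[str]] = defaultdict(set)
    over_meters: Dict[str, Set[str]] = defaultdict(set)
    for meter, month, use, budget in records:
        meters[month].add(meter)
        if use > budget:
            over[month] += use - budget
            over_meters[month].add(meter)
    result: Dict[str, float] = {}
    for month, all_meters in meters.items():
        denom = over_meters[month] if only_over_budget_meters else all_meters
        if denom:  # skip months with no qualifying meters
            result[month] = over[month] * GALLONS_PER_CCF / len(denom)
    return result


records: List[Record] = [
    ("m1", "2015-07", 8.0, 10.0), ("m2", "2015-07", 5.0, 9.0),
    ("m1", "2015-08", 14.0, 10.0), ("m2", "2015-08", 9.0, 10.0),
]
print(monthly_avg_over_budget_gal(records))        # all meters in the denominator
print(monthly_avg_over_budget_gal(records, True))  # over-budget meters only
```

The choice of denominator matters: dividing by all meters gives a neighborhood-wide figure (and 0 for months with no overage), while dividing only by over-budget meters describes the typical over-budget household and leaves months without overage undefined.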

monobina commented 7 years ago

Thanks @datwater for your thoughts on this. I think this might be a good approach for calculating monthly average inefficient usage metric. Does everybody else agree with this?

patwater commented 7 years ago

Yup, I agree! I'd just add that it would be nice to have a statement like "X gallons of inefficient use on average, or Y minutes of showering, each month".

amandaaprahamian commented 7 years ago

@patwater I like that idea. A Google search says showers use around 2.5-3.0 gpm (gallons per minute). With the example numbers from above, 9,657 gallons of water is "between 3,219.0 and 3,862.8 minutes of taking a shower each month". We could round it to "about 3,500 minutes", but that's still a pretty high number. Converting to hours ("about 60 hours") or days ("about 2.5 days") might be easier to digest. What do you think?
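The shower arithmetic can be checked with a few lines. The 2.5-3.0 gpm range is the one quoted above; `shower_minutes` is a hypothetical helper.

```python
MINUTES_PER_HOUR = 60
MINUTES_PER_DAY = 24 * 60


def shower_minutes(gallons: float, gpm_low: float = 2.5, gpm_high: float = 3.0):
    """Return (fewest, most) minutes of showering that `gallons` represents.

    A higher flow rate uses the water up in fewer minutes."""
    return gallons / gpm_high, gallons / gpm_low


fewest, most = shower_minutes(9657)
midpoint = (fewest + most) / 2
print(f"{fewest:.1f}-{most:.1f} minutes")                                 # 3219.0-3862.8 minutes
print(f"about {midpoint / MINUTES_PER_HOUR:.0f} hours")                   # about 59 hours
print(f"{fewest / MINUTES_PER_DAY:.1f}-{most / MINUTES_PER_DAY:.1f} days")  # 2.2-2.7 days
```

So at 2.5-3.0 gpm, 9,657 gallons is roughly 2.2 to 2.7 days of continuous showering, consistent with "about 2.5 days".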

patwater commented 7 years ago

Looks good to me! I'd be curious to hear the MNWD folks' thoughts to make sure that aligns with their on-the-ground intuition. @monobina ?

amandaaprahamian commented 7 years ago

Here's what I gather from above. The sentences regarding water overuse will be:

"Your neighborhood typically uses 97% of its water budget. Last water year (July 2015-2016), your neighborhood used an average of 9,657 gallons of water per month above what it needed. That’s equivalent to about 2.5 days of taking a shower! The biggest usages were in August and December and the lowest usages were in September and February."

I disagree with the approach discussed above for calculating the figures in those sentences. I believe we should include all Tiers, meaning usage data from all customers, whether or not they stayed within a given month's budget. Here’s my reasoning:

Because I want an average on a yearly scale, we should use a yearly budget and a yearly usage that don’t depend on whether each month's usage was within that month's budget.
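To show how the two approaches diverge, here is a sketch comparing a net figure that includes all Tiers (total use minus total budget over the period, divided by the number of months) with a Tier 3-5-only figure (sum of use above budget, divided by the number of months). It assumes the Tier 3-5 figure is use minus budget for over-budget customers; the records and function names are hypothetical, and a full water year would contain 12 months.

```python
from typing import List, Tuple

Record = Tuple[str, str, float, float]  # (meter_id, "YYYY-MM", use_ccf, budget_ccf)


def net_per_month_all_tiers(records: List[Record]) -> float:
    """(Total use - total budget) / number of months; under-budget use offsets overage."""
    months = {month for _, month, _, _ in records}
    total_use = sum(use for _, _, use, _ in records)
    total_budget = sum(budget for _, _, _, budget in records)
    return (total_use - total_budget) / len(months)


def over_budget_per_month_tiers_3_to_5(records: List[Record]) -> float:
    """Sum of use above budget (over-budget customers only) / number of months."""
    months = {month for _, month, _, _ in records}
    overage = sum(use - budget for _, _, use, budget in records if use > budget)
    return overage / len(months)


records: List[Record] = [
    ("m1", "2015-07", 8.0, 10.0), ("m2", "2015-07", 5.0, 9.0),
    ("m1", "2015-08", 14.0, 10.0), ("m2", "2015-08", 9.0, 10.0),
]
print(net_per_month_all_tiers(records))             # negative: net under budget
print(over_budget_per_month_tiers_3_to_5(records))  # positive: overage only
```

With these made-up records the all-Tiers figure is net under budget (-1.5 ccf/month) while the Tier 3-5 figure still shows 2.0 ccf/month over budget, which is why the sentence wording needs to state which quantity was calculated.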

Following that reasoning, I think the sentence should be tweaked to describe exactly what we calculated. I'm thinking that switching it from “gallons of water above what is needed each month” to “gallons of water per month above what is needed” would work. I included this change in the sample paragraph in this comment.

In a forum format it's a bit difficult to discuss which calculations to do and why, so if we set up a meeting I might better understand nuances in the data and our analysis. Please let me know your thoughts. I am available next week for a meeting.

monobina commented 7 years ago

@amandaaprahamian yeah, let's talk offline about this. I have sent you an email to set up a time for a call next week. Anybody else who wants to join the call is most welcome! @johnathancruz and I will be on the call from MNWD.

patwater commented 7 years ago

@monobina @amandaaprahamian @leighphan any updates or anything you need on our end?

monobina commented 7 years ago

@patwater we have shared the data with @amandaaprahamian recently. It should have all the requested variables. @amandaaprahamian let us know if you need anything else.

amandaaprahamian commented 7 years ago

@patwater @monobina @leighphan The data has been compiled! Next step is working on how we want to present it. Please check out this box.com document and add edits/comments.

patwater commented 7 years ago

Looking good! Is there an ETA for launching a demo on the web?

monobina commented 7 years ago

The final data for the public facing app is uploaded in the Data folder "IEUsage_Rebate2". "Data dictionary_Stormwater" doc file is added to help interpret the data :-)