WUDR Working Session - March 15, 2022

julieshortridge commented 2 years ago

@laljeet @jdkleiner @rburghol

Notes from working session (3/15)

One important thing that has become apparent in our analysis so far is that the USDA Census and Irrigation Survey data should be considered a partial record of irrigation that overlaps with VWUDS reports (which are also a partial record). In other words, both datasets will have some farms that do not report to them. This is important because in the proposal we initially conceptualized the USDA data as fully encompassing the VDEQ data (in other words, as a superset of VWUDS reports). This difference only became apparent after a detailed comparison of county-level USDA and VWUDS irrigation estimates. While this will not change the final products, it has required some additional analysis and refinement of our coefficient estimation methods to account for it.
Methods 2 and 3 both aim to do the same thing: estimate the total volume of water (reported + unreported) by multiplying USDA irrigated acres by an estimate of irrigation depth. The main difference is how that irrigation depth is calculated. Method 2 simply applies the state average depth everywhere, whereas Method 3 calculates a county-specific depth based on the water demand of the crops grown under irrigation in that county and the rainfall received.
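A minimal sketch of the calculation both methods share, assuming irrigated acres from USDA and a depth in inches, converted to million gallons (1 acre-inch is roughly 27,154 US gallons). The county names, acreages, and depths are hypothetical placeholders, not project data; the only thing that differs between the methods is where `depth_in` comes from.

```python
# Hypothetical sketch: total irrigation volume = USDA irrigated acres x irrigation depth.
GALLONS_PER_ACRE_INCH = 27_154  # 1 acre-inch of water is roughly 27,154 US gallons


def irrigation_volume_mg(irrigated_acres: float, depth_in: float) -> float:
    """Return total (reported + unreported) irrigation volume in million gallons."""
    return irrigated_acres * depth_in * GALLONS_PER_ACRE_INCH / 1e6


# Method 2: one state-average depth applied to every county (placeholder value).
STATE_AVG_DEPTH_IN = 6.0

# Method 3: a county-specific depth (placeholder values), e.g. from crop demand and rainfall.
county_depth_in = {"County A": 4.5, "County B": 8.0}
county_acres = {"County A": 3000.0, "County B": 1200.0}

method2 = {c: irrigation_volume_mg(a, STATE_AVG_DEPTH_IN) for c, a in county_acres.items()}
method3 = {c: irrigation_volume_mg(a, county_depth_in[c]) for c, a in county_acres.items()}
```

Keeping the volume function separate from the depth source makes it easy to swap in a different depth estimate without touching the rest of the pipeline.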
The main advantage of Method 3 is that it can account for differences in crop water demand (e.g., a county growing "thirsty" crops will use more water) and rainfall (a county that receives more rain will irrigate less). The main disadvantage is that crop-specific irrigated acres data are not available for all counties. Availability varies from year to year, but generally only 30% to 50% of counties with USDA irrigated acres have those acres broken down by crop. Even where this breakdown is available, it doesn't cover all irrigated acres; i.e., if a county has 3,000 acres of irrigation, only 2,000 of those acres might be associated with specific crops, and the remaining 1,000 don't specify what's being irrigated. As a result, Method 3 requires making many assumptions about which crops are grown, what their water demand (ET) is, and whether growers irrigate precisely to those demands.
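To make the coverage problem concrete, here is a sketch of a crop-weighted depth for the 3,000-acre example, where only 2,000 acres have crop detail. It rests on one loudly stated assumption (not a project decision): the unspecified acres are irrigated at the acreage-weighted mean depth of the reported crops. The crop split and depths are hypothetical.

```python
def weighted_crop_depth_in(crop_acres: dict[str, float], crop_depth_in: dict[str, float],
                           total_irrigated_acres: float) -> tuple[float, float]:
    """Return (weighted depth in inches, fraction of irrigated acres with crop detail).

    Hypothetical assumption: acres with no crop detail are irrigated at the
    acreage-weighted mean depth of the crops that are reported.
    """
    specified = sum(crop_acres.values())
    if specified == 0:
        raise ValueError("no crop-specific acres; a Method 3 depth is undefined here")
    weighted = sum(acres * crop_depth_in[crop] for crop, acres in crop_acres.items()) / specified
    coverage = specified / total_irrigated_acres
    return weighted, coverage


# The 3,000-acre example from the text: only 2,000 acres have crop detail.
depth, coverage = weighted_crop_depth_in(
    crop_acres={"corn": 1500.0, "vegetables": 500.0},   # hypothetical split
    crop_depth_in={"corn": 6.0, "vegetables": 10.0},    # hypothetical depths
    total_irrigated_acres=3000.0,
)
```

Returning `coverage` alongside the depth keeps visible how much of the county's acreage the estimate actually rests on.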
A proposed path forward would be to use crop water demand estimates from the literature, along with county-level rainfall data, to refine the county-level irrigation depth estimates used in Method 2. The advantage is that this would account for year-to-year variability in a consistent way across the whole state. We could use multiple crop water demand estimates (e.g., a high, low, and median value) to account for uncertainty in which crops are being grown.
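One way this could look, as a sketch: treat irrigation depth as growing-season crop water demand minus growing-season rainfall, floored at zero, evaluated for a low, median, and high demand value. The demand values are hypothetical, and the sketch ignores effective-rainfall fractions and irrigation efficiency, both of which a real implementation would need to decide on.

```python
def irrigation_depth_in(crop_demand_in: float, rainfall_in: float) -> float:
    """Irrigation depth needed to make up the rainfall deficit (never negative)."""
    return max(crop_demand_in - rainfall_in, 0.0)


# Hypothetical low / median / high crop water demand for the growing season (inches).
demand_scenarios = {"low": 14.0, "median": 18.0, "high": 22.0}


def depth_range(county_rainfall_in: float) -> dict[str, float]:
    """Depth under each demand scenario for one county-year's growing-season rainfall."""
    return {name: irrigation_depth_in(d, county_rainfall_in) for name, d in demand_scenarios.items()}


wet_year = depth_range(16.0)  # {'low': 0.0, 'median': 2.0, 'high': 6.0}
dry_year = depth_range(9.0)   # {'low': 5.0, 'median': 9.0, 'high': 13.0}
```

The three depths then give a band of Method 2 volumes per county-year rather than a single number.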
Task 4 from the proposal (meteorological analysis) is an application of the unreported withdrawal estimates. The goal of Task 4 is to quantify the relationship between total (reported + unreported) withdrawals and weather characteristics. One part of this task would be a multivariate regression. We discussed my concern that in that regression, rainfall would be one of our predictor variables, but is also part of the calculation used to determine our response variable (total withdrawals). As I've thought about this more, I think this is OK, but we need to be clear about the goals of the multivariate regression. The goal is not to determine the statistical significance of predictor variables; in this case, we fully expect rainfall to be a significant predictor because it was used in the calculation of total withdrawals. Rather, we are using the regression to simplify the complex relationship between rainfall and total withdrawals into a simple form (e.g., for every 1-inch reduction in growing-season rainfall, withdrawals increase by X%). We can then use this simple form to generate different "scenarios" of withdrawal under different meteorological conditions. So I think we can still proceed with analyses as described in the proposal, we just need to be clear about what they are and are not telling us.
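A sketch of how the "X% per inch" summary could be read off a regression, assuming (our choice, not something the proposal specifies) a log-linear form ln(W) = a + b·P fitted by ordinary least squares. A 1-inch rainfall reduction then changes withdrawals by 100·(e^(-b) - 1) percent. The data are synthetic and built to fall about 5% per extra inch of rain.

```python
import math


def fit_log_linear(rain_in: list[float], withdrawal: list[float]) -> tuple[float, float]:
    """OLS fit of ln(withdrawal) = a + b * rainfall; returns (a, b)."""
    n = len(rain_in)
    y = [math.log(w) for w in withdrawal]
    x_bar = sum(rain_in) / n
    y_bar = sum(y) / n
    sxy = sum((x - x_bar) * (yy - y_bar) for x, yy in zip(rain_in, y))
    sxx = sum((x - x_bar) ** 2 for x in rain_in)
    b = sxy / sxx
    return y_bar - b * x_bar, b


def pct_change_per_inch_less_rain(b: float) -> float:
    """Percent change in withdrawals for a 1-inch reduction in growing-season rainfall."""
    return 100.0 * (math.exp(-b) - 1.0)


# Synthetic data: withdrawals fall ~5% for each additional inch of rain.
rain = [8.0, 10.0, 12.0, 14.0, 16.0]
withdrawals = [100.0 * 0.95 ** (r - 8.0) for r in rain]
a, b = fit_log_linear(rain, withdrawals)
x_pct = pct_change_per_inch_less_rain(b)
```

A scenario withdrawal could then be computed as W_avg · e^(b·(P_scenario - P_avg)), which is the "simple form" used for scenarios, not a significance test.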

Steps to Wrap-Up Project

Objective 1 (Tasks 1 & 2)

Objective 1 (Coefficient Estimation) - Develop a set of coefficients to estimate unreported agricultural water withdrawals at the county level based on the irrigation data from the USDA Agricultural Census, the USDA Irrigation and Water Management Survey (IWMS), and literature-based estimates of crop water requirements.

Definitions:

W_unr is the estimated irrigation withdrawal not reported to VWUDS.

W_{unr_sm} is the estimated irrigation withdrawal not reported to VWUDS from small farms (below reporting threshold)

W_{unr_lg} is the estimated irrigation withdrawal not reported to VWUDS from large farms (above reporting threshold)

W_rep is the reported withdrawal (VWUDS)

W_fac is the withdrawal for known VWUDS facilities (by county)

Outcome:

W_unr[t] = f(W_rep)

This is what we initially proposed, but the problem is that there are many counties where W_rep = 0 but W_unr > 0.

Question for VDEQ: Is it preferable to have a single approach for all counties (those with and without W_rep), or one approach for counties with W_rep and an alternative approach for counties with W_rep = 0?

@laljeet Could you sum up the number of counties and total unreported withdrawal (using methods 1 and 2 for now) in the counties for which we have VDEQ reported withdrawal W_rep and those counties where we don't?

This may take the form of two equations per county:

Below Reporting Threshold: W_{unr_sm}[t] = f(W_rep[t])

Missing Reporters: W_{unr_lg}[t] = f(W_rep[t])

Could be as simple as:

When W_usda > W_rep × (1.0 + stderr), W_{unr_lg}[t] = c × W_rep[t]

else, W_{unr_lg}[t] = 0.0

W_unr[t] = W_{unr_lg}[t] + W_{unr_sm}[t]
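The piecewise rule above can be sketched directly. Two assumptions here are ours: `rel_stderr` is USDA's relative standard error expressed as a fraction, and `c` is a single hypothetical county coefficient.

```python
def unreported_large(w_usda: float, w_rep: float, rel_stderr: float, c: float) -> float:
    """W_{unr_lg}[t]: nonzero only when USDA exceeds VWUDS by more than its standard error.

    rel_stderr is assumed to be USDA's relative standard error as a fraction (e.g. 0.15);
    c is a hypothetical county coefficient.
    """
    if w_usda > w_rep * (1.0 + rel_stderr):
        return c * w_rep
    return 0.0


def unreported_total(w_unr_lg: float, w_unr_sm: float) -> float:
    """W_unr[t] = W_{unr_lg}[t] + W_{unr_sm}[t]."""
    return w_unr_lg + w_unr_sm
```

Note that when W_rep = 0 this rule returns 0.0 no matter how large W_usda is, which is exactly the gap noted above for counties with W_rep = 0 but W_unr > 0.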

_Current formula is W_unr = Irr_Area × Irr_Depth - W_rep, where Irr_Area comes from USDA data and Irr_Depth can be calculated in different ways._
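A sketch of the current formula with volumes in million gallons. Clipping a negative residual to zero is a hypothetical choice that follows the "VWUDS is minimum total water" assumption; the flag keeps those county-years visible instead of hiding them.

```python
def unreported_current(irr_area_ac: float, irr_depth_in: float, w_rep_mg: float) -> tuple[float, bool]:
    """W_unr = Irr_Area x Irr_Depth - W_rep, in million gallons.

    Returns (W_unr, usda_below_vwuds). A negative residual is clipped to 0.0
    (hypothetical choice) and flagged, since VWUDS is treated as the minimum total.
    """
    acre_inch_mg = 27_154 / 1e6  # million gallons per acre-inch (approximate)
    residual = irr_area_ac * irr_depth_in * acre_inch_mg - w_rep_mg
    return max(residual, 0.0), residual < 0.0
```

For example, 1,000 acres at 6 inches against 100 MG reported leaves about 62.9 MG unreported, while 100 acres at 6 inches returns `(0.0, True)`.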

Questions:

Can we have a single coefficient per county? i.e., W_unr[t] = c * W_rep[t], where c is a decimal proportion

_Yes, this is how we've already set up the code, for all methods. However, the code currently produces county-specific W_{unr_sm} and W_{unr_lg} volumes rather than coefficients, because of the counties without W_rep mentioned above._

Or, do we need a time-varying form: W_unr[t] = c[t] * W_rep[t]?

The 4/11/2022 proposal from @julieshortridge and @laljeet was for a time-varying form that interpolates between coincident VWUDS/USDA values.
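A sketch of what interpolating c[t] between census years might look like, assuming linear interpolation and holding the end values constant outside the census range (both are our assumptions). The coefficients are hypothetical.

```python
from bisect import bisect_right


def interpolate_coefficient(census: dict[int, float], year: int) -> float:
    """Linearly interpolate c[t] between census years; hold end values outside the range."""
    years = sorted(census)
    if year <= years[0]:
        return census[years[0]]
    if year >= years[-1]:
        return census[years[-1]]
    i = bisect_right(years, year)
    y0, y1 = years[i - 1], years[i]
    frac = (year - y0) / (y1 - y0)
    return census[y0] + frac * (census[y1] - census[y0])


# Hypothetical coefficients at the four census years for one county.
c_census = {2002: 0.20, 2007: 0.30, 2012: 0.25, 2017: 0.40}
c_2009 = interpolate_coefficient(c_census, 2009)  # 0.30 + 0.4 * (0.25 - 0.30)
```

This also shows the forecasting difficulty: beyond 2017 the form has nothing to interpolate toward, so some extrapolation rule has to be chosen.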

@rburghol: the time-varying form will be more difficult to use (how do we use it in scenario mode where we forecast, a la Task 4?), and it also places too much emphasis on individual points in time. Other, potentially more powerful, options could be:

What is the mean coefficient over time?

What is the max coefficient over time? (this ensures we don't underestimate the unreported)
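A sketch of the mean and max options, computing c = W_unr / W_rep for each census year and skipping years with W_rep = 0, where the ratio is undefined. All values are hypothetical.

```python
from statistics import mean


def coefficients(w_unr: dict[int, float], w_rep: dict[int, float]) -> dict[int, float]:
    """c = W_unr / W_rep for each year with reported use; W_rep = 0 years are skipped."""
    return {yr: w_unr[yr] / w_rep[yr] for yr in w_rep if w_rep[yr] > 0 and yr in w_unr}


# Hypothetical county values (million gallons) at census years.
c_by_year = coefficients(
    w_unr={2002: 40.0, 2007: 60.0, 2012: 30.0, 2017: 50.0},
    w_rep={2002: 200.0, 2007: 150.0, 2012: 0.0, 2017: 250.0},
)
c_mean = mean(c_by_year.values())  # mean of 0.2, 0.4, 0.2
c_max = max(c_by_year.values())    # the conservative choice
```

Here 2012 drops out entirely, so its 30 MG of unreported use never influences either coefficient.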

I think it would be helpful for us to talk through how exactly the forecast simulations work. We've been conceptualizing this output as being in a similar format to the VWUDS data that already exists (time series of monthly withdrawals) that could be added in to a simulation or left out. However, if that's not the best format, we can definitely adjust.

I think the most straightforward way is to develop coefficients that are then applied to the monthly withdrawals of the reported users. The reported users, when in scenario mode, simply multiply their annual total by the monthly coefficient.
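A sketch of that scenario-mode step, assuming "monthly coefficient" means each month's share of the annual reported total (12 values summing to 1) and that unreported use is then c times the monthly reported use. The monthly shares and c are hypothetical.

```python
def monthly_withdrawals(annual_rep: float, monthly_frac: list[float], c: float) -> list[tuple[float, float]]:
    """Split an annual reported total into months, then add unreported as c x reported.

    monthly_frac: hypothetical share of annual use in each month (12 values summing to 1).
    Returns (reported, unreported) per month.
    """
    if len(monthly_frac) != 12 or abs(sum(monthly_frac) - 1.0) > 1e-9:
        raise ValueError("monthly_frac must have 12 values that sum to 1")
    return [(annual_rep * f, c * annual_rep * f) for f in monthly_frac]


# Hypothetical irrigation-season distribution (May to September only).
frac = [0, 0, 0, 0, 0.1, 0.2, 0.3, 0.3, 0.1, 0, 0, 0]
months = monthly_withdrawals(annual_rep=120.0, monthly_frac=frac, c=0.25)
```

Because unreported use rides on the reported monthly pattern, it inherits the same seasonal timing, which matters for low-flow months.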

However, if there is a trend in W_{usda_sm} / W_rep, a time-varying form could be important to know about.

A trend analysis was included in our proposal so this is something we were already planning to do. Just need to decide what we're doing the trend analysis on (county vs state, time series versus 5-year estimates, etc.)

But given only four data points per county, this is going to be hard to make sense of at the county level. Maybe a trend for climate zones (which are groups of counties), or just an overall statewide trend, would be useful to know?
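A sketch of the trend check: an ordinary least-squares slope of the ratio against census year, for one county (4 points) and for a simple pooled fit across a climate zone (no county-specific intercepts). The zone membership and ratios are hypothetical.

```python
def ols_slope(years: list[float], values: list[float]) -> float:
    """Least-squares slope of values against years (change per year)."""
    n = len(years)
    x_bar, y_bar = sum(years) / n, sum(values) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, values))
    sxx = sum((x - x_bar) ** 2 for x in years)
    return sxy / sxx


# Hypothetical W_{usda_sm} / W_rep ratios for two counties in the same climate zone.
zone = {
    "County A": {2002: 0.10, 2007: 0.12, 2012: 0.14, 2017: 0.16},
    "County B": {2002: 0.20, 2007: 0.21, 2012: 0.23, 2017: 0.24},
}
# Pooling gives 8 points instead of 4; a per-county slope uses only 4.
pooled_x = [yr for county in zone.values() for yr in county]
pooled_y = [r for county in zone.values() for r in county.values()]
zone_slope = ols_slope(pooled_x, pooled_y)
county_a_slope = ols_slope(list(zone["County A"]), list(zone["County A"].values()))
```

With identical census years in every county, this pooled slope equals the average of the county slopes; a fit with county intercepts would be the natural next refinement.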

Assumptions:

[x] VWUDS is minimum total water per county.

It is reported, not estimated, so total water must be at least the VWUDS total.

[x] USDA is minimum water by county only when greater than VWUDS by a margin greater than the standard error reported by USDA.

[x] USDA should be evaluated by county, not state (if possible), since it is the county-level unreported withdrawal we are estimating, not the state-level one.

[x] Can we assume that when USDA has more water in a county than VWUDS that it means we have additional unreported water? (beyond the small farms that we know we aren't catching) -- in other words, there is no zip-code flim-flam that is causing geographical errors

[x] Can we assume that when USDA county has less than VWUDS that it is a function of non-respondents to USDA? (not zip-code flim-flam)

[x] Irrigators that are smaller than the VWUDS reporting threshold are captured in the Ag Census.

[x] WSP small irrigators are based on the Ag Census, so we can favor Ag Census data over WSP when it is greater.

Small Farms (below reporting threshold)

[x] Are a percentage of reported withdrawals when reported > 0.0

[x] We will use ag census when reported = 0.0

[ ] Should the % value be varying every 5 years or should it be based on long term trend?
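The two checked small-farm rules above can be sketched as a single function. The percentage and the Ag Census small-farm estimate are hypothetical inputs; how the percentage itself is set is still the open question.

```python
def small_farm_unreported(w_rep: float, pct_of_rep: float, ag_census_small: float) -> float:
    """W_{unr_sm}: a percentage of reported use when W_rep > 0, else the Ag Census estimate.

    pct_of_rep (a fraction) and ag_census_small are hypothetical inputs; whether
    pct_of_rep changes every 5 years or follows a long-term trend is still open.
    """
    if w_rep > 0.0:
        return pct_of_rep * w_rep
    return ag_census_small
```

For example, `small_farm_unreported(200.0, 0.25, 30.0)` uses the percentage rule, while `small_farm_unreported(0.0, 0.25, 30.0)` falls back to the Ag Census value.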

Action items:

[ ] Annotate current proposed formula for W_unr[t] = f(P, T, W_rep)

Objective 2 (Task 3)

Objective 2 (Time-series Generation) - Combine the coefficients with reported irrigation withdrawals to generate a time series of monthly total irrigation withdrawals (reported plus nonreported) for major agricultural counties in Virginia.

Since we have a known reported demand for the entire time series, this involves selecting the best method (or combination of methods) from Objective 1 and applying it to reported demands to estimate unreported demand for historical periods.

Outcome:

Time Series could be a CSV,

or a set of formulas that we plug into the model for each county.

R code to reproduce formulas
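For the CSV option, a sketch of one possible layout written with Python's standard `csv` module. The column names are hypothetical, and the total is derived as reported plus unreported so it cannot drift from its parts.

```python
import csv
import io

# Hypothetical column layout for the county time-series CSV option.
FIELDS = ["county", "year", "month", "w_rep_mg", "w_unr_mg", "w_total_mg"]


def write_timeseries(rows: list[dict], out) -> None:
    """Write monthly reported, unreported, and total withdrawals; total is computed here."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        writer.writerow({**row, "w_total_mg": row["w_rep_mg"] + row["w_unr_mg"]})


buf = io.StringIO()
write_timeseries(
    [{"county": "County A", "year": 2010, "month": 7, "w_rep_mg": 36.0, "w_unr_mg": 9.0}],
    buf,
)
```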

Objective 3 (Task 4)

Objective 3 (Meteorological Analysis) - Use the coefficients and reported irrigation withdrawals to estimate a range of total irrigation withdrawals under different weather scenarios (e.g., average year conditions, moderate drought conditions, and extreme drought conditions).

Outcome:
- W_unr[t] = f(P, T, W_rep)
- W_fac[t] = f(P, T, W_rep)
- Where:
  - The best-of-breed function for unreported withdrawals may not depend on reported withdrawals, so that term could have a coefficient of 0.0; however, for future-casting (the main Task 4 application, I think), it would have to be a function of W_rep (as specified in Task 2), so this is not hard.
Goals: Ensure that we develop a sound model:
- @julieshortridge: expressed concern (see above) "... concern that in that regression, rainfall would be one of our predictor variables, but is also part of the calculation used to determine our response variable ... So I think we can still proceed with analyses as described in the proposal, we just need to be clear about what they are and are not telling us."
- @rburghol: I think the way Objective 3 is described in the intro makes clear that the MET analysis should involve both USDA and VWUDS data; however, the description in Task 4 is vague about the role of VWUDS data in the final analysis. My feeling is that the VWUDS analysis can be the strongest part of this work, and we can avoid JS's concern by developing a regression of meteorology against VWUDS data, so our time series can be developed as follows:
  - In essence, I think that what we are trying to answer is "how much of the variability in reported demand can be explained by rainfall deficit?", and thus, while the methods used are informed by previous tasks, we can select the data used such that there is no methodological issue.
  - @rburghol proposed path forward: We use met to predict variation in un-reported demands, and we use met to predict the ways in which reported demands will vary due to met.
- I think we should leverage the previous task work: use met to predict variations in reported demand, then use our coefficients to predict the unreported, so the formula looks more like:
  - W_scenario[t] = f(P, T, W_rep)
  - and W_unr[t] = f(W_scenario)
  - Where W_scenario is the future demand and/or climate change scenario withdrawal.
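A sketch of the two-step chain in the formulas above: weather departures scale baseline reported demand into W_scenario, and the Objective 1 coefficient turns that into unreported demand. The log-linear response and the sensitivities `b_rain` and `b_temp` are hypothetical placeholders for whatever the VWUDS regression produces.

```python
import math


def scenario_withdrawal(w_rep_base: float, d_rain_in: float, d_temp_c: float,
                        b_rain: float, b_temp: float) -> float:
    """W_scenario = f(P, T, W_rep): scale baseline reported use by weather departures.

    Hypothetical form: ln(W) responds linearly to rainfall and temperature anomalies.
    """
    return w_rep_base * math.exp(b_rain * d_rain_in + b_temp * d_temp_c)


def scenario_unreported(w_scenario: float, c: float) -> float:
    """W_unr = f(W_scenario): apply the Objective 1 coefficient to scenario demand."""
    return c * w_scenario


# Hypothetical drought: 4 inches less rain and 1 degree C warmer than baseline.
w_scen = scenario_withdrawal(100.0, d_rain_in=-4.0, d_temp_c=1.0, b_rain=-0.05, b_temp=0.03)
w_unr = scenario_unreported(w_scen, c=0.25)
```

Rainfall only enters the VWUDS regression here, so it is never both a predictor and an input to the response, which is the point of this arrangement.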

julieshortridge commented 2 years ago

Also here is the text from our proposal about the time series - I think we have some flexibility in terms of the methods we use to generate the time series so I think the proposed path forward is consistent with what's in the proposal:

"The results of Tasks 2 and 3 will be evaluated to determine a final set of time series for incorporation into the VAHydro data system. Depending on the results, this may include a single time series assumed to be the best estimate of unreported withdrawals, or multiple time series that can represent a reasonable range of possible values."

laljeet commented 2 years ago

@rburghol @jdkleiner @julieshortridge Below are the summary tables for Methods 1 and 2 for counties with and without DEQ-reported withdrawals. Method 2 unreported amounts take into account Method 1 unreported amounts.

Method 1

2007

| | Counties with VDEQ withdrawals | Counties without DEQ withdrawals |
| -- | -- | -- |
| Number of counties | 45 | 46 |
| USDA Irrigated Acres | 68443 | 10526 |
| Unreported Irrigated vol (mgd) | 637 | 369 |

2012

| | Counties with VDEQ withdrawals | Counties without DEQ withdrawals |
| -- | -- | -- |
| Number of counties | 45 | 50 |
| USDA Irrigated Acres | 53664 | 10933 |
| Unreported Irrigated vol (mgd) | 750 | 352 |

Method 2

2007

| | Counties with VDEQ withdrawals | Counties without DEQ withdrawals |
| -- | -- | -- |
| Number of counties | 45 | 44 |
| USDA Irrigated Acres | 68428 | 10497 |
| Unreported Irrigated vol (mgd) | 7270 | 1773 |

2012

| | Counties with VDEQ withdrawals | Counties without DEQ withdrawals |
| -- | -- | -- |
| Number of counties | 43 | 50 |
| USDA Irrigated Acres | 53642 | 10935 |
| Unreported Irrigated vol (mgd) | 3487 | 1542 |

rburghol commented 2 years ago

Thanks for the above, Lal. This is interesting since I think it is the first time I've seen unreported volume increase over time (method 2, 2012), but a question: why are 2002 and 2017 not included in this?

julieshortridge commented 2 years ago

Hi Rob, I just had Lal put those tables together in response to your point about representing unreported withdrawal as a percentage coefficient of reported, and how that only works in the counties where we have VDEQ withdrawals. For our discussion, I wanted to get a rough sense of how many counties have USDA irrigation data but no VDEQ withdrawals, and how much irrigation is present in those counties. So the key thing with these isn't so much the changes through the years, but more that if we represent unreported withdrawal as a percentage of reported, we miss out on the counties on the right-hand side of the table.
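To put a number on that point, a worked calculation using only the unreported volumes from Lal's summary tables (units as reported there): the share of unreported volume sitting in counties without VDEQ withdrawals, which a coefficient on W_rep would assign zero.

```python
# Unreported volume as (counties with VDEQ withdrawals, without), copied from Lal's tables.
tables = {
    ("Method 1", 2007): (637, 369),
    ("Method 1", 2012): (750, 352),
    ("Method 2", 2007): (7270, 1773),
    ("Method 2", 2012): (3487, 1542),
}

# Share a W_rep-based coefficient would miss, since c * 0 = 0 in those counties.
missed_share = {key: without / (with_ + without) for key, (with_, without) in tables.items()}
for (method, year), share in sorted(missed_share.items()):
    print(f"{method} {year}: {share:.1%} of unreported volume is in counties without VDEQ withdrawals")
```

Across both methods and years this comes to roughly 20% to 37% of the estimated unreported volume.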

rburghol commented 2 years ago

Schedule meetings going forward.
Spreadsheet for Tasks 1 & 2:
- Small unreported volume by county
- large unreported mean value by county
- small coefficient vs. reported irr use
- small coefficient vs. reported total use
- large coefficient vs. reported irr use
- is there a trend over time?

rburghol commented 2 years ago

The final task of the project will be to use the time series generated in Task 3 to evaluate how unreported and total irrigation withdrawals will vary under different weather conditions. One of the key ways that irrigation withdrawals can stress water supply is through their timing. Because irrigation needs are highest in hot, dry weather, irrigation withdrawals are likely to be greatest when surface water supplies are lowest. Because the USDA census data are only collected every five years, they cannot capture the year-to-year variations in irrigation withdrawals needed to characterize this climatic sensitivity. For instance, total summer rainfall in 2017 was average in Augusta and Caroline counties (approximately 660 mm) but higher than average in Accomack County (Figure 1). Because of this, irrigation withdrawals in Accomack County in 2017 may be somewhat lower than the long-term average. Further, these data will not reflect withdrawals during dry years (such as 2010), when irrigation usage might be highest and have the greatest impact on water supply.

Figure 1: Total summer (June–August) rainfall in three Virginia counties with high levels of irrigation.

To address these limitations, this step of the project will evaluate the climatic sensitivity of total irrigation withdrawals to better characterize water use during dry periods. For each county included in the analysis, PRISM climate data will be used to identify the driest year on record for that county, and “dry year” total withdrawals will be estimated and mapped across study counties. These results will be compared to estimated total withdrawals in an “average” year to provide a sense of the climatic sensitivity of irrigation water use. However, irrigation withdrawals are not only sensitive to total growing-season rainfall; for instance, high temperatures and extended dry periods will also increase crop water needs but may not be reflected in total growing-season rainfall (Paoletti and Shortridge, 2020). To account for this, total growing-season withdrawals will be regressed against multiple weather characteristics obtained from the PRISM climate data, including total rainfall, average temperature, and dry-period length. These regressions will be used to estimate withdrawals under more extreme weather scenarios than were experienced between 2002 and 2017. Collectively, these results will be used to create a set of total withdrawal scenarios that account for unreported withdrawal under different drought conditions (e.g., “average year,” “moderate drought,” and “drought of record”). These scenarios can then be made available for drought simulation modeling to better characterize how unreported agricultural withdrawals may impact low-flow metrics important for water supply planning. This work will be completed by a Virginia Tech graduate student, and reviewed by Dr. Shortridge, Dr. Scott, Mr. Green, and Mr. Burgholzer.
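Two of the weather characteristics described above can be sketched simply: picking the driest year from per-year summer rainfall totals, and measuring dry-period length as the longest run of consecutive days below a precipitation threshold. Loading PRISM data is not shown; the 1 mm threshold and the rainfall values are hypothetical.

```python
def driest_year(summer_totals: dict[int, float]) -> int:
    """Year with the lowest June-August rainfall total."""
    return min(summer_totals, key=summer_totals.get)


def longest_dry_spell(precip_mm: list[float], threshold_mm: float = 1.0) -> int:
    """Longest run of consecutive days with precipitation below threshold_mm."""
    longest = current = 0
    for p in precip_mm:
        current = current + 1 if p < threshold_mm else 0
        longest = max(longest, current)
    return longest


# Hypothetical summer totals (mm) for one county.
dry = driest_year({2007: 640.0, 2010: 410.0, 2017: 660.0})
spell = longest_dry_spell([0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 2.0, 0.0])
```

The dry-spell length is one candidate predictor alongside total rainfall and average temperature; the threshold choice would need to be justified for the crops involved.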

HARPgroup / WUDR