CSTARS / eto-zone-maps

California ETo Zone Maps
MIT License

Describe Station averages #10

Open qjhart opened 6 years ago

qjhart commented 6 years ago

There are three sets of ETo estimates at each station that we compare for various reasons: daily station estimates, long-term estimates, and raster-based estimates. We use the following prefixes for these:

s_eto = CIMIS reported station ETo. These are the station data as reported contemporaneously with the raster data calculations, so we have an estimate for every day on which we have Spatial CIMIS estimates. These are stored in the station table. There are about 500K entries in this table, one for each day and each station.

r_eto = Spatial CIMIS calculated ETo. These are the ETo estimates calculated each day from the combination of the station data and the GOES-estimated Rs. We have these for every day Spatial CIMIS was calculated as well. These are stored in the raster table. There are about 1M entries in this table, more than in the station table, since we get the estimate regardless of whether the station reported a result for that particular day or not.

lt_eto = Long term average ETo. These are long term averages as supplied by the CIMIS program. We have a daily long term estimate, but these are independent of year, since they are averages. These are kept in the cimis_15day table. There are 134 stations, and 366 days per station in that table.

In addition, we often use 15-day running averages of these data, especially when summarizing our yearly data into 52 weekly average values. These values are what we use for calculating errors and differences. They are also the inputs to the FFT transformations that further summarize our yearly ETo estimates into 5 FFT parameters: three powers and two phase components. We add a 15 to the prefix designations: s15_eto, r15_eto, lt15_eto.
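As a rough illustration of the FFT summary described above, the sketch below reduces one column of evenly spaced averages (for example the 52 weekly values) to a mean term plus the amplitude and phase of the first two harmonics. The function name `fft_params`, the output order, and the reading of "powers" as harmonic amplitudes are assumptions made for illustration; the project's actual parameterization may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize a yearly series as "p0 p1 p2 h1 h2".
# Assumption: "powers" are harmonic amplitudes, phases are in radians.
# Input: one value per line on stdin, evenly spaced over one year.
fft_params() {
  awk '
    { x[NR - 1] = $1 }
    END {
      n = NR
      if (n == 0) exit 1
      pi = atan2(0, -1)
      for (i = 0; i < n; i++) sum += x[i]
      printf "%.3f", sum / n                 # p0: the yearly mean
      for (k = 1; k <= 2; k++) {             # first two harmonics only
        a = 0; b = 0
        for (i = 0; i < n; i++) {
          a += x[i] * cos(2 * pi * k * i / n)
          b += x[i] * sin(2 * pi * k * i / n)
        }
        a *= 2 / n; b *= 2 / n
        amp[k] = sqrt(a * a + b * b)         # amplitude of harmonic k
        ph[k] = atan2(b, a)                  # phase of harmonic k
      }
      printf " %.3f %.3f %.3f %.3f\n", amp[1], amp[2], ph[1], ph[2]
    }'
}
```

With this reading, p0 is simply the mean of the series, which matches the later remark that differences in p0 are equivalent to average daily differences in ETo.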

Station - Raster Comparisons

Long Term Average Comparisons

DWR has supplied some long term averages for about 122 stations. Our interest is to compare these data with the Spatial CIMIS raster long term averages. The raster long term average data exist in the table fft.raster_15avg_ed. There is one entry for every pixel, so we just need to extract the station pixels. We created a table of each station's associated pid from compare.station_xy, which combines the station_info with the CIMIS boundaries, so we can just use that.

Station Location differences

Note, however, that the lt_* data report some stations considerably far from the station_info positions as reported by the et.water.ca.gov website. We are assuming the station info is correct. These are the stations more than 500 m from the position reported by et.water.ca.gov:

station_id longitude latitude diff (m)
135 -114.666 33.557 15431
196 -122.144 38.685 11337
88 -119.605 34.932 6388
84 -121.311 39.271 2088
152 -118.994 34.232 1407
114 -121.29 36.359 1305
170 -122.02 38.004 1264
194 -120.851 37.719 911
136 -116.154 33.516 868
175 -114.726 33.389 863
74 -116.973 33.09 758
56 -120.761 37.093 752
79 -122.421 38.549 698
62 -117.222 33.49 691
77 -122.41 38.434 614
90 -120.479 41.433 589
200 -116.258 33.746 553
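The issue does not say how the diff column was computed. As a minimal sketch, assuming it is a great-circle distance in meters between the two reported positions, a check like the following could flag stations beyond the 500 m threshold. The function names, the input column layout, and the Earth radius of 6371000 m are all assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: great-circle (haversine) distance in meters.
# Arguments: lon1 lat1 lon2 lat2, all in decimal degrees.
haversine_m() {
  awk -v lon1="$1" -v lat1="$2" -v lon2="$3" -v lat2="$4" 'BEGIN {
    rad = atan2(0, -1) / 180                 # degrees to radians
    r = 6371000                              # assumed mean Earth radius (m)
    dlat = (lat2 - lat1) * rad
    dlon = (lon2 - lon1) * rad
    a = sin(dlat / 2) ^ 2 + cos(lat1 * rad) * cos(lat2 * rad) * sin(dlon / 2) ^ 2
    printf "%.0f\n", 2 * r * atan2(sqrt(a), sqrt(1 - a))
  }'
}

# Hypothetical filter: read "station_id lon_a lat_a lon_b lat_b" lines and
# print "station_id distance" for positions more than 500 m apart.
flag_moved_stations() {
  while read -r id lon_a lat_a lon_b lat_b; do
    d=$(haversine_m "$lon_a" "$lat_a" "$lon_b" "$lat_b")
    [ "$d" -gt 500 ] && echo "$id $d"
  done
  return 0
}
```

One degree of latitude comes out at roughly 111 km with this radius, which is a quick sanity check on the units.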

We can then calculate the ratio of lt_p0/r_p0 to compare the DWR long term averages.

Contemporaneous Station Data Comparisons

When we are looking for biases in the station vs. raster estimates, we look at these data. One important table is compare.ymd15. It compares the ETo estimates for s_eto and r_eto for every 15-day time window in Spatial CIMIS history: for each 15-day window, we calculate the average station and raster ETo. You can think of this as a 15x reduction in the data to compare, since we only look at those average values. There are about 69K entries in this table, covering the overlap for each station and each 15-day window, so each entry is an average of 15 days, or sometimes fewer. The span of overlapping windows varies across these comparisons. The Station-Raster Dates and Count Google Sheet shows the starting and stopping dates for the comparisons, and how many window entries overlap.

Now we can take just the overlapping time windows from this ymd15 table and calculate our FFT parameters from them. Note that for each raster location we are calculating special FFT parameters, specific to the time windows that overlap with the station. That way, when we calculate a ratio, it compares estimates from the same time period.
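The reduction into 15-day windows can be sketched as follows. This is not the query that builds compare.ymd15; it is a small stand-in that assumes an integer day count per row, consecutive non-overlapping 15-day blocks, and `NA` for a missing value. The function name and the column layout are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical reducer: input lines are "station_id day s_eto r_eto", where
# day is an integer day count and a missing value is written as NA.
# Assumption: windows are consecutive, non-overlapping 15-day blocks.
# Output: "station_id window n_days mean_s_eto mean_r_eto".
window15_means() {
  awk '
    $3 != "NA" && $4 != "NA" {               # keep only days with both values
      key = $1 " " int($2 / 15)
      n[key]++
      s[key] += $3
      r[key] += $4
    }
    END {
      for (key in n)
        printf "%s %d %.2f %.2f\n", key, n[key], s[key] / n[key], r[key] / n[key]
    }' | sort -k1,1n -k2,2n
}
```

Because days without a station value are skipped, a window can average fewer than 15 days, and the n_days column records how many were used.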

Combined Ratio Comparisons

The Long Term / Station / Raster Ratios tab in the Google Sheet shows a summary of the long_term and station ratios. Note there are two estimates from the raster data: r_p0 is from the long term data, and s_r_p0 is from the raster values that overlap the station information. The two ratios are then s_p0_ratio = s_p0/s_r_p0 and lt_p0_ratio = lt_p0/r_p0. The ratios are fairly similar, but there are some differences. In that sheet, the column station_overlap_yrs shows the length of the comparison overlap. It's been suggested that, for the station ratios, we only look at stations with an overlap of 5 years or more.

If you are interested in seeing the largest differences, you could compare these in two ways. You could look at the biggest differences in the p0 ratios by ordering on `| (s_p0 / s_r_p0) - 1 |`, where the absolute value puts large departures of the ratio from 1, in either direction, at the top. If we are looking for a station to raster conversion, this ratio can be used. Or you could just look at the absolute value of the difference of s_p0 and s_r_p0, `| s_p0 - s_r_p0 |`. Here the values are equivalent to the average daily difference in ETo.
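Both orderings can be produced from a plain export of the sheet. The sketch below assumes whitespace-separated columns in the order station_id, s_p0, s_r_p0, station_overlap_yrs; that layout, the function name, and the default 5-year cutoff argument are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical ranking: input lines are
# "station_id s_p0 s_r_p0 station_overlap_yrs" (column order is assumed).
# Optional first argument: minimum overlap in years (default 5).
# Output: "station_id ratio |ratio - 1| |s_p0 - s_r_p0|", largest
# departure of the ratio from 1 first.
rank_p0_ratio() {
  awk -v min_yrs="${1:-5}" '
    function abs(v) { return v < 0 ? -v : v }
    $4 >= min_yrs && $3 != 0 {
      ratio = $2 / $3
      printf "%s %.3f %.3f %.3f\n", $1, ratio, abs(ratio - 1), abs($2 - $3)
    }' | LC_ALL=C sort -k3,3nr
}
```

Sorting on the fourth column instead (`sort -k4,4nr`) gives the ordering by absolute difference.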

Since we plan to create a single multiplier for p0, we will look at the ratio. The Rapid Change in s_p0/r_p0 tab in the Google Sheet shows the stations that have the most rapid change in s_p0/r_p0 ratio in the images. Higher numbers mean more rapid changes from one station to another.

Ratio Splines

These ratios are then used as input to a 3-d spline parameterization, using GRASS's v.vol.rst. An example invocation looks like:

 p0=${r}_s${s}_z${z}_t${t}_p0;  
 v.vol.rst --overwrite input=ratio wcolumn=${r}_p0_ratio \
 cross_input=Z@2km maskmap=state@2km \
 tension=${t} zscale=${z} smooth=${s}  cross_output=${p0} \
 where="${r}_p0_ratio is not null and station_overlap_yrs > 4";

A result of running a set of these splines is shown in the Splines Cloud directory.
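A set of runs like this can be generated with a small loop. The sketch below is a dry run that only prints the commands, so it can be inspected without a GRASS session. The smooth values (0, 0.02) and tension values (3, 7) are the ones mentioned in this issue; the zscale values are placeholders, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical dry run: print one v.vol.rst command per parameter
# combination instead of executing it. The zscale list is a placeholder,
# not the set of values used for the published splines.
spline_sweep() {
  local r=$1                                 # ratio prefix, e.g. s or lt
  local s z t p0
  for s in 0 0.02; do                        # smooth
    for z in 1 10; do                        # zscale (placeholder values)
      for t in 3 7; do                       # tension
        p0=${r}_s${s}_z${z}_t${t}_p0         # output name, as in the example
        printf '%s\n' "v.vol.rst --overwrite input=ratio wcolumn=${r}_p0_ratio cross_input=Z@2km maskmap=state@2km tension=${t} zscale=${z} smooth=${s} cross_output=${p0} where=\"${r}_p0_ratio is not null and station_overlap_yrs > 4\""
      done
    done
  done
}
```

Calling `spline_sweep s` prints eight commands; piping the output to `bash` inside a GRASS session would execute them.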

The three parameters that are modified are tension (t), zscale (z), and smooth (s).

Some of the parameter choices result in an overshoot; that is, the spline cannot be made to fit the data without extrapolating beyond the bounds of the input data. This is an indication that the spline is probably not very reliable.

You can see the data are pretty similar between the lt_ and s_ values. For s=0 you need to increase tension to 7 before you remove the overshoot, and the result is a ratio that is probably a bit too blotchy. For s=0.02, you do get some overshoot at t=3, but the results are more believable.

Big Drivers for the Spline

Note there may be some indication of systematic changes west of the Central Valley, but they are not very clear. The LA stations show the biggest bend, but there are large bends up the west coast, and in NE California (one station) as well.

qjhart commented 6 years ago

Ricardo suggested that we not include stations with comparison overlaps of less than 5 years. That would eliminate about 38 of the 159 stations we have. That seems like a pretty good idea, as these stations with little overlap can have large errors.