qjhart opened 6 years ago
Ricardo suggested that we not include stations whose station-raster comparison overlap is shorter than 5 years. That would eliminate about 38 of the 159 stations we have. That seems like a good idea, as these stations with little overlap can have large errors.
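A minimal sketch of the proposed 5-year cutoff, assuming each station's overlap is summarized by the dates of its first and last overlapping 15-day windows. The `StationOverlap` record and both function names are hypothetical, not existing project code. The sketch measures the span between those dates, so gaps inside the overlap are not subtracted; counting the overlapping windows would be an alternative.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class StationOverlap:
    # Hypothetical record: one row per station, holding the first and last
    # overlapping 15-day windows (as in the Station-Raster Dates sheet).
    station_id: int
    first_window: date
    last_window: date


def overlap_years(rec: StationOverlap) -> float:
    """Length of the station-raster overlap in years (365.25-day years)."""
    return (rec.last_window - rec.first_window).days / 365.25


def keep_long_overlaps(recs, min_years=5.0):
    """Drop stations whose comparison overlap is shorter than min_years."""
    return [r for r in recs if overlap_years(r) >= min_years]
```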
There are three sets of ETo estimates at each station that we compare for various reasons: daily station estimates, long-term estimates, and raster-based estimates. We use the following prefixes for these:
`s_eto`
= CIMIS reported station ETo. These are the station data as reported contemporaneously with the raster data calculations, so we have an estimate for every day for which we have a Spatial CIMIS estimate. These are stored in the `station` table, which has about 500K entries, one for each day and each station.

`r_eto`
= Spatial CIMIS calculated ETo. These are the ETo estimates calculated each day from the combination of the station data and the GOES-estimated solar radiation (Rs). We have these for every day Spatial CIMIS was calculated as well. These are stored in the `raster` table, which has about 1M entries, more than the station table, since we get the estimate regardless of whether the station reported a result for that particular day or not.

`lt_eto`
= Long-term average ETo. These are long-term averages as supplied by the CIMIS program. We have a daily long-term estimate, but these are independent of year, since they are averages. These are kept in the `cimis_15day` table, which has 134 stations and 366 days per station.

In addition, we often use 15-day-window running averages of these data, especially when summarizing our yearly data into 52 weekly average values. These values are used for calculating errors and differences. They are also the inputs to the FFT transformations that further summarize our yearly ETo estimates into 5 FFT parameters: three powers and two phase components. We add a `15` to the prefix designations for these: `s15_eto`, `r15_eto`, `lt15_eto`.

Station - Raster Comparisons
Long Term Average Comparisons
DWR has supplied long-term averages for about 122 stations. We want to compare these data with the Spatial CIMIS raster long-term averages. The raster long-term average data are in the `fft.raster_15avg_ed` table. There is one entry for every pixel, so we just need to extract the station pixels. We created a table, `compare.station_xy`, that holds each station's associated `pid` by combining `station_info` with the CIMIS boundaries, so we can just use that.

Station Location differences
Note, however, that the `lt_*` data place some stations considerably far from the `station_info` locations reported by the et.water.ca.gov website. We are assuming the station info is correct, but the stations in question are more than 500 m from the locations reported by et.water.
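The 500 m check can be sketched with a great-circle (haversine) distance. This is an illustration only: the dictionaries of `(lat, lon)` pairs and the function names are assumptions, and if the station locations are already in a projected coordinate system a plain Euclidean distance would do.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def flag_moved_stations(lt_locs, info_locs, threshold_m=500.0):
    """Return {station_id: distance_m} for stations whose lt_* location is
    more than threshold_m from the station_info location.

    Both inputs are hypothetical dicts: station_id -> (lat, lon).
    """
    flagged = {}
    for sid, (lat, lon) in lt_locs.items():
        if sid not in info_locs:
            continue  # no reference location to compare against
        d = haversine_m(lat, lon, *info_locs[sid])
        if d > threshold_m:
            flagged[sid] = d
    return flagged
```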
We can then calculate the ratio `lt_p0/r_p0` to compare the DWR long-term averages with the raster long-term averages.
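The path from daily ETo to `p0` can be sketched as follows. The sketch assumes a centered 15-day running mean that wraps around the year end, 52 weekly samples taken every 7 days, and an interpretation of the five FFT parameters as the mean (`p0`), the annual and semi-annual amplitudes (`p1`, `p2`), and their two phases. Those interpretations and all function names are assumptions, not the project's actual code.

```python
import numpy as np


def running_mean_15(daily):
    """Centered 15-day running mean, wrapping around the year end."""
    daily = np.asarray(daily, dtype=float)
    kernel = np.ones(15) / 15.0
    # Pad 7 days on each side from the opposite end of the year so that
    # every output day i is the mean of days i-7 .. i+7.
    padded = np.concatenate([daily[-7:], daily, daily[:7]])
    return np.convolve(padded, kernel, mode="valid")


def weekly_52(daily):
    """Summarize a year (365 or 366 days) of ETo into 52 weekly values."""
    return running_mean_15(daily)[::7][:52]


def fft_params(weekly):
    """Reduce 52 weekly values to (p0, p1, p2, phase1, phase2).

    Assumed meaning: p0 is the zero-frequency term (the mean), p1 and p2
    are the amplitudes of the first two harmonics, and the phases are the
    angles of those two harmonics.
    """
    x = np.asarray(weekly, dtype=float)
    n = x.size
    X = np.fft.rfft(x)
    p0 = X[0].real / n
    p1 = 2.0 * abs(X[1]) / n
    p2 = 2.0 * abs(X[2]) / n
    return p0, p1, p2, float(np.angle(X[1])), float(np.angle(X[2]))
```

With `lt_p0 = fft_params(weekly_52(lt_daily))[0]`, and `r_p0` computed the same way for the station's pixel, the comparison ratio is `lt_p0 / r_p0`. Under these assumptions `p0` is approximately the annual mean daily ETo.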
Contemporaneous Station Data Comparisons
When we are looking for biases in the station vs. raster estimates, we look at these data. One important table we have is the `compare.ymd15` table. It compares the `s_eto` and `r_eto` estimates for every 15-day time window in Spatial CIMIS history: for each window, we calculate the average station and raster ETo. You can think of this as a 15x reduction in the data to compare, since we only look at those average values. There are about 69K entries in this table, covering the overlap for each station and each 15-day window, so each entry is an average of 15 days, or sometimes fewer. The number of overlapping windows varies from station to station. The Station-Raster Dates and Count Google Sheet shows the starting and stopping dates for the comparisons, and how many window entries overlap.

Now, we can take just the overlapping time windows from this `ymd15` table and calculate our FFT transform parameters from them. Note that for each raster location we are calculating special FFT parameters, specific to the time windows that overlap with the station. That way, when we calculate a ratio, both estimates come from the same time period.

Combined Ratio Comparisons
The Long Term / Station / Raster Ratios tab in the Google Sheet shows a summary of the long-term and station ratios. Note that there are two estimates from the raster data: `r_p0` is from the long-term data, and `s_r_p0` is from the raster values that overlap the station data. The two ratios are then `s_p0_ratio = s_p0/s_r_p0` and `lt_p0_ratio = lt_p0/r_p0`. The ratios are fairly similar, but there are some differences. In that sheet, the column `station_overlap_yrs` shows the length of the comparison overlap. It's been suggested that the station ratios only consider stations with an overlap of 5 years or more.

If you are interested in seeing the largest differences, you could compare these in two ways. You could look at the biggest differences in the p0 ratios, by looking at
`| (s_p0 / s_r_p0) - 1 |`

where taking the absolute value orders the stations by how far the ratio is from 1. If we are looking for a station-to-raster conversion, this ratio can be used. Or you could just look at the absolute value of the difference of `s_p0` and `s_r_p0`, `| s_p0 - s_r_p0 |`. Here the values are equivalent to the average daily difference in ETo.

Since we plan to create a single multiplier for p0, we will look at the ratio. The Rapid Change in s_p0/r_p0 tab in the Google Sheet shows the stations that have the most rapid change in the `s_p0/r_p0` ratio in the images. Higher numbers mean more rapid changes from one station to another.

Ratio Splines
These ratios are then used as input to a 3-D spline parameterization using GRASS's `v.vol.rst`. An example invocation looks like
A result of running a set of these splines is shown in the Splines Cloud directory.
The three parameters that are modified are
matching
layer, and the values are kept low.

Some of the parameters used result in an overshoot; that is, the spline cannot be made to fit the data without extrapolating beyond the bounds of the input data. This is an indication that the spline is probably not very reliable.
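One way to quantify the overshoot described above is to compare the range of the interpolated surface with the range of the input ratios. This sketch assumes the spline output has already been read into a NumPy array (how it is exported from GRASS is left open here); the function name and the `tol` parameter are hypothetical.

```python
import numpy as np


def overshoot(inputs, surface, tol=0.0):
    """Report how far an interpolated surface leaves the range of its inputs.

    inputs:  ratio values at the stations fed to the spline
    surface: interpolated values (any shape); NaN cells are ignored
    Returns (below, above): how far the surface goes below min(inputs) and
    above max(inputs), beyond tol. Both are 0.0 when there is no overshoot.
    """
    lo, hi = float(np.min(inputs)), float(np.max(inputs))
    s_min, s_max = float(np.nanmin(surface)), float(np.nanmax(surface))
    below = max(0.0, lo - s_min - tol)
    above = max(0.0, s_max - hi - tol)
    return below, above
```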
You can see the data are pretty similar between the `lt_` and `s_` values. For s=0, you need to increase the tension to 7 before the overshoot is removed, and the result is a ratio that is probably a bit too blotchy. For s=0.02, you do get some overshoot at t=3, but the results are more believable.

Big Drivers for the Spline
Note that there may be some indication of systematic changes west of the Central Valley, but they are not very clear. The LA stations show the biggest bend, but there are large bends up the west coast, and in NE CA (one station) as well.
In LA, the stations driving the spline are station_id=204 with a very high ratio of 1.2, near station_id=133 with a low ratio of 0.9.
In NE CA, it's just station_id=57, with a ratio of 1.15.
In the West it's more convoluted. It involves station_id=109, which has a ratio of 1.005 but is surrounded by stations with higher ratios, and then the pairs of stations station_id=122,212,140,167, which are high, near station_id=166,42,70, which are low.
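To find the kind of neighboring pairs listed above (a high-ratio station close to a low-ratio one), one could scan all station pairs within some distance and rank them by ratio difference per kilometer. This is a hypothetical sketch, not necessarily how the Rapid Change tab was computed; the input layout, the projected meter-based coordinates, and the `max_km` and `min_diff` thresholds are all assumptions.

```python
import itertools
import math


def contrasting_pairs(stations, max_km=50.0, min_diff=0.1):
    """List nearby station pairs whose ratios differ by at least min_diff.

    stations: hypothetical dict station_id -> (x_m, y_m, ratio), with x/y
    in a projected coordinate system measured in meters.
    Returns (id_a, id_b, ratio_diff, km, diff_per_km) tuples, steepest first.
    """
    out = []
    for (a, (xa, ya, ra)), (b, (xb, yb, rb)) in itertools.combinations(stations.items(), 2):
        km = math.hypot(xa - xb, ya - yb) / 1000.0
        diff = abs(ra - rb)
        # Skip co-located points (km == 0) to avoid dividing by zero.
        if 0 < km <= max_km and diff >= min_diff:
            out.append((a, b, diff, km, diff / km))
    out.sort(key=lambda t: t[4], reverse=True)
    return out
```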