Open jordansread opened 4 years ago
For future reference in easy to copy/paste code form:
pb0_matched_to_observations %>% group_by(site_id) %>% summarize(rmse = sqrt(mean((pred-obs)^2, na.rm=TRUE)), n = length(depth)) %>% arrange(desc(rmse))
# A tibble: 2,377 x 3
site_id rmse n
<chr> <dbl> <int>
1 nhdhr_109986912 17.2 29
2 nhdhr_109989488 15.3 14
3 nhdhr_121207127 14.7 20
4 nhdhr_121650602 13.2 69
5 nhdhr_145608202 12.1 27
6 nhdhr_109984628 11.8 21
7 nhdhr_121650552 11.6 26
8 nhdhr_121650633 11.4 83
9 nhdhr_121207134 11.3 61
10 nhdhr_109987472 11.3 20
11 nhdhr_121627799 11.3 51
12 nhdhr_121628955 11.3 34
13 nhdhr_121650613 11.2 84
14 nhdhr_109990726 10.9 68
15 nhdhr_69545019 10.8 86
16 nhdhr_85083102 10.8 53
17 nhdhr_109989482 10.2 32
18 nhdhr_121650592 10.2 59
19 nhdhr_109986464 9.60 48
20 nhdhr_121625003 8.98 105
# … with 2,367 more rows
The first 14 of those lakes all have monitoring locations prefixed with IL_EPA-, and they are mostly in the NE corner of IL.
Additionally, many (all?) seem to have ResultAnalyticalMethod/MethodIdentifier as "LAB", which makes me wonder if these are temperatures taken in the lab for some other extraction method rather than actual field measurements...
Out of all of these sites:
table(d$`ResultAnalyticalMethod/MethodIdentifier`)
FIELD   LAB
 2714  1609
The median of FIELD is 21.205° and the median of LAB is 3°...
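The FIELD vs LAB medians above can be reproduced with a grouped median. A minimal sketch on made-up rows, assuming `d` holds the raw WQP results and that the temperature value lives in a column called `ResultMeasureValue` (that column name is an assumption here, it isn't shown in this thread):

```r
# Toy stand-in for `d`; the values are invented for illustration only.
# `ResultMeasureValue` is an assumed name for the temperature column.
d <- data.frame(
  `ResultAnalyticalMethod/MethodIdentifier` =
    c("FIELD", "FIELD", "FIELD", "LAB", "LAB", "LAB"),
  ResultMeasureValue = c(20.1, 21.5, 22.3, 2.5, 3.0, 4.0),
  check.names = FALSE
)

# Median temperature per method identifier, ignoring missing values
method_medians <- tapply(
  d$ResultMeasureValue,
  d$`ResultAnalyticalMethod/MethodIdentifier`,
  median,
  na.rm = TRUE
)
print(method_medians)
```

A large gap between the two groups, like the one reported above, is what would suggest the LAB rows aren't in-lake temperatures.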
I have tacked the monitoring ID onto the source field for wqp data in the daily obs temperature build, so instead of getting source = 'wqp' we get a lot of different wqp sources, such as wqp_LCOWIS_WQX-E16.
Now I can group by source instead of site_id and take an RMSE to see if there are particular sources that are really bad vs pb0 (this is from the pgmtl-data-release pipeline, btw):
mutate(pb0_matched_to_observations, pred_diff = pred-obs) %>%
group_by(source) %>% summarize(rmse = sqrt(mean((pred_diff)^2, na.rm=TRUE)), n = length(source)) %>% arrange(desc(rmse)) %>% print(n=100)
# A tibble: 2,924 x 3
source rmse n
<chr> <dbl> <int>
1 wqp_LCOWIS_WQX-E16 19.7 7
2 wqp_LCOWIS_WQX-E-16 15.5 154
3 wqp_IL_EPA-RML-1 15.4 7
4 wqp_USGS-475150098210000 15.1 2
5 wqp_LCOWIS_WQX-E-9 14.7 36
6 wqp_SDDENR_WQX-WHITELAWL03 13.9 26
7 wqp_WIDNR_WQX-10031157 11.6 74
8 wqp_MNPCA-21-0057-00-206 11.4 14
9 wqp_SDWRAP-SWLAZZZ3703A 10.9 6
10 wqp_MNPCA-21-0103-00-202 10.7 24
11 wqp_SDDENR_WQX-WALLZZZWL08 10.7 8
12 wqp_LCOWIS_WQX-E17 10.6 12
13 wqp_MNPCA-21-0106-01-204 10.5 24
14 wqp_MNPCA-21-0106-02-201 10.4 8
15 wqp_IL_EPA_WQX-WGZJ-2 10.3 2
16 wqp_WIDNR_WQX-10029926 9.92 174
17 wqp_MNPCA-21-0085-00-207 9.76 24
18 7a_temp_coop_munge/tmp/South_Center_DO_2018_09_11_All.rds 9.61 853
19 7a_temp_coop_munge/tmp/Carlos_DO_2018_11_05_All.rds 9.57 996
20 wqp_MNPCA-21-0054-00-205 9.53 23
21 7a_temp_coop_munge/tmp/Greenwood_DO_2018_09_14_All.rds 9.51 1043
22 wqp_MNPCA-77-0150-02-205 9.34 52
23 wqp_MNPCA-69-0939-02-203 9.23 18
24 wqp_MNPCA-82-0001-00-206 8.98 2
25 wqp_WIDNR_WQX-10033610 8.92 4
26 wqp_IL_EPA_WQX-RGE-2 8.91 5
27 wqp_NARS_WQX-NLA06608-0859 8.83 20
28 wqp_LCOWIS_WQX-E-17 8.78 82
29 wqp_IL_EPA_WQX-RGE-1 8.65 92
30 wqp_IL_EPA_WQX-RTI-3 8.48 3
31 wqp_WIDNR_WQX-443514 8.36 12
32 wqp_MNPCA-27-0139-00-201 8.33 253
33 wqp_MNPCA-21-0052-00-205 8.32 24
34 wqp_USGS-454616092082100 8.25 6
35 wqp_IL_EPA_WQX-RGL-1 8.17 140
36 wqp_MNPCA-70-0091-00-452 8.05 1
37 wqp_MNPCA-11-0246-00-201 8.03 1
38 wqp_IL_EPA_WQX-RPC-2 8.02 7
39 wqp_MNPCA-19-0071-00-202 7.98 5
40 wqp_MNPCA-69-0790-00-201 7.87 43
41 wqp_MNPCA-27-0133-10-101 7.85 120
42 wqp_NALMS-6703 7.83 4
43 wqp_WIDNR_WQX-403112 7.78 75
44 wqp_MNPCA-69-0694-00-117 7.73 1
45 wqp_IL_EPA_WQX-RHD-2 7.71 3
46 wqp_MNPCA-18-0372-00-101 7.69 95
47 wqp_NALMS-3283 7.63 1
48 wqp_USGS-480352099093800 7.61 11
49 wqp_USGS-425235088075302 7.60 1
50 wqp_WIDNR_WQX-403107 7.59 485
51 wqp_MNPCA-29-0142-00-201 7.58 10
52 wqp_IL_EPA_WQX-RTW-1 7.58 134
53 wqp_MNPCA-21-0080-00-204 7.56 24
54 wqp_USGS-482018092292001 7.48 36
55 wqp_MNPCA-73-0139-00-204 7.45 57
56 wqp_WIDNR_WQX-193050 7.40 17
57 wqp_IL_EPA-WGX-1 7.38 7
58 wqp_IL_EPA_WQX-WGZJ-1 7.37 63
59 wqp_MNPCA-27-0062-03-202 7.36 1
60 wqp_MNPCA-18-0044-00-201 7.31 1
61 wqp_MNPCA-69-0859-02-201 7.28 5
62 wqp_USGS-423755088341700 7.26 40
63 wqp_USGS-435721084561801 7.25 5
64 wqp_MNPCA-15-0068-00-207 7.24 38
65 wqp_IL_EPA_WQX-RPA-1 7.17 99
66 wqp_LCOWIS_WQX-W-4 7.17 216
67 wqp_MNPCA-82-0033-00-201 7.15 50
68 wqp_MNPCA-62-0005-00-201 7.14 2
69 wqp_WIDNR_WQX-403110 7.13 1712
70 wqp_MNPCA-82-0031-00-201 7.12 5
71 wqp_MNPCA-21-0123-00-218 7.10 24
72 wqp_MNPCA-27-0014-00-201 7.09 2286
73 wqp_MNPCA-77-0215-00-209 7.07 101
74 7a_temp_coop_munge/tmp/Tenmile_1997_Temperatures.rds 7.06 28
75 wqp_SDDENR_WQX-KINGSBUC03 7.02 16
76 wqp_MNPCA-71-0159-00-203 7.00 5
77 wqp_USGS-454856094544602 6.99 37
78 wqp_MNPCA-77-0215-00-202 6.98 80
79 wqp_IL_EPA_WQX-RGE-3 6.98 8
80 wqp_21NDHDWQ-385455 6.98 5
81 wqp_MNPCA-82-0110-00-451 6.92 22
82 wqp_MNPCA-16-0253-00-202 6.86 1
83 wqp_USGS-444016085310201 6.82 6
84 wqp_MNPCA-19-0024-00-451 6.80 11
85 wqp_MNPCA-27-0129-00-201 6.77 1
86 wqp_IL_EPA_WQX-RGB-2 6.77 9
87 wqp_MNPCA-18-0358-00-201 6.75 4
88 wqp_MNPCA-69-0939-01-204 6.73 89
89 wqp_LCOWIS_WQX-RND-3 6.71 168
90 wqp_WIDNR_WQX-513088 6.67 307
91 wqp_WIDNR_WQX-013144 6.66 103
92 wqp_MNPCA-61-0023-00-204 6.66 10
93 wqp_WIDNR_WQX-10007592 6.63 13
94 wqp_USGS-425235088075300 6.62 28
95 wqp_MNPCA-27-0133-02-205 6.56 2
96 wqp_IL_EPA_WQX-RTW-2 6.55 2
97 wqp_USGS-435009088550100 6.54 9
98 wqp_LCOWIS_WQX-W7 6.51 14
99 7a_temp_coop_munge/tmp/grant_mnlakedata_historicalfiles_manualentry.rds 6.50 64
100 wqp_IL_EPA_WQX-VTJ-1 6.46 129
# … with 2,824 more rows
and taking the first one off the top since it has a small number of obs:
pb0_matched_to_observations %>% filter(source == 'wqp_LCOWIS_WQX-E16')
# A tibble: 7 x 6
site_id date depth obs pred source
<chr> <date> <dbl> <dbl> <dbl> <chr>
1 nhdhr_74926427 2013-07-15 7.62 5.78 24.1 wqp_LCOWIS_WQX-E16
2 nhdhr_74926427 2013-07-15 10.7 4.39 24.0 wqp_LCOWIS_WQX-E16
3 nhdhr_74926427 2013-07-15 13.7 3.83 23.9 wqp_LCOWIS_WQX-E16
4 nhdhr_74926427 2013-07-15 16.8 3.83 23.8 wqp_LCOWIS_WQX-E16
5 nhdhr_74926427 2013-07-15 19.8 3.83 23.8 wqp_LCOWIS_WQX-E16
6 nhdhr_74926427 2013-07-15 22.9 3.83 23.7 wqp_LCOWIS_WQX-E16
7 nhdhr_74926427 2013-07-15 24.4 3.83 23.7 wqp_LCOWIS_WQX-E16
This is Lake Chippewa in Sawyer County, WI:
read_csv('out_data/lake_metadata.csv') %>% filter(site_id == 'nhdhr_74926427')
# A tibble: 1 x 9
site_id lake_name group_id meteo_filename centroid_lon centroid_lat SDF state county
<chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <chr> <chr>
1 nhdhr_74926427 Lake Chippewa 06_N45.50-46.50_W84.50-92.00 nldas_meteo_N45.9375-45.9375_W91.1875-91.1875.csv -91.2 45.9 16.2 WI Sawyer
and it is a complex lake:
The second worst source is wqp_LCOWIS_WQX-E-16, which is probably the same monitoring ID. It is definitely in the same lake.
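One way to surface other pairs like E16 / E-16 is to compare sources after stripping hyphens. A rough sketch using source names from the table above; the `source_key` column is a hypothetical helper, and dropping every hyphen could in principle merge IDs that really are distinct, so any match would still need a check against site_id:

```r
library(dplyr)

sources <- tibble(
  source = c("wqp_LCOWIS_WQX-E16", "wqp_LCOWIS_WQX-E-16",
             "wqp_LCOWIS_WQX-E17", "wqp_LCOWIS_WQX-E-17",
             "wqp_LCOWIS_WQX-E-9")
)

near_duplicates <- sources %>%
  # hypothetical comparison key: the source name with all hyphens removed
  mutate(source_key = gsub("-", "", source, fixed = TRUE)) %>%
  group_by(source_key) %>%
  # keep only keys shared by more than one source name
  filter(n() > 1) %>%
  ungroup() %>%
  arrange(source_key)

print(near_duplicates)
```

Here the E16 / E-16 and E17 / E-17 pairs are kept and E-9 drops out because nothing else shares its key.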
Modeled (red) and observed (black) are very different.
The pb0 model thinks this is a well-mixed lake (at least up to 25 m deep), while the obs show a strongly stratified system that looks more like a small lake to me. Perhaps this is a bay.
Other sources seem clearly wrong, like 7a_temp_coop_munge/tmp/Greenwood_DO_2018_09_14_All.rds, which looks like the depths are flipped 👀
@limnoliver heads up on that one ☝️ but note we haven't done any kind of comprehensive look.
Looks like at least 7a_temp_coop_munge/tmp/South_Center_DO_2018_09_11_All.rds, 7a_temp_coop_munge/tmp/Carlos_DO_2018_11_05_All.rds, and 7a_temp_coop_munge/tmp/Greenwood_DO_2018_09_14_All.rds have depths flipped.
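A quick screen for flipped depths is the sign of the depth vs temperature trend per source. This is only a heuristic sketch on invented profiles: it assumes summer profiles, where temperature should fall with depth, and it would wrongly flag legitimate inverse stratification under ice. The `suspect_flipped` column is a hypothetical name:

```r
library(dplyr)

# Invented profiles: one normal summer profile, one with depths reversed
toy <- tibble(
  source = c(rep("profile_ok", 5), rep("profile_flipped", 5)),
  depth  = rep(c(0, 5, 10, 15, 20), 2),
  obs    = c(24, 23, 15, 9, 8,
             8, 9, 15, 23, 24)
)

depth_trend <- toy %>%
  group_by(source) %>%
  # rank correlation between depth and observed temperature
  summarize(rho = cor(depth, obs, method = "spearman")) %>%
  # temperature rising with depth is suspicious in summer
  mutate(suspect_flipped = rho > 0)

print(depth_trend)
```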
Yikes! The explainer file for South_Center says:
Note that the depth of the sample is in negative.
I interpreted that as simply needing to multiply by -1. And, it turns out, I processed South Center, Carlos, and Greenwood with the same parser and did the same thing, since all had negative depth values. So, more likely, this is distance from the bottom, where 0 is the bottom and about -28 m is the surface? In that case, I'm guessing we will lose these data because we can't be certain of the depth? Or do we assume the first measurement is taken at 0 m?
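The two readings of the negative depths give opposite profiles. A small sketch with invented depth values, assuming (as floated above, and not confirmed by the explainer file) that 0 is the bottom and the most negative value was taken at the surface:

```r
# Invented raw depths: 0 = bottom, -28 = surface under the assumption above
raw_depth <- c(0, -7, -14, -21, -28)

# What the parser did: flip the sign, which puts the bottom reading at 0 m
depth_sign_flip <- -1 * raw_depth

# Alternative: shift so the most negative (assumed surface) reading is 0 m
depth_from_surface <- raw_depth - min(raw_depth)

print(data.frame(raw_depth, depth_sign_flip, depth_from_surface))
```

The shifted version is exactly the sign-flipped version reversed, which matches the "depths are flipped" look in the plots. It is only right if the shallowest reading really was at 0 m.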
Perhaps looping in w/ Holly related to these files and #173 would be good. Doesn't help us for this immediate issue, but probably good to get on the radar.
Now that we have the 6_evaluation stage set up, I was looking at some of the worst performers.
Some of these have observations that don't make sense, such as 1° temperatures in October.
Same kind of integer pattern in obs for another one, with values that don't make sense.
I thought maybe these would be a coop source where a column was flipped or something, but the top worst sites all have wqp as the only source.
This pattern seems to continue to at least the 20th worst site.
I wonder if these are all from the same provider?
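Both symptoms described here (whole-number obs and implausibly cold values in the open-water season) can be screened for per site. A rough sketch on invented rows; the May to October window and the 4° cutoff are arbitrary assumptions, not thresholds from the pipeline, and the flag column names are hypothetical:

```r
library(dplyr)

# Invented observations: site_a mimics the suspicious pattern, site_b looks normal
toy <- tibble(
  site_id = c(rep("site_a", 4), rep("site_b", 4)),
  date = as.Date(c("2013-10-01", "2013-10-01", "2013-10-15", "2013-10-15",
                   "2013-07-01", "2013-07-01", "2013-07-15", "2013-07-15")),
  obs = c(1, 2, 1, 3, 22.4, 18.7, 23.1, 19.2)
)

site_flags <- toy %>%
  mutate(month = as.integer(format(date, "%m"))) %>%
  group_by(site_id) %>%
  summarize(
    # share of observations that are whole numbers
    frac_integer = mean(obs == round(obs), na.rm = TRUE),
    # share of May-October observations below the assumed 4 degree cutoff
    frac_cold_open_water = mean(month %in% 5:10 & obs < 4, na.rm = TRUE)
  )

print(site_flags)
```

Sites scoring high on both would be the ones to check for a shared provider or a LAB method identifier.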