Closed padilla410 closed 2 years ago
Re: how many additional lakes were added. I think we need to compare to my additions in #329, which would actually mean 27 additional PGDL lakes and no additional GLM ones? Not to downplay the impact of these xwalk fixes ...
(don't think this one is necessary to resolve) I first ran just scmake(), but bumped into an error at 7b_temp_merge/out/source_metadata_for_release.csv.ind
Yes, this target is a depends of all in the remake.yml but is not included in the dependencies necessary to build 8_viz, so it often gets skipped because most of us have been in the habit of running scmake('8_viz'). This is a file we do use as part of data releases to attribute individual observations to their contributing org. But it isn't necessary to fix this right now and, as I've mentioned in the past, I'm not sure I understand the use cases for require_local.
@jread-usgs I created #334 to remind us of this issue.
Lindsay and I worked through the differences between our two branches. She is missing some local files from 7a_temp_coop_munge/tmp/ that I am not missing. That explains the difference.
We verified that this PR represents the latest/greatest version of 7a_temp_coop_munge/out/all_coop_dat_linked.feather
Overview
This pull request addresses the issues ID'd in #267 - I'm adding missing crosswalks that caused IN and SD to fall out of the pipeline. There is a lot going on in this PR because I was checking as I went along. Here is the general layout:
scmake("8_viz")
~This work resulted in 101 additional PGDL lakes and 339 GLM lakes when compared to PR #327~ This work resulted in 27 additional PGDL lakes and 0 GLM lakes when compared to PR #328
Tagging Lindsay for the review, but @jread-usgs feel free to weigh in.
closes #267
Grand Summary
I was able to successfully add crosswalks for the datasets identified in #267.
When checking my work on parsers, I rely heavily on the dat_missing table - an internal table created in the crosswalk_coop_dat() function. Before my updates, the all_missing field was TRUE for all three of these datasets. After my updates, all_missing == FALSE.
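As a rough illustration of the check described above, the sketch below uses a toy stand-in for dat_missing and lists any datasets still flagged all_missing. Only the all_missing field comes from this PR; the source column, the dataset names, and the helper datasets_all_missing() are assumptions made for the example, and the real table inside crosswalk_coop_dat() may be structured differently.

```r
# Hypothetical helper: return the datasets whose all_missing flag is TRUE.
datasets_all_missing <- function(dat) {
  dat$source[dat$all_missing]
}

# Toy "before" and "after" tables (dataset names are placeholders).
before <- data.frame(
  source = c("dataset_a", "dataset_b", "dataset_c"),
  all_missing = c(TRUE, TRUE, TRUE),
  stringsAsFactors = FALSE
)
after <- transform(before, all_missing = FALSE)

# Before the crosswalk fixes every dataset is flagged; afterwards none are.
flagged_before <- datasets_all_missing(before)
flagged_after <- datasets_all_missing(after)
print(flagged_before)
print(flagged_after)
```

An empty result for the "after" table corresponds to all_missing == FALSE for every dataset.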
To successfully run the code snippet below you need to place browser() before the warning() near the end of crosswalk_coop_dat() in 7a_temp_coop_munge/src/parsing_task_fxns.R
When looking through the diff I suspect that you are going to see a number of "unexpected" rebuilds - I think most of these can be attributed to the recent factory reset on my computer. For example, I had to rebuild '7_config_merge/out/nml_list.rds.ind' in order to complete scmake('8_viz') because I did not have a local copy.
SD Crosswalk
This section includes the following verification steps for the SD data:
scmake("1_crosswalk_fetch") created the correct sf object
scmake("2_crosswalk_munge") worked as expected
Verification that scmake("1_crosswalk_fetch") created the correct sf object
This section confirms that the spatial points for the SD data set map in South Dakota.
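The check in this PR was done visually on a map. As a complementary sanity check, one could also test coordinates against a rough bounding box. The sketch below is only an assumption-laden example in base R: the bounding box is an approximate, slightly padded box around South Dakota in decimal degrees, and the sites table, its column names, and in_bbox() are hypothetical rather than taken from the pipeline.

```r
# Approximate, padded bounding box for South Dakota (decimal degrees; assumption).
sd_bbox <- list(lon = c(-104.1, -96.4), lat = c(42.4, 46.0))

# Hypothetical helper: TRUE where a point falls inside the box.
in_bbox <- function(lon, lat, bbox) {
  lon >= bbox$lon[1] & lon <= bbox$lon[2] &
    lat >= bbox$lat[1] & lat <= bbox$lat[2]
}

# Made-up site coordinates; the last row has lon and lat swapped on purpose.
sites <- data.frame(
  site_id = c("SD_1", "SD_2", "SD_3"),
  lon = c(-100.35, -97.11, 44.37),
  lat = c(44.37, 43.55, -100.35),
  stringsAsFactors = FALSE
)
sites$in_sd <- in_bbox(sites$lon, sites$lat, sd_bbox)

# Rows that fall outside the box deserve a closer look.
print(sites[!sites$in_sd, ])
```

A bounding box cannot replace the map, since it also covers parts of neighboring states, but it catches gross errors such as swapped coordinates.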
The result:
Verification that scmake("2_crosswalk_munge") worked as expected
This section confirms that the SD crosswalk finds matches in NHDHR when munged together.
IN Crosswalks
One crosswalk is used to create two NHDHR crosswalks. This is a necessary complication because one IN data set did have spatial data (Indiana_CLP_lakedata_1994_2013) while the other one didn't (Indiana_Glacial_Lakes_WQ_IN_DNR). These two datasets also use different lake reference systems (details here)
Verification that scmake("1_crosswalk_fetch") created the correct sf object
For the Indiana datasets, there is a little more going on under the hood so there are a few more checks.
sf object verification
This section confirms that the spatial points for the two IN data sets map in Indiana.
IN CLP & IN DNR map results (they're the same):
site_id verification
This verification confirms that the same function in 1_crosswalk_fetch/src/fetch_crosswalk.R (fetch_IN_points) creates two different site_id values: one for each dataset.
Verification that scmake("2_crosswalk_munge") worked as expected
This section confirms that both IN crosswalks find matches in NHDHR when munged together.
Bonus fixes and weird stuff
Sand_Bay_* data sets from 2014-2016 weren't getting into all_coop_dat_linked.feather. I was able to debug this issue.
all_coop_dat_linked.feather has around 33,000 data points (filter source == "7a_temp_coop_munge/tmp/Solomon_LIMNO_PROFILES_10.25390_caryinstitute.7438598.v5.rds"). I think this is due to a duplicate matching process going on within crosswalk_coop_dat.
In crosswalk_coop_dat, the UNDERC dataset uses underc2nhd as the crosswalk to NHDHR. Upstream of that match, there is a matching process that uses a lookup table called id2nhd. I suspect there are multiple lakes that exist in both the UNDERC dataset and the data set that crosswalks with id2nhd.
7b_temp_merge
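To make the suspected duplicate matching concrete, here is a minimal base R sketch of one way it could happen: if the same lake is present in both crosswalks and the two sets of matches are stacked, its observations appear twice. The toy data, column names, and the rbind/merge approach are assumptions for illustration and are not taken from crosswalk_coop_dat.

```r
# Toy observations (all names and values are hypothetical).
obs <- data.frame(
  site_id = c("lake_A", "lake_A", "lake_B"),
  depth = c(0, 1, 0),
  temp = c(20.1, 19.8, 18.5),
  stringsAsFactors = FALSE
)

# Two crosswalks that both contain lake_A.
id2nhd <- data.frame(site_id = "lake_A", nhdhr_id = "nhdhr_1",
                     stringsAsFactors = FALSE)
underc2nhd <- data.frame(site_id = c("lake_A", "lake_B"),
                         nhdhr_id = c("nhdhr_1", "nhdhr_2"),
                         stringsAsFactors = FALSE)

# Match against each crosswalk separately, then stack the results.
linked <- rbind(merge(obs, id2nhd, by = "site_id"),
                merge(obs, underc2nhd, by = "site_id"))

# Rows for lake_A are now present twice; count and drop the repeats.
n_dupes <- sum(duplicated(linked))
deduped <- unique(linked)
print(n_dupes)
print(nrow(deduped))
```

Counting duplicated rows in the linked data, as above, is one quick way to test whether the inflated row count comes from overlapping crosswalks.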
Results from
scmake("8_viz")
~After completing the above, we pick up 101 additional PGDL lakes and 339 GLM lakes (compared to PR #327).~ This work resulted in 27 additional PGDL lakes and 0 GLM lakes when compared to PR #328
Snapshot of 8_viz/out/lakes_summary_fig.html: