This is where we currently pull in a bunch of different shapefiles and also fetch pre-calculated crosswalk tables. We fetch the Winslow shapefile.
Instead, we'll need to fetch raw NHD HR polygons and drop the fetching of any pre-canned crosswalks, since they are all based on the medium-res NHD.
These are methods that take two shapefiles and figure out join IDs, or do point-in-polygon analysis. This is where we build new crosswalk tables, in contrast to the canned ones in 1_. We also buffer the lake shapefiles in this step. Not sure why that lives here, but I think I put it here.
We'll probably do similar things here, other than changing some of the source files to correspond to changes in 1_.
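As a rough illustration of the point-in-polygon piece (hypothetical object/column names; `wqp_sites`, `lake_polys`, and its `site_id` column aren't actual targets here, and it assumes the sites table carries WQP's MonitoringLocationIdentifier column), an sf-based crosswalk could look like:

```r
library(sf)
library(dplyr)

# Assign each monitoring site to the lake polygon that contains it, then keep
# just the ID pairs as a crosswalk table.
site_lake_xwalk <- st_join(wqp_sites, lake_polys["site_id"], join = st_within) %>%
  st_drop_geometry() %>%
  filter(!is.na(site_id)) %>%
  distinct(MonitoringLocationIdentifier, site_id)
```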
This is where we get all kinds of things that are lake-specific params, like NLCD, bathy, depths, clarity, etc.
With changes that will drop Winslow as the source of pre-compiled attributes/params, we'll probably need more here.
These are munged attributes that would now be linked to the canonical ID (previously, nhd_{ID}). We're probably moving away from lakeattributes, so the naming of targets after that package should probably (?) change, e.g., 4_params_munge/out/lakeattributes_area.rds.ind. Changes in this phase are probably otherwise minimal.
These should be fine, as they would update w/ the shapes and centroids coming out of 1_.
This uses crosswalks to connect coop data to our canonical IDs.
We'll need to adjust these w/ changes to crosswalks.
RE: 4_params_munge -- yes, many of these will be the same, but they will take on new names that don't imply they're being formatted for lakeattributes. There will also need to be additional targets in this step that pull in data that currently resides in lakeattributes (see Jordan's comments above with links to data).
I've got a working fetcher/processor for the NHD files. But the Permanent_ field, which is what I'm pretty sure we want for site_id, has some goofy IDs. I don't think this will cause issues, but it's something to keep an eye out for. Some IDs are {0014DC77-4688-435F-9EFA-7F056F47D349} compared to the more common 120017988 format:

```r
sf_waterbodies$Permanent_ %>% as.character %>% nchar %>% table

    8     9  36   38
46146 63277 489 2212
```
I'm also combining the DL, filter, mutate stuff all into a single function call for the task table, solely to cut back on redundant copies of NHD HR stored locally. Ideally, I'd 1) DL, 2) filter/process, then 3) merge all. But DL and filter/process are combined into a single step. Kind of a pain, because this means every time we change how we filter (or add/remove a lake from keep_IDs or remove_IDs), we will have to download all of the files again.
Wondering whether this was a bad idea to combine the two steps...
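For context, here's a hedged sketch of that combined step (the function, layer filtering, and FType codes below are illustrative assumptions, and column names differ between the shapefile and GDB versions):

```r
library(sf)
library(dplyr)

fetch_filter_nhdhr <- function(out_rds, gdb_url, keep_IDs = c(), remove_IDs = c(),
                               min_area_ha = 4) {
  zip_file <- tempfile(fileext = ".zip")
  download.file(gdb_url, zip_file, mode = "wb")            # 1) DL
  gdb_dir <- tempfile()
  unzip(zip_file, exdir = gdb_dir)
  gdb <- list.files(gdb_dir, pattern = "\\.gdb$", full.names = TRUE)[1]

  st_read(gdb, layer = "NHDWaterbody", quiet = TRUE) %>%   # 2) filter/process
    filter(FType %in% c(390, 436)) %>%                     # lake/pond, reservoir (example codes)
    filter(AreaSqKm * 100 >= min_area_ha |
             Permanent_Identifier %in% keep_IDs) %>%
    filter(!Permanent_Identifier %in% remove_IDs) %>%
    saveRDS(out_rds)                                       # only the small filtered .rds is kept
}
```

Splitting 1) and 2) into separate targets would avoid the re-download problem, at the cost of keeping the large raw downloads around.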
Even though the zip file for NHD HR for the state of WI is ~500MB, if I filter down to only lakes/ponds/impoundment waterbody features and save as an .rds file, it is only 65MB. Maybe that is worth keeping around? MN may be twice as big, but the other states would be smaller.
👍 on the Permanent_ as site_id. NHD (not plus) uses PERMIDs as the identifier while NHDPlus uses COMIDs as the identifier. Waterbodies that have a COMID assigned will retain that as their PERMID, so I think we should be OK switching from NHD to NHDPlus.
Thanks Jake. I wonder if lakes that don't have COMIDs are the ones that have the long char Permanent_ IDs.
That was my guess but it would be nice to know for sure
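One quick (hypothetical) way to check, if we had a vector of medium-res NHDPlus COMIDs to compare against (`nhdplus_comids` below is made up):

```r
library(dplyr)

sf_waterbodies %>%
  sf::st_drop_geometry() %>%
  transmute(id = as.character(Permanent_),
            guid_style = nchar(id) > 9,                      # the {...}-style IDs
            has_comid  = id %in% as.character(nhdplus_comids)) %>%
  count(guid_style, has_comid)
```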
> it is only 65MB. Maybe that is worth keeping around?
That seems like a reasonable file size to have around.
Total sf object file size (as an .rds file) is 230MB for the 8 states, filtered down to lakes > 4 ha. Seems reasonable as a starting point. We'll add ways to get the keep_IDs and remove_IDs implemented.
A gotcha I ran into: it seems the shapefiles from ftp://rockyftp.cr.usgs.gov/vdelivery/Datasets/Staged/Hydrography/NHD/State/HighResolution/ are truncated to 200K features(!), cutting off lakes in the Dakotas, MN, and maybe some other states that I didn't check. I switched to GDB files and that seemed to resolve the issue, but I found it surprising.
Kelsey noticed that shapefiles that contain over 200K features are broken up into multiple files. I didn't catch that...
Processing the WQP to NHD HR crosswalk yields 50,278 unique monitoring locations. The old one was 3,700 locations (!) from back in May. Difference could be a combination of new sites added to WQP (but probably not that many) and the update to NHD high-res. We should be on the lookout for a lot of these sites being shoreline, which may not be representative of the lake on average.
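One way to screen for that later (a hedged sketch; object and column names are hypothetical, and it assumes both layers share a projected CRS in meters):

```r
library(sf)

# Flag crosswalked WQP sites that sit within buffer_m of their matched lake's
# shoreline; these may not represent open-water conditions.
flag_shoreline_sites <- function(site_points, lake_polys, buffer_m = 100) {
  shorelines <- st_boundary(lake_polys)
  idx <- match(site_points$site_id, lake_polys$site_id)
  dist_m <- st_distance(site_points, shorelines[idx, ], by_element = TRUE)
  site_points$near_shore <- as.numeric(dist_m) < buffer_m
  site_points
}
```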
Updates:
This is in pretty good shape, but I think we'll want two finalize targets at the end of the NHD HR task table: one that includes attributes, and another that just has the merged shapefiles. We can use the former to propagate GNIS names through for later use in the visualize stage. We also don't have a fetcher yet for the WIDNR hydrolayer or the Winslow shapefile. We'll need the WIDNR file to create WBIC crosswalks in 2_.
to do:
We've got this pretty far along too, but Kelsey is going to revise the poly/poly crosswalk function. When we have an updated polygon crosswalk function, it might make sense to go back to 1_ and filter out the Great Lakes. Not a big deal, but we aren't going to model those, and they end up w/ a lot of WQP sites and data that we are carrying along for no reason. I was avoiding doing that right now because I didn't want to re-calc all of the crosswalks (it is slow right now).
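For reference, a minimal sketch of the percentage-overlap idea (hypothetical `x_id`/`y_id` ID columns; not the actual revision tracked in #70):

```r
library(sf)
library(dplyr)

poly_poly_xwalk <- function(x_polys, y_polys, min_frac = 0.5) {
  x_polys <- st_make_valid(x_polys)
  y_polys <- st_make_valid(y_polys)
  pieces <- st_intersection(x_polys, y_polys)               # overlapping fragments
  x_area <- as.numeric(st_area(x_polys))[match(pieces$x_id, x_polys$x_id)]
  pieces %>%
    mutate(overlap_frac = as.numeric(st_area(pieces)) / x_area) %>%
    st_drop_geometry() %>%
    filter(overlap_frac >= min_frac) %>%
    select(x_id, y_id, overlap_frac)
}
```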
to do:

- revised polygon crosswalk function that uses st_intersect and has a percentage overlap threshold (see #70)

In good shape, just want to track down more depths if we can.
Also, this stage doesn't include WQP secchi right now, but maybe it should. That appears later in 6_temp_wqp, but secchi could be viewed as either a param or a driver, so maybe I am splitting hairs here...
So far so good, but by the end of this stage we should probably have single tables (or lists in the case of hypsography) for params that go into glm.nml. Haven't dealt with that yet.
There are also some better functions available for the NLCD calculation (I think #69 covers this).
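As a rough illustration of the "single table or list per lake" idea above (hypothetical `hypso_table` with site_id/H/A columns):

```r
library(dplyr)

# Collapse a long hypsography table into one named list per lake, which could
# then feed the glm.nml-building step.
hypso_by_lake <- hypso_table %>%
  arrange(site_id, H) %>%
  group_by(site_id) %>%
  summarize(hypso = list(list(H = H, A = A)), .groups = "drop") %>%
  tibble::deframe()
```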
Haven't touched this yet for NLDAS, but working on revising the WQP part of 6_.
This uses crosswalks to connect coop data to our canonical IDs.
We'll need to adjust these w/ changes to crosswalks.
Calling this done
This is a conversation we've been having for a while. We don't fully understand the effort required to pivot away from the NHD med-res lake shapes that appear in the Winslow et al. paper. Starting from that data release and expanding was a known shortcut at the time. Now we are realizing that this is limiting us in ways that we did not expect, and it may become a priority sooner to move to HR.
This issue is meant to capture discussion as we learn about the level of effort needed here. Welcome @jzwart @limnoliver