"https://cida.usgs.gov/nwc/geoserver/NWC/ows" "This website, the NWC-DP, has been deprecated as of September 1, 2020. All data accessible through the NWC-DP are also available on NWISWeb (https://waterdata.usgs.gov/) and will remain so in perpetuity. This website now redirects to NWISWeb. Please direct any questions about this web application shutdown to gs-w-iow_po_team@usgs.gov."
Hmm so what's the new URL to get the stencil? If I find it I'll post here.
I found a zip file of the boundaries but I'm not sure that's the solution: https://water.usgs.gov/GIS/dsdl/boundaries_shapefiles_by_aggeco.zip
Hi @Flash-Of-Thunder -- Many of the layers on the geoserver in question moved here. https://labs.waterdata.usgs.gov/geoserver/web/wicket/bookmarkable/org.geoserver.web.demo.MapPreviewPage?1&filter=false
That gagesii basins layer did not.
You could use the shapefile you found, depending on your use case. Can you describe more of what you are trying to do? I can probably help with a workaround.
I'll also ask about posting that basin boundary set to the new geoserver. I can think of a few applications for it now.
Wow, swift response. My goal is to pull averaged Daymet data (e.g., prcp, tmax, tmin) for specific basins -- e.g., the daily time series of prcp for "01013500". I was using a Python package, pydaymet, for this, but it was taking an exceptionally long time for multiple basins and I was wondering if I'd have more success with this R tool.
EDIT: the data does not have to be averaged coming in; I can do that myself, but that would mean downloading a larger file than necessary.
Is it possible to use an existing polygon as a stencil? There is this tool: https://usgs-r.github.io/nhdplusTools/reference/get_nldi_basin.html
Or perhaps I should look to see where they are pulling this from?
EDIT: So it looks like the new URL for pulling a polygon is something like this: https://labs.waterdata.usgs.gov/api/nldi/linked-data/nwissite/USGS-08279500/basin
based on the information from: https://waterdata.usgs.gov/blog/nldi-intro/
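For reference, a minimal sketch of reading that basin endpoint directly with sf (assuming your GDAL build can read GeoJSON straight from a URL; the endpoint pattern and site ID are the ones above):

library(sf)

# Hypothetical example: read the NLDI basin GeoJSON for one NWIS site.
site_id   <- "USGS-08279500"
basin_url <- paste0("https://labs.waterdata.usgs.gov/api/nldi/linked-data/nwissite/",
                    site_id, "/basin")

basin <- sf::read_sf(basin_url)   # one-row sf object holding the basin polygon
plot(sf::st_geometry(basin))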
Yeah --- that is one way to get a basin boundary.
You would convert that into a simplegeom.
Something like this:
sites <- c("USGS-08279500", "USGS-06287800")
basins <- lapply(sites, function(x) {
  nhdplusTools::get_nldi_basin(list(featureSource = "nwissite",
                                    featureID = x))
})
basins <- geoknife::simplegeom(do.call(rbind, basins))
Probably best to group these spatially for passing along to the GDP, e.g., if the basins you need are spread out all over the place but fall into clusters, run them through the GDP one cluster at a time.
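For instance, a rough sketch of one way to form those clusters, assuming the per-basin sf polygons are still in a list (called basin_list here, i.e. the lapply result before rbind) and using an arbitrary k-means cluster count:

library(sf)

# Cluster basin centroids so each GDP request covers one compact region.
centroids <- do.call(rbind, lapply(basin_list, function(b) {
  sf::st_coordinates(sf::st_centroid(sf::st_geometry(b)))
}))

k      <- 3                                    # arbitrary cluster count
groups <- stats::kmeans(centroids, centers = k)$cluster

clustered <- split(basin_list, groups)         # one element per spatial cluster
# Build one simplegeom per cluster and submit one geoknife job per cluster.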
Thanks! I see that the function did pull the polygon but saved it as an "sf" object, which produced an error when read by simplegeom. I'd think it just needs a simple reorganization of the data structure, but I couldn't find the documentation to get it into the right form.
basins <- simplegeom(do.call(rbind, basins))
Error in as(.Object, "simplegeom") : no method or default for coercing “sf” to “simplegeom”
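(Aside: on an older geoknife, one possible workaround -- just a sketch, not tested here -- is to convert the sf result to sp first, since simplegeom accepts SpatialPolygons:)

# Untested sketch: go through sp::SpatialPolygons, which simplegeom understands.
# Requires the sp package in addition to sf.
merged  <- do.call(rbind, basins)                        # one sf, one row per basin
stencil <- geoknife::simplegeom(sf::as_Spatial(sf::st_geometry(merged)))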
Please try the latest version of geoknife. remotes::install_github("usgs-r/geoknife")
Brilliant! It works, but webdata("daymet") is now producing an error. I was able to pull prism data using the following code:
install.packages("remotes")
remotes::install_github("usgs-r/geoknife")
install.packages("nhdplusTools")
library("geoknife")
library("nhdplusTools")
sites <- c("USGS-08279500", "USGS-06287800")
basins <- lapply(sites, function(x) {
  nhdplusTools::get_nldi_basin(list(featureSource = "nwissite",
                                    featureID = x))
})
stencil <- simplegeom(do.call(rbind, basins))
fabric <- webdata("prism")
times(fabric) <- as.POSIXct(c('2003-01-01','2005-01-01'))
job <- geoknife(stencil, fabric, wait = TRUE)
But when I changed "prism" to "daymet" I got the following error:
General Error: java.io.IOException: java.io.IOException: opendap.dap.DAP2Exception: opendap.dap.DAP2Exception: Not a valid OPeNDAP server - Missing MIME Header fields! Either "XDAP" or "XDODS-Server." must be present
The issue is mentioned in #339 but I can't tell if it's the same bug.
OK, the URL in the code is actually wrong for daymet.
If you do
url(fabric) <- "https://thredds.daac.ornl.gov/thredds-daymet/dodsC/daymet-v3-agg/na.ncml"
And try again, it will work. I'll get that URL fixed in the source.
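Putting the pieces together, a sketch of a full daymet request with the corrected URL, using the three variables mentioned above and the same stencil as before (the time range is just an example):

fabric <- webdata("daymet")
url(fabric) <- "https://thredds.daac.ornl.gov/thredds-daymet/dodsC/daymet-v3-agg/na.ncml"
variables(fabric) <- c("prcp", "tmax", "tmin")
times(fabric) <- as.POSIXct(c("2003-01-01", "2005-01-01"))

job <- geoknife(stencil, fabric, wait = TRUE)
daymet_data <- result(job)   # processed output (by default, area-weighted averages per basin)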
The latest from GitHub should now work?
Yes, I just tried it by reinstalling with remotes and using "daymet" -- works great. I appreciate the lightning-fast support!
I do have a follow-up thought, but I'm not sure if there is a clear answer. Downloading 40 years of Daymet data for a single basin takes about 40 minutes to an hour (which, perhaps unsurprisingly, is about the same amount of time the Python package pydaymet takes). So pulling the number of basins I want for an ML model would take roughly 7-14 days (300 basins at ~1 hr each), which is manageable, except that currently I'm trying to download directly to a cluster and it keeps crashing -- but those are personal problems. The main thing I'm wondering about is this: I had manually done a similar process with CONUS NOAA climate data at 1/16 degree, where I downloaded ~200 GB directly from NOAA and then pulled the grid-cell data for specific basins. After the download (which took ~2 days for all of CONUS), it was only a few minutes per basin. So I guess I'm wondering where the current bottleneck comes in. Is it the Daymet server or the public server the data is running through?
Thanks for engaging here. It's great to have someone who wants to get to the bottom of these things.
There are a couple of issues at play here.
Probably the biggest one is HTTP overhead and speed. Using web services for data like this incurs a lot of over-the-wire performance limiters. We can try to minimize those, but it's a fact of life.
The other thing is how the actual process is architected. Looking at this on a per-basin basis might not be the most efficient. That's where I was going with the spatial clusters comment above.
If your basins are spread out all over the place, try running them all in one go -- perhaps split up through time so an individual process doesn't take more than a few hours. That will help keep the total number of web requests down.
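To illustrate the time-splitting idea, a rough sketch that submits one job per multi-year window (the chunk boundaries here are arbitrary) and stacks the results:

# Arbitrary 5-year windows; adjust so each job stays under a few hours.
chunk_starts <- seq(as.POSIXct("1980-01-01"), as.POSIXct("2020-01-01"), by = "5 years")

chunks <- lapply(seq_len(length(chunk_starts) - 1), function(i) {
  times(fabric) <- c(chunk_starts[i], chunk_starts[i + 1])   # modifies a local copy of fabric
  job <- geoknife(stencil, fabric, wait = TRUE)
  result(job)
})

all_data <- do.call(rbind, chunks)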