DOI-USGS / geoknife

R tools for geo-web processing of gridded data via the Geo Data Portal. geoknife slices up gridded data according to overlap with irregular features, such as watersheds, lakes, points, etc.
https://doi-usgs.github.io/geoknife/

Gages II as a webgeom #381

Closed dblodgett-usgs closed 2 years ago

dblodgett-usgs commented 5 years ago
library(geoknife)

# gridded dataset (fabric) and the GAGES-II basins layer as the stencil
fabric <- webdata("prism")
stencil <- webgeom(url = "https://cida.usgs.gov/nwc/geoserver/NWC/ows", 
                   geom = "NWC:gagesii_basins", attribute = "gage_id")
values(stencil) <- "01013500"

# don't require the gridded data to fully cover the basin geometry
wp <- webprocess(REQUIRE_FULL_COVERAGE = "false")

job <- geoknife(stencil, fabric, wp, wait = TRUE)
intersected <- result(job)
plot(intersected$DateTime, intersected$`01013500`)
Flash-Of-Thunder commented 2 years ago

"https://cida.usgs.gov/nwc/geoserver/NWC/ows" "This website, the NWC-DP, has been deprecated as of September 1, 2020. All data accessible through the NWC-DP are also available on NWISWeb (https://waterdata.usgs.gov/) and will remain so in perpetuity. This website now redirects to NWISWeb. Please direct any questions about this web application shutdown to gs-w-iow_po_team@usgs.gov."

Hmm so what's the new URL to get the stencil? If I find it I'll post here.

Flash-Of-Thunder commented 2 years ago

I found a zip file of the boundaries but I'm not sure that's the solution: https://water.usgs.gov/GIS/dsdl/boundaries_shapefiles_by_aggeco.zip

dblodgett-usgs commented 2 years ago

Hi @Flash-Of-Thunder -- Many of the layers on the geoserver in question moved here. https://labs.waterdata.usgs.gov/geoserver/web/wicket/bookmarkable/org.geoserver.web.demo.MapPreviewPage?1&filter=false

That gagesii basins layer did not.

You could use the shapefile you found, depending on your use case. Can you describe a bit more of what you are trying to do? I can probably help with a workaround.

I'll also ask about posting that basin boundary set to the new geoserver. I can think of a few applications for it now.
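
If it's useful, here's a rough sketch of the shapefile route (the gage-ID attribute name is a guess, so check the columns after reading; sf and sp are assumed to be installed):

library(sf)
library(geoknife)

# download and unzip the GAGES-II basin boundary shapefiles
zipfile <- tempfile(fileext = ".zip")
download.file("https://water.usgs.gov/GIS/dsdl/boundaries_shapefiles_by_aggeco.zip",
              zipfile, mode = "wb")
exdir <- tempfile()
unzip(zipfile, exdir = exdir)

# the zip holds one shapefile per aggregated ecoregion -- list them and read
# whichever one contains the gage of interest
shps <- list.files(exdir, pattern = "\\.shp$", recursive = TRUE, full.names = TRUE)
basins <- read_sf(shps[1])

# the attribute name is a guess -- check names(basins) for the gage-ID column
one_basin <- basins[basins$GAGE_ID == "01013500", ]

# convert to an sp object, which the released geoknife's simplegeom() accepts
stencil <- simplegeom(sf::as_Spatial(one_basin))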

Flash-Of-Thunder commented 2 years ago

Wow, swift response. My goal is to pull averaged Daymet data (e.g. prcp, tmax, tmin) for specific basins, e.g. the daily time series of prcp for "01013500". I was using a Python package, pydaymet, for this, but it was taking an exceptionally long time for multiple basins, and I was wondering if I'd have more success with this R tool.

EDIT: The data does not have to be averaged coming in; I can do that myself, but that would mean downloading a larger file than necessary.
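
One aside that may help with file size: geoknife can list a dataset's variables and take a subset, so only the fields of interest get processed. A minimal sketch using the prism dataset (the same calls apply to daymet; the variable names here should be checked against the query() output):

library(geoknife)

fabric <- webdata("prism")

# list what the dataset offers, then keep just the variables of interest
query(fabric, "variables")
variables(fabric) <- c("ppt", "tmx", "tmn")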

Flash-Of-Thunder commented 2 years ago

Is it possible to use an existing polygon as a stencil? There is this tool: https://usgs-r.github.io/nhdplusTools/reference/get_nldi_basin.html

Or perhaps I should look to see where they are pulling this from?

EDIT: So it looks like the new URL for pulling a polygon is something like this: https://labs.waterdata.usgs.gov/api/nldi/linked-data/nwissite/USGS-08279500/basin

based on the information from: https://waterdata.usgs.gov/blog/nldi-intro/
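
For what it's worth, that endpoint returns GeoJSON, which sf can read straight from the URL; a quick sketch:

library(sf)

basin <- read_sf("https://labs.waterdata.usgs.gov/api/nldi/linked-data/nwissite/USGS-08279500/basin")
plot(st_geometry(basin))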

dblodgett-usgs commented 2 years ago

Yeah --- that is one way to get a basin boundary.

You would convert that into a simplegeom.

Something like this:

sites <- c("USGS-08279500", "USGS-06287800")

basins <- lapply(sites, function(x){
  nhdplusTools::get_nldi_basin(list(featureSource = "nwissite",
                                    featureID = x))
})

basins <- geoknife::simplegeom(do.call(rbind, basins))

Probably best to group these spatially for passing along to the GDP, e.g. if you need basins spread out all over the place (but in clusters), run them through the GDP cluster by cluster.
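
A rough sketch of that grouping idea, assuming the basins list from the snippet above and a geoknife version whose simplegeom() accepts sf objects (otherwise convert with sf::as_Spatial() first); kmeans on centroids is just one simple way to cluster:

library(sf)
library(geoknife)

# combine the individual basin polygons into one sf object
all_basins <- do.call(rbind, basins)

# cluster on centroid coordinates -- pick a sensible number of centers
xy <- st_coordinates(st_centroid(st_geometry(all_basins)))
cluster <- kmeans(xy, centers = 2)$cluster

# build one simplegeom (and later, one GDP job) per spatial cluster
stencils <- lapply(split(seq_len(nrow(all_basins)), cluster),
                   function(i) simplegeom(all_basins[i, ]))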

Flash-Of-Thunder commented 2 years ago

Thanks! I see that the function did pull the polygon, but it's saved as an "sf" object, which produced an error when read by simplegeom. I'd think it's just a simple reorganization of the data structure, but I couldn't find the documentation to get it in the right spot.

basins <- simplegeom(do.call(rbind, basins))

Error in as(.Object, "simplegeom") : no method or default for coercing “sf” to “simplegeom”


dblodgett-usgs commented 2 years ago

Please try the latest version of geoknife:

remotes::install_github("usgs-r/geoknife")
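
If updating isn't an option right away, one possible workaround (a sketch, assuming the sp package is installed) is to convert the sf result to an sp object before building the stencil:

# convert the combined sf basins to an sp (Spatial) object, which the
# released geoknife's simplegeom() understands
basins_sp <- sf::as_Spatial(do.call(rbind, basins))
stencil <- geoknife::simplegeom(basins_sp)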

Flash-Of-Thunder commented 2 years ago

Brilliant! It works, but webdata("daymet") is now producing an error. I was able to pull prism data using the following code:

install.packages("remotes")
remotes::install_github("usgs-r/geoknife")
install.packages("nhdplusTools")

library("geoknife")
library("nhdplusTools")

sites <- c("USGS-08279500", "USGS-06287800")

basins <- lapply(sites, function(x){
  nhdplusTools::get_nldi_basin(list(featureSource = "nwissite",
                                    featureID = x))
})

stencil <- simplegeom(do.call(rbind, basins))
fabric <- webdata("prism")
times(fabric) <- as.POSIXct(c('2003-01-01','2005-01-01'))
job <- geoknife(stencil, fabric, wait = TRUE)

But when I changed "prism" to "daymet" I got the following error:

General Error: java.io.IOException: java.io.IOException: opendap.dap.DAP2Exception: opendap.dap.DAP2Exception: Not a valid OPeNDAP server - Missing MIME Header fields! Either "XDAP" or "XDODS-Server." must be present

The issue is mentioned in #339 but I can't tell if it's the same bug.

dblodgett-usgs commented 2 years ago

OK, the URL in the code is actually wrong for daymet.

If you do

url(fabric) <- "https://thredds.daac.ornl.gov/thredds-daymet/dodsC/daymet-v3-agg/na.ncml"

and try again, it will work. I'll get that URL fixed in the source.
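
For reference, here's a sketch of the whole daymet pull with the corrected URL in one place (it reuses the stencil from the snippet above; prcp, tmax, and tmin are Daymet's variable names):

library(geoknife)

fabric <- webdata("daymet")

# override the stale URL shipped with the released package
url(fabric) <- "https://thredds.daac.ornl.gov/thredds-daymet/dodsC/daymet-v3-agg/na.ncml"

# only request the variables of interest
variables(fabric) <- c("prcp", "tmax", "tmin")
times(fabric) <- as.POSIXct(c("2003-01-01", "2005-01-01"))

job <- geoknife(stencil, fabric, wait = TRUE)
daymet_data <- result(job)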

dblodgett-usgs commented 2 years ago

The latest from GitHub should now work?

Flash-Of-Thunder commented 2 years ago

Yes, I just tried it by reinstalling w/ remotes and using "daymet", and it works great. I appreciate the lightning-fast support!

I do have a follow-up thought, but I'm not sure if there is a clear answer. Downloading 40 years of daymet data for a single basin takes about ~40 min to ~1 hr (which, perhaps unsurprisingly, is about the same amount of time the Python package pydaymet takes). So pulling the number of basins I want for an ML model would take ~7-14 days (300 basins at ~1 hr each), which is very manageable, except that currently I'm trying to download directly to a cluster and it keeps crashing, but those are personal problems.

The main thing I'm wondering about: I had manually done a similar process with CONUS NOAA climate data at 1/16 degree, where I downloaded ~200 GB directly from NOAA and then pulled the grid-cell data for specific basins. After the download (which took ~2 days for all of CONUS), it was only a few minutes per basin. So I'm wondering where the current bottleneck comes in. Is it the daymet server or the public server the data is running through?

dblodgett-usgs commented 2 years ago

Thanks for engaging here. It's great to have someone who wants to get to the bottom of these things.

There are a couple of issues at play here.

Probably the biggest one is HTTP overhead and speed. Using web services for data like this incurs a lot of over-the-wire performance limiters. We can try to minimize those, but it's a fact of life.

The other thing is how the actual process is architected. Looking at this on a per-basin basis might not be the most efficient. That's where I was going with the spatial clusters comment above.

If your basins are spread out all over the place, try running them all in one go -- perhaps split up through time so an individual process doesn't take more than a few hours. That will help keep the total number of web requests down.
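
For example, a rough sketch of chunking the request through time (5-year blocks are arbitrary; it assumes the fabric and stencil set up above, and the chunk boundaries may need a one-day offset to avoid duplicate days):

# one GDP job per 5-year block instead of a single 40-year request
year_starts <- seq(1980, 2015, by = 5)

results <- lapply(year_starts, function(y) {
  times(fabric) <- as.POSIXct(c(paste0(y, "-01-01"), paste0(y + 5, "-01-01")))
  job <- geoknife(stencil, fabric, wait = TRUE)
  result(job)
})

daymet_all <- do.call(rbind, results)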