DOI-USGS / nhdplusTools

See official repository at: https://code.usgs.gov/water/nhdplusTools
https://doi-usgs.github.io/nhdplusTools/
Creative Commons Zero v1.0 Universal

Unable to direct `get_nhdplushr()` to previously-downloaded data #391

Closed (dmpeterson2 closed this issue 3 months ago)

dmpeterson2 commented 4 months ago

Hi! I am trying to build a function for calculating stream distance between two points, part of which will read in a HUC4 watershed (defined in previous steps as a linestring, the hu object below). However, I will be working across a large portion of the US, and need the function to pull from previously downloaded data (rather than downloading on each iteration) in order to save time and memory.

I have tried both downloading the zipped files from the National Map Downloader (and saving them to a separate folder in my working directory) and using the download_nhdplushr() function to manually download the relevant HUC4 flowlines (below), but get_nhdplushr() still cannot read the correct file in the final step.

library(nhdplusTools)

temp_dir <- file.path(nhdplusTools_data_dir(), "temp_hr_cache")
download_nhdplushr(temp_dir, c("0316", "0315", "0603"))

results in output folders

> library(nhdplusTools)
> 
> temp_dir <- file.path(nhdplusTools_data_dir(), "temp_hr_cache")
> download_nhdplushr(temp_dir, c("0316", "0315", "0603"))
[1] "/Users/Delaney/Library/Application Support/org.R-project.R/R/nhdplusTools/temp_hr_cache/03"
[2] "/Users/Delaney/Library/Application Support/org.R-project.R/R/nhdplusTools/temp_hr_cache/06"

which contain the proper files. However, when I try to define a HUC4 of interest and only read in that watershed:

hu <- "0316"

download_dir <- download_nhdplushr(temp_dir, hu, download_files = FALSE)
hr_data <- get_nhdplushr(download_dir,
                         file.path(download_dir, paste0("nhdplus_", hu, ".gpkg")),
                         layers = "NHDFlowline", overwrite = TRUE)

results in output and errors:

> hu <- "0316"
> 
> download_dir <- download_nhdplushr(temp_dir, hu, download_files = FALSE)
> hr_data <- get_nhdplushr(download_dir,
+                          file.path(download_dir, paste0("nhdplus_", hu, ".gpkg")),
+                          layers = "NHDFlowline", overwrite = TRUE)
Error in st_sf(out) : no simple features geometry column present
Creating dataset https://prd-tnm.s3.amazonaws.com/StagedProducts/Hydrography/NHDPlusHR/VPU/Current/GDB/NHDPLUS_H_0316_HU4_GDB.zip/nhdplus_0316.gpkg failed.
Error: Creation failed.
In addition: Warning messages:
1: In get_nhdplushr(download_dir, file.path(download_dir, paste0("nhdplus_",  :
  check_terminals is true but attributes selected do not support the checks.
2: In CPL_write_ogr(obj, dsn, layer, driver, as.character(dataset_options),  :
  GDAL Error 4: sqlite3_open(https://prd-tnm.s3.amazonaws.com/StagedProducts/Hydrography/NHDPlusHR/VPU/Current/GDB/NHDPLUS_H_0316_HU4_GDB.zip/nhdplus_0316.gpkg) failed: unable to open database file

Additionally, I've tried setting download_dir to the directory holding all the watersheds I downloaded in the first step, but get_nhdplushr() is then unable to distinguish which HUC4 to pull, and reads all three in temp_dir. I figure it's likely an issue with directories and file paths, but I just haven't been able to get it to work on my end. Thanks!

dblodgett-usgs commented 4 months ago

Hi there -- thanks for the reprex and thorough description. I think I see what's wrong.

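With download_files = FALSE, download_nhdplushr(temp_dir, hu, download_files = FALSE) returns the URL of the file you want rather than the path to the files you downloaded in your first step. That is why the error messages show get_nhdplushr() trying to create the GeoPackage under an https://prd-tnm.s3.amazonaws.com/... address.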

You'll need to pass temp_dir or another file path to the files you want to open for it to work.
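A minimal sketch of that suggestion, in case it helps. It passes the local cache directory to get_nhdplushr() and uses its pattern argument to limit the read to a single HUC4, which should also avoid pulling in all three watersheds. The helper names hr_gdb_pattern() and read_cached_hr() are hypothetical, and the sketch assumes the unzipped geodatabases are named like NHDPLUS_H_0316_HU4_GDB.gdb (inferred from the zip name in the error message above), so check the folder names in your cache before relying on it.

```r
# Hypothetical helper: build a regex that matches only the geodatabase
# for one HUC4. Assumes folder names like NHDPLUS_H_0316_HU4_GDB.gdb.
hr_gdb_pattern <- function(hu) {
  paste0(".*_", hu, "_HU4_GDB\\.gdb$")
}

# Hypothetical wrapper: read flowlines for one HUC4 from a local cache
# directory instead of the URL returned when download_files = FALSE.
read_cached_hr <- function(cache_dir, hu) {
  nhdplusTools::get_nhdplushr(
    hr_dir = cache_dir,                        # local path, not a URL
    out_gpkg = file.path(cache_dir, paste0("nhdplus_", hu, ".gpkg")),
    layers = "NHDFlowline",
    pattern = hr_gdb_pattern(hu),              # restrict to this HUC4 only
    overwrite = TRUE
  )
}
```

With the temp_dir from the first step, the call would then look like hr_data <- read_cached_hr(temp_dir, "0316").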

dblodgett-usgs commented 3 months ago

@dmpeterson2 feel free to reopen this issue if my response wasn't the solution.