DOI-USGS / nhdplusTools

See official repository at: https://code.usgs.gov/water/nhdplusTools
https://doi-usgs.github.io/nhdplusTools/
Creative Commons Zero v1.0 Universal

Issue with `get_nhdplushr` for a few basins #411

Open tjstagni opened 1 month ago

tjstagni commented 1 month ago

Hi @dblodgett-usgs, I'm trying to load a few basins with the get_nhdplushr function and I'm running into a couple of issues.

For the HUC04 basins 1704, 0318, and 0307, the function never completes and the data is not loaded into R. I let basin 0318 run for a few hours and it still did not complete. I've downloaded all of the HUC04s surrounding these basins and had no issues with any of them.

Also, running basin 0201 gave me the error in the screenshot below, and download_nhdplushr did not download the basin.

[image: screenshot of the error message]

Here is a reproducible example. I'm using the latest version, 1.2.1:

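library(nhdplusTools)

# Cache directory for the NHDPlusHR download
temp_dir = file.path(nhdplusTools_data_dir(), "temp_hr_cache")

# Download HUC04 1704
download_dir = download_nhdplushr(temp_dir, "1704")

# Read two layers and write them to an output GeoPackage
hr = get_nhdplushr(download_dir, file.path(download_dir, "nhdplus_out.gpkg"),
                   layers=c("NHDFlowline","NHDPlusBurnWaterbody"))

dblodgett-usgs commented 1 month ago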

I see how this is just stalled out. Will have to do some looking to see what the deal is.

There is a mix of casing in the nhdplushr attributes that causes some issues. I thought I'd handled all of them but may have missed one. To verify, are you on the latest hydroloom as well? https://github.com/DOI-USGS/hydroloom

tjstagni commented 1 month ago

Yes, I have hydroloom 1.1.0 installed, but I'm not using it for this.

dblodgett-usgs commented 1 month ago

OK -- so my theory is that this is the culprit -- it did eventually finish.

Warning message:
In CPL_read_ogr(dsn, layer, query, as.character(options), quiet,  :
  GDAL Message 1: organizePolygons() received a polygon with more than 100 parts. The processing may be really slow.  You can skip the processing by setting METHOD=SKIP, or only make it analyze counter-clock wise parts by setting METHOD=ONLY_CCW if you can assume that the outline of holes is counter-clock wise defined

dblodgett-usgs commented 1 month ago

Yeah -- I turned off polygon ring direction checks and it runs fine. NHDPlusHR is a fairly clean dataset from that perspective so I feel comfortable leaving it off. Once #412 is merged, you can install from GitHub and this should work.
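
A minimal sketch of one way to skip GDAL's polygon ring organization for a single read, assuming GDAL picks up its `OGR_ORGANIZE_POLYGONS` configuration option from the process environment. The helper name `with_skip_organize_polygons()` and the `gdb_path` variable are hypothetical, and this is not necessarily how #412 implements the change.

```r
# Hypothetical helper: evaluate `expr` with OGR_ORGANIZE_POLYGONS set to "SKIP",
# then restore whatever value (or absence of a value) was there before.
# Assumption: GDAL consults this environment variable when it reads polygons.
with_skip_organize_polygons <- function(expr) {
  old <- Sys.getenv("OGR_ORGANIZE_POLYGONS", unset = NA)
  on.exit({
    if (is.na(old)) {
      Sys.unsetenv("OGR_ORGANIZE_POLYGONS")
    } else {
      Sys.setenv(OGR_ORGANIZE_POLYGONS = old)
    }
  }, add = TRUE)
  Sys.setenv(OGR_ORGANIZE_POLYGONS = "SKIP")
  # `expr` is a lazily evaluated argument, so it runs only now,
  # after the option has been set.
  force(expr)
}

# Usage (requires sf; `gdb_path` is a hypothetical path to a downloaded geodatabase):
# wb <- with_skip_organize_polygons(
#   sf::read_sf(gdb_path, layer = "NHDPlusBurnWaterbody")
# )
```

Skipping the ring analysis trades a validity check for speed, so it is only reasonable when the polygons are already well formed, as noted above for NHDPlusHR.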

tjstagni commented 1 month ago

@dblodgett-usgs thanks so much for working quickly on the updates. Do you have a sense for how long get_nhdplushr should run with the new updates? My current run with basin "1704" has been going for 45 minutes and it still has not completed.

Also, I still get the same error with basin "0201". The basin does not download when using download_nhdplushr.

dblodgett-usgs commented 1 month ago

Apologies -- I rushed this "fix" and missed what was actually changed to make it work faster. (I had a cached gpkg that was being read fast!) Will get the actual fix up in a moment.

tjstagni commented 1 month ago

@dblodgett-usgs no worries, thank you! Does this fix also solve the issue for basin "0201"?

dblodgett-usgs commented 1 month ago

0201 doesn't exist in archive or current under here: https://prd-tnm.s3.amazonaws.com/index.html?prefix=StagedProducts/Hydrography/NHDPlusHR/VPU/

So not a lot I can do there. :/

dblodgett-usgs commented 1 month ago

OK -- so my guess about what this was turned out to be wrong.

Something was hung in the make_standalone() call.

This should work:

temp_dir = file.path(nhdplusTools_data_dir(), "temp_hr_cache")

download_dir = download_nhdplushr(temp_dir, "1704")

# Delete any output GeoPackage left over from an earlier run
unlink(file.path(download_dir, "nhdplus_out.gpkg"))

# check_terminals = FALSE skips the make_standalone() call that was hanging
hr = get_nhdplushr(download_dir, file.path(download_dir, "nhdplus_out.gpkg"),
                   layers=c("NHDFlowline","NHDPlusBurnWaterbody"),
                   check_terminals = FALSE)

check_terminals = FALSE just avoids the call to make_standalone(), though, so your network won't be self-contained unless everything terminates at the coast.

I'll try and figure out what's hanging in that call when I have a few minutes but that should get you unstuck for now.
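
Since a cached nhdplus_out.gpkg can make a run look faster than it really is, a small base-R helper along these lines could time a clean run. `time_clean_run()` is a hypothetical name, not part of nhdplusTools, and the commented usage assumes the `download_dir` from the example above.

```r
# Hypothetical helper: delete any existing output file, then time `run`,
# a function that takes the output path and performs the read.
time_clean_run <- function(out_gpkg, run) {
  if (file.exists(out_gpkg)) {
    unlink(out_gpkg)  # make sure no cached output is reused
  }
  elapsed <- system.time(result <- run(out_gpkg))[["elapsed"]]
  list(result = result, elapsed_seconds = elapsed)
}

# Usage (requires nhdplusTools and the downloaded data):
# timing <- time_clean_run(
#   file.path(download_dir, "nhdplus_out.gpkg"),
#   function(out) get_nhdplushr(download_dir, out,
#                               layers = c("NHDFlowline", "NHDPlusBurnWaterbody"),
#                               check_terminals = FALSE)
# )
# timing$elapsed_seconds
```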