DOI-USGS / hyRefactor

https://code.usgs.gov/wma/nhgf/reference-fabric/hyrefactor
Creative Commons Zero v1.0 Universal

Area-based refactor #14

Open dblodgett-usgs opened 3 years ago

dblodgett-usgs commented 3 years ago

The collapse phase of refactor_nhdplus should take an area threshold in addition to length. This needs a little research, but in theory it should be possible to accept a minimum area and potentially a maximum area as input. The interplay of area and flowline length may be complicated, but some opinionated implementation decisions would make it workable.

There are a few steps to think about here. First, consider 'collapse', which combines flowlines and their catchment areas. It uses two thresholds: one specific to flowlines that do not cross confluences, which I refer to as 'mainstem' in code, and a more general one that applies to all flowlines. Both are almost always applied in a test like: flines$LENGTHKM < thresh or flines$LENGTHKM < mainstem_thresh_use.

Because collapse looks downstream to determine what can be combined in some cases, we will need to add a get_ds_area in a few places, following what is done here: https://github.com/dblodgett-usgs/hyRefactor/blob/be7d2fc3b19fa79f7555e7f47106c91b8dbaa5ce/R/collapse.R#L66 and here: https://github.com/dblodgett-usgs/hyRefactor/blob/8b4b35f2abd181be26cf034881114c2b8fd67815/R/hyRefactor.R#L101.

By the looks of things, we could modify the thresh and mainstem_thresh inputs to collapse_flowlines() to accept either a numeric (which would be assumed to be a length) or a list with two named entries: list(length = x, area = y).
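
A rough sketch of how that input handling might look, in base R. The helper name normalize_thresh and its default_area argument are made up for illustration and are not part of hyRefactor. The default of Inf is an assumption, used here as a stand-in for an area value that never restricts anything.

```r
# Hypothetical helper, not part of hyRefactor: coerce a threshold argument
# into list(length = , area = ) form.
normalize_thresh <- function(thresh, default_area = Inf) {
  if (is.list(thresh)) {
    # Named-list form: both entries are required.
    if (!all(c("length", "area") %in% names(thresh))) {
      stop("thresh must have 'length' and 'area' entries")
    }
    # Return the two entries in a fixed order.
    return(thresh[c("length", "area")])
  }
  if (is.numeric(thresh) && length(thresh) == 1) {
    # Bare numeric: treated as a length, paired with a non-restricting area.
    return(list(length = thresh, area = default_area))
  }
  stop("thresh must be a single numeric or list(length = , area = )")
}

# Existing numeric calls keep working; the list form adds an area limit.
normalize_thresh(1)
normalize_thresh(list(length = 1, area = 0.5))
```

Doing the conversion once at the top of the function would mean the rest of the code only ever sees the list form.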

In the function initialization, we would always convert a numeric input to a list where y = max(area) so it would have no effect. Then every time thresh is used, we would test against both length and area.
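
As a minimal sketch of the combined test, assuming the two conditions are joined with a logical AND (that is my reading of "no effect": a very large area threshold should leave the length test unchanged). The column name AREASQKM and the helper below_thresh are assumptions for illustration, and the toy data is invented. With a strict < comparison, a threshold of exactly max(area) would exclude the single largest catchment, so this sketch uses Inf for the length-only case.

```r
# Toy flowline table; LENGTHKM follows the test quoted above, AREASQKM is
# an assumed name for the catchment area column.
flines <- data.frame(
  COMID    = 1:4,
  LENGTHKM = c(0.2, 0.8, 1.5, 0.3),
  AREASQKM = c(0.1, 2.0, 3.0, 5.0)
)

# Hypothetical helper: TRUE where a flowline is below both thresholds.
below_thresh <- function(flines, thresh) {
  flines$LENGTHKM < thresh$length & flines$AREASQKM < thresh$area
}

# Length-only behaviour: area = Inf never excludes a flowline.
length_only <- below_thresh(flines, list(length = 1, area = Inf))

# Adding an area limit keeps short flowlines with large catchments
# out of the collapse candidates.
length_and_area <- below_thresh(flines, list(length = 1, area = 1))

print(data.frame(COMID = flines$COMID, length_only, length_and_area))
```

Whether AND is the right combination, or whether a separate maximum area is also needed, is one of the opinionated decisions still to be made.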