Closed dblodgett-usgs closed 11 months ago
Package: hydroloom Reviewer: Mike Johnson Date: August 30, 2023
(@dblodgett-usgs added check boxes)
- `RANN::nn2` should be imported?
- Imports are mixed: some use `::`, some use `@importsFrom` in the `00_hydroloom.R` file, and some use `@importsFrom` in the function roxygen. My personal preference would be to see all imports in the `00_hydroloom.R` file using the `@importFrom` syntax.
- `fastmap` should have `importsFrom` for `fastqueue` and `faststack`?
- `kable` or `DT`? It is hard to read with the base output.
- `hydroloom` will support the continued functionality of `nhdplusTools` … it reads kinda like the latter will be superseded.
- `stats::` call on line 268. `stats` is in the DESCRIPTION but not the `importFrom` of this file?
- `importsFrom` here (Lines 195-202) seem partial and oddly selective.
- `is.hy`? (Lines 282-303)
- `stop()` calls… it will concatenate strings for you.
- `tibble`? (design notes indicate a preference for `tibble`). I didn't think it did but it seems it does based on this.
- `package::` imports? Specifically, for `tidyr` and `pbapply`.
- `return()`: some just leave an output… maybe worth aligning?
- `add_levelpaths(x = test_flowline, name_attribute = "GNIS_ID", weight_attribute = "ArbolateSu")`
- `dplyr` deprecated the `.data$` ... calls?
- `nrow(x)` instead of length of column?
- `topo_sort` if not already there…
- `bind_rows` over `do.call`?
- `.data` calls
- `st_as_sf()`?
- `data.table` `merge` speed worth the conversion?
- `toid.y` and `toid.x` exist… should one be dropped and one renamed?
- `utils` to DESCRIPTION file
- `sapply` or `apply` do the trick here instead of `do.call(c, …)`?
- `interp_meas`, `add_len` and `add_index` all be in the same place?
- `rm` needed?
- `.data` calls and explicit function signatures
- `::` calls to `units`
- `flines` is left NULL
- `hydroloom::`? should `get_node` be exported?
- `future` backend is exposed in the details?
- `00_hydroloom.R`? Or within the needed function?
- `data.table` in other places.
- `hydroloom::` syntax.
- `utils.R`
- `.data` calls (I think)
- `navigate_network` over `navigate_hydro_network` (matching the file name)
- `bind_rows`?
- `.data$` calls
- `.data$` calls, and quoted selects
- `.data` call
- `tidyr::replace_na` haha?
- `tidyr::unnest`?
- `data.frame(x = 1:2, y = 2:3) %>% sf::st_drop_geometry()` works

Hi @dblodgett-usgs,
Here is a really small edit I saw that could be made to the documentation in `check_hy_graph.R`: it says 'is often referred to as "recursive depth first search".', but it looks like you are implementing an "iterative depth first search" in your code.

hah -- but it iterates through what is typically implemented as a recursive algorithm because R is not good with recursion. I'll clarify that it is the recursive depth first search algorithm but implemented with iteration.
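For readers who haven't seen the iterative form, here is a minimal sketch (not hydroloom's actual implementation) of depth first search driven by an explicit stack instead of recursion, which avoids R's call-stack limits on deep networks:

```r
# Hypothetical sketch: depth first search written iteratively. An explicit
# stack replaces the recursive call stack, which matters in R because deep
# recursion can exhaust the C stack and there is no tail-call optimization.
dfs_iterative <- function(adj, start) {
  stack <- c(start)
  visited <- character(0)
  while (length(stack) > 0) {
    node <- stack[length(stack)]    # peek at the top of the stack
    stack <- stack[-length(stack)]  # pop it
    if (!node %in% visited) {
      visited <- c(visited, node)
      # push neighbors; they are visited in LIFO order
      stack <- c(stack, adj[[node]])
    }
  }
  visited
}

# adjacency as a named list of character vectors (toy example)
adj <- list(a = c("b", "c"), b = "d", c = character(0), d = character(0))
dfs_iterative(adj, "a")  # visits "a" "c" "b" "d"
```

The traversal order is exactly what the recursive formulation would produce for the same neighbor ordering; only the bookkeeping differs.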
Initial structural work complete according to review. I've cleaned up use of ::
and consolidated all imports. There's still a bit of tech debt to take care of that will be addressed with future commits for other comments.
To address a number of issues above, my implementation of tidy selection and masking needs to be clarified. I'll add something about this to a CONTRIBUTING.md as well.
.data
was only deprecated for tidy selection. For functions like filter()
, mutate()
, or arrange()
you have to use data masking. The details are described here. https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html I banged my head on this pretty hard when the change was rolled out. See: https://github.com/DOI-USGS/nhdplusTools/issues/306
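A small illustration of the distinction using dplyr (toy data, not hydroloom code):

```r
library(dplyr)

df <- data.frame(id = c("a", "b"), toid = c("b", NA))

# Data masking (filter, mutate, arrange): .data is still the supported
# way for package code to refer to columns.
filter(df, !is.na(.data$toid))

# Tidy selection (select, all_of, any_of): .data$ is deprecated here;
# pass a character vector through all_of() instead.
select(df, all_of("id"))
```

So the same `.data` pronoun is correct in one context and deprecated in the other, which is easy to trip over.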
I think I have been consistent in my use of `hydroloom` package attributes behind `.data${var}` or `.data[[{var}]]`.
My tidyselect
leverages a set of standardized package attributes declared at the top of 00_hydroloom.R. It may look odd because I am passing package variables that are length one character vectors instead of "strings"
. This felt like a reasonable way to accomplish what I was trying to do with hydroloom names and has worked pretty well so far.
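A sketch of the pattern (the variable names below are hypothetical stand-ins, not hydroloom's actual attribute declarations):

```r
library(dplyr)

# Hypothetical stand-ins for standardized attribute names declared once
# at the top of a file like 00_hydroloom.R (actual names may differ).
id <- "id"
toid <- "toid"

x <- data.frame(id = 1:3, toid = c(2, 3, NA))

# Tidy selection: pass the length-one character vectors through all_of()
select(x, all_of(c(id, toid)))

# Data masking: use the same variables via .data[[ ]]
filter(x, !is.na(.data[[toid]]))
```

Centralizing the names this way means a rename touches one declaration rather than every verb call.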
I reorganized 00_hydroloom according to review. I want is.hy to message instead of warn --- calling functions can issue warnings if they get FALSE.
As far as I know, a tibble will always inherit from data.frame, so a data.frame method will capture both tibbles and data.frames. Internally, everything is coerced to tibble when converting to `hy`.
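For example (`summarize_rows` is a hypothetical method, just to show the dispatch):

```r
library(tibble)

# A tibble carries the data.frame class, so inherits() is TRUE:
inherits(tibble(x = 1), "data.frame")  # TRUE

# Consequently an S3 method written for data.frame dispatches for
# tibbles as well:
summarize_rows <- function(x) UseMethod("summarize_rows")
summarize_rows.data.frame <- function(x) nrow(x)

summarize_rows(tibble(x = 1:3))  # 3
```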
In add_divergence, I've clarified some documentation regarding selection of the smallest ID -- this is a time when no other attributes are available to make a distinction and something has to be used to select one or the other.
There is a double unnest that looks strange because it's two of the same call back to back. It is intended.
I've searched the package for all instances of return()
and verified that they are needed rather than leaving off a final value in the function.
I'd rather not import another package for table formatting. I put a little time into the print method for the name definitions.
At worst, I was thinking of adding it to Suggests. I also think (not 100% sure) that if you explicitly declare the function in the readme (with `::`) it can pass devtools check without being in the DESCRIPTION.
Oh I see -- yeah, I can do that. There's a print method for that list in the package as well.
e.g. it now does:

```
> hydroloom::hydroloom_name_definitions
1 "id":
  shared network identifier for catchment divide and flowpath or flowline
2 "toid":
  indicates to the downstream id. May or may not be dendritic
3 "fromnode":
  indicates the node representing the nexus upstream of a catchment
4 "tonode":
  indicates the node representing the nexus downstream of a catchment
...
```
I went ahead and added the add_topo_sort function but realized that in the pfaf function, topo_sort and levelpath have to be in sync -- so I actually don't use it there. I did add that caveat to the documentation.
Note that adding `.datatable.aware` was kind of an out-of-caution thing so I can use data.table syntax safely. I'll move it to where I declare all my imports.
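For reference, the flag (documented in data.table's "Importing data.table" vignette) is a single top-level assignment in package code:

```r
# In a package that Imports (rather than Depends on) data.table, this
# top-level flag tells data.table that [.data.table syntax is used here,
# so DT[i, j, by] expressions evaluate correctly inside the package
# namespace.
.datatable.aware <- TRUE
```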
With the way I handle data.frame classes, I have to use st_sf()
to add the sf
class and minor attributes back to a data.frame when they get stripped from time to time. st_as_sf()
is a bit more heavy handed and doesn't do quite what I want.
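A small example of the difference, assuming a data.frame that still carries its `sfc` geometry column after the `sf` class was stripped:

```r
library(sf)

pts <- st_sf(id = 1:2,
             geometry = st_sfc(st_point(c(0, 0)), st_point(c(1, 1))))

# Some data.frame operations strip the sf class but keep the sfc
# geometry list column:
df <- as.data.frame(pts)
inherits(df, "sf")         # FALSE

# st_sf() just restores the sf class and geometry-column attribute:
inherits(st_sf(df), "sf")  # TRUE
```

`st_as_sf()` would also work here, but it is the general-purpose converter and does more than re-attach the class.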
data.table usage is 100% worth the overhead to convert to/from tibble. Some very large joins (hi-res NHD) hang when using dplyr. For most tables, the time to convert to data.table is so small that it has no effect. I could see having a switch for very large tables but don't feel that it's necessary at this point.
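A sketch of that round-trip pattern (`fast_left_join` is a hypothetical helper for illustration, not a hydroloom function):

```r
library(data.table)
library(dplyr)

# Convert to data.table for a large join, then hand back a tibble so
# callers never see the data.table class.
fast_left_join <- function(x, y, by) {
  xd <- as.data.table(x)
  yd <- as.data.table(y)
  out <- merge(xd, yd, by = by, all.x = TRUE, sort = FALSE)
  as_tibble(out)
}

x <- data.frame(id = 1:3, a = c("u", "v", "w"))
y <- data.frame(id = c(1, 3), b = c("p", "q"))
fast_left_join(x, y, by = "id")
```

The conversion cost is linear in table size, while the join savings on very large keyed tables can be dramatic, which is the trade-off described above.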
With regards to hyRefactor / hydrofab -- let's try and move general functionality over to hydroloom over the long run?
Rounding in index_points_to_lines
is on the linear reference. It rounds to the nearest hundredth of a percent. Not sure a control on that is really needed.
In general, I like to keep little utility functions used in map/apply patterns as close to where I call them as reasonable. Once a function gets used in more places, I would consider moving it to a general utility function space.
I like leaving the alternate implementation as a comment as in make_fromids
. Nice for people who don't know both syntaxes (like me).
hahah g
is used in some other packages for graph data. I kind of like it but it may also be confusing. I want to try and use g
for the index id format of the network and x
for the traditional hy
form of the network.
navigate_hydro_network was to avoid a function signature collision with nhdplusTools. I kind of need to change it to something else.
tidyr utility functions and st_drop_geometry previously didn't do what I wanted... I've cleaned up those utility functions.
We are on CRAN. https://cran.r-project.org/package=hydroloom
@mikejohnson51 agreed to provide a domain and code review. Will use this issue as a parent to track the review. Thanks for being willing to do it!!
https://www.usgs.gov/products/software/software-management/types-software-review for suggested review.
Note that test files in `tests/testthat` largely mimic file naming in `R` and test coverage attempts to fully capture desired behavior of algorithms.