Open cjyetman opened 1 year ago
One option could be to refactor out a portion that works with a vector, and then wrap it in a function that also does the reading:
guess_numerical_mark()
read_and_guess_numberical_mark()
or something like that. Have your cake and eat it too.
Reviewing this, the existing paradigm in this repo is that the functions work on files. That makes me hesitant to change the functionality of guess_numerical_mark()
to accept only a vector. That being said, conceptually this paradigm may not have been ideal, and I can see value in having very specific purpose utility functions that do very specific things on very specific inputs, possibly using a very specific naming scheme, and utilizing those utility functions in other higher level functions that combine functionality of the various utility functions.
For the time being, I will let this brew a while before deciding if this would be worth the effort.
edit: For clarification, I fully grasp that this could be achieved as @jdhoffa has suggested, I'm primarily concerned about the naming of the functions and what that would imply should be done to many of the other functions in this repo in order to retain consistency.
For the record, I am absolutely in favor of careful consideration of the names and purposes of the functions!
The proposed solution was half an idea, but I think it's important to consider these things (especially names lol) as they are often trapdoor decisions that have long-term and potentially unexpected consequences (see the whole r2dii
suite for example)
I'm generally a fan of having a call stack along the lines of
foo <- function(filepaths) {
# some checks on the filepaths
vapply(
x = filepaths,
FUN = file_foo
)
}
file_foo <- function(single_filepath) {
stopifnot(length(single_filepath) == 1) # but with more informative message
contents <- readLines(single_filepath, n = 1L)
val <- string_foo(contents)
return(val)
}
string_foo <- function(string) {
grepl(pattern = "foo", x = string)
}
so in this case, the functions might be named guess_numerical_mark
, file_guess_numerical_mark
, and string_guess_numerical_mark
(suffixing the input type would also work, i.e. guess_numerical_mark_string()
)
Yeah, totally agree (retrospectively). My only concern is how much effort it will be to apply that universally and consistently.
Just some more thoughts on this:
guess_numerical_mark()
currently takes a filepath/s to a CSV and must read in the file and find the appropriate column/variable before achieving its primary function of guessing the numerical marks within a set of character strings that represent numbers. It could instead accept any character vector to process and leave the file read in and column selection to whatever calls it. The potential downside is that the calling function would then need to read in the data in a specific way (as a character vector with no manipulation) and find the column to be processed byguess_numerical_mark()
, which may not be something that needs to happen otherwise.Not 100% convinced this is the right thing to do.
related #7