Open AliciaSchep opened 6 years ago
Having just looked at this package, I had the same thoughts, and I like the names @AliciaSchep suggests a lot.
Hey Alicia, thanks for starting this issue!
I like your suggestions. Particularly the use of 'input'. :+1:
I've thought about 'bare' but I had trouble writing documentation I didn't find convoluted that explained the term, but I think it would be good to revisit that now. I'd be interested in seeing a version of the points under:
There are two ways to make double_col work:
that reads clearly with 'bare'.
I do feel bad about the long function names, but my thoughts on those were:
rlang
if it bothers youBut it sounds like the feedback is: the bad feelings of a long function name aren't mitigated by these factors. In which case we do need to shorten them.
Hi Miles! BTW came across this repo b/c the Runconference tweets made me wonder what cool stuff you were up to :smile:
Re bare, I can see that it is a bit awkward of a term to document -- maybe expression_input
then? Would that be easier to document -- basically bold expression instead of typed in the section you called out? That also matches more closely the language used in rlang documentation
On the function length, I don't think it is the length per se that is the issue (apologies for my lack of clarity above), but more the complexity and difficulty of parsing it in your head. typed_list_as_name_list
is hard to even say in your head when you are thinking about applying the function. I personally find that function names that are somewhat natural to say (even if you're never saying aloud, but just thinking to yourself) are easier to work with (so very short names that don't map to real words can also be tough, which I think is something that makes working with rlang difficult...)
I agree with @AliciaSchep, both in that this is a really interesting idea, and in that the primary challenge with the current function names is their cognitive demand rather than their length.
I also appreciate your interest in leveraging autocomplete, but I wonder if perhaps you haven't gone far enough? One of the things I really like about stringr
is that all of the functions have a common prefix, which means that when trying to remember a string transformation, all I really have to remember is the prefix. What about a common prefix here as well, something like eval_
, paired with input
for the bare
versions, and string
for the value versions? That would give you:
eval_input_as_lhs()
eval_input_as_list()
eval_input_as_name()
eval_string_as_list()
eval_string_as_name()
Then, the workflow could be something like, "I want to incorporate this dplyr
syntax into a function, I'll just load up friendlyeval
, type eval_
and select the function that does what I want."
Wow. This is great! I love this prefix idea. eval
is a great choice as it meshes nicely with the documentation for the operators. Thanks so much for the suggestion.
Glad you like it! Happy to look over documentation etc. too, if you think it'd be helpful.
Yes help is welcome there. First of all thought I think the key section we need to nail using the new terminology is in README from
There are two ways to make double_col work:
to
Usage examples
So I'm going to create a branch called eval
to try out some ideas there.
I have two modifications I am considering to the eval
list:
I think the list()
versions might benefit from a sense of plurality in the thing being evaluated. So it could be as simple as adding an s
: eval_inputs_as_list()
At the moment I prefer value
to string
since I feel the dichotomy you could form between the input
and the value
of that input might be stickier in people's brains. That's just my intuition though, we'll see how it works written up.
Okay so that was a lot easier than I expected! I'm think the results are a marked improvement. Still a couple of tiny things I feel weird about. Check out the eval
branch README.
One other thought about as_name
... it doesn't capture everything that the function can be used for in the case of eval_input_as_name
. With the double_col
function for example, you can pass in an expression like cyl + 5
or cyl + mpg
-- an expression that is more than just a column name. Being able to passing those kinds of expressions is one of the key benefits of all this tidyeval framework. What about just having eval_input
instead of eval_input_as_name
?
Agreed, these changes definitely make it easier to parse. Pluralizing the list
versions makes sense. I think my stumbling block with eval_value
is the double val
, which make it harder to say. That said, the rationale for value
in contrast to input
makes sense, so it is probably the better choice.
That's a good point, @AliciaSchep: as_name
implies more limited functionality than is available, which probably isn't ideal for a package that's trying to be user friendly. One other benefit is that dropping as_name
emphasizes that the other functions are specific cases of the two basic use cases: evaluating input and evaluative values. I really like:
eval_input()
eval_input_as_lhs()
eval_inputs_as_list()
eval_value()
eval_values_as_list()
Looking at the README again, you describe all three of the eval_input
functions via the phrase "Use the expression". Thus, another option to consider could be eval_expr
(short for expression) and friends. I don't think I like that as much as eval_input
but thought I'd mention it, in the interest of being thorough.
Re: name
and passing in expressions, I did consider this, and I ended up convincing myself that if you're writing dplyr
code that passes around unevaluated expressions, you're at a level where you can deal with the tidyeval
suite and you're better off with the full set of tools.
I'd be glad for counter examples, but I tried to come up with dplyr
cases where a user would be naively handing down expressions and I couldn't come up with anything that seemed unconvoluted or 'common' enough.
ggplot2
usecases may change that, but if so, would it serve users better to have a couple of ggplot2
themed functions and explanations?
I was just thinking it might be nice to have a 'Beyond Friendlyeval' vignette that has some rlang
examples for the expressions and other slightly more advanced usage.
I guess I don't quite understand how this is "Beyond friendlyeval" as the current first three functions enable passing expressions; it is just a question of the name of the function being reflective of what is being done. I would guess that for many functions that use mutate
or filter
internally the intent would be to allow expressions as well as just column names.... to have the custom function argument be something that is passed directly to the dplyr function (which the user knows accepts more than just column names).
Here is the first example that popped to my head, a modified filter
that tells you how many rows you filtered out:
filter_loudly <- function(x, ...){
in_rows <- nrow(x)
out <- filter(x, !!!typed_list_as_name_list(...))
out_rows <- nrow(out)
message("Filtered out ",in_rows-out_rows," rows.")
return(out)
}
You would almost always be using an expression, e.g. mpg > 30
as input rather than just a name.
Some of the patterns where I can image where you want to pass expressions:
One note -- with the help of autocomplete it was pretty easy to get to typed_list_as_name_list
:smile: Even if I think it is unwieldy, was still easier to reach than whatever the actual rlang command is! Any of the proposed names will make for a helpful package...
You're right, the functions allow it, but I was thinking that explaining things in terms of expressions might get too complicated. Especially since you can't pass expressions as values/strings for the last two.
However, maybe I'm being too pessimistic about that.
Sorry, I don't understand what you mean by the third bullet. For bullets 1 & 2, you're describing quite abstract/general functions - to even come up with those I think you need to have some sense of metaprogramming. I guess I had imagined most people who needed friendlyeval
wouldn't be operating at that level.
But if the explanation can be kept clear and direct and the examples concise there's no problem.
Forgive me if this is a slightly wild idea, but it's Friday afternoon here. What if the list became something like:
eval_input_as_name
eval_input_as_lhs
eval_input_as_expr
eval_inputs_as_names
eval_inputs_as_exprs
eval_value_as_name
eval_value_as_expr
eval_values_as_names
eval_values_as_exprs
So under the hood, some of these functions map to the exact same rlang
function, but this split allows for clarity of explanation? (edit: and ease of use when auto-completing)
edit2: in which case maybe name
could become something more specific like col
.
edit3: and maybe makes your intent clearer to yourself while you're muddling through?
My initial reaction was adding more functions with the same functionality would make things unwieldy, but after thinking about it for a bit (and considering the advantages of auto-completion), this latest proposal is growing on me. It seems that the increased specificity would benefit the target audience. If you do go that route, I agree that eval_input_as_col
is clearer that eval_input_as_name
(and, as the likely first use case for tidyeval
for many folks, has the added advantage of being alphabetically first).
I like that idea for the input
functions. For the value
functions I don't think the same function works for name
/col
as expr
... but also don't think the 'expr' versions are needed for 'value' (a much rarer scenario I would think).
Yeah so for input
name
/expr
would call the same function, but for value
they'll call different functions. I agree value is rarer, but I do encounter code that builds expressions using paste0
regularly if infrequently. As it happens I was presented with some this week, so I may be affected by recency bias.
I'm feeling better about this choice since I have realised I overlooked the ensym
/ensyms
functions. This means we do not need eval_input_as_col_lhs
and now every function is a 1 to 1 mapping to rlang
again, since these ensym
functions are for the 'name only' case. :sweat_smile: - If anything this just goes to show how needed this package is!
I think the eval_
prefix gives a wrong idea of what's going because these functions don't evaluate anything (it is !!
that evaluates early on user's command and then dplyr / tidyr evaluates later on in the data mask).
I also suggest not using name
as a particle. We are trying to give name a single meaning, which is of a string. This is consistent with names()
and colnames()
which always return character vectors. In contrast bare names or symbols are not quoted in strings. Though I admit it is kind of hard to keep this distinction when discussing R code...
Furthermore it seems you're aliasing enquo()
/ enquos()
to this "name" particle. This is wrong for another reason because the purpose of enquo is purely to capture complex expressions that involve user functions. This is a point that I think is not well made in the current documentation. If you're only unquoting bare column names, it is perfectly safe to use ensym()
/ ensyms()
(which capture the user code and always return symbols or "bare names" - if the user typed an expression instead this gives an error). Quosures are about keeping track of foreign contexts where user functions are defined, and data frame columns are never foreign by definition.
Finally I don't think "value" is a proper term when you're expecting a string. A value is the result of some evaluation and can be any kind of R object.
@lionel- thankyou for your comments. It looks like I realised my ensym
goof just before you posted but I appreciate you helping out!
Since I seem to be in the minority now on value
vs string
I might have to change my position on that. As I said above I liked the dichotomy between input
and value
for the explantation but in terms being specific and helpful with autocomplete string
might be the winner.
I do think the eval_
prefix is useful though. It helps with auto-complete and I think even if the mental model is not correct, it is useful. I guess the only reason I would want to change it is if you know of some case non-power users are likely to encounter where this misconception would be catastrophic.
If it is just for auto-complete maybe you can use useful_
as prefix then ;)
eval_
functions should evaluate their arguments. I feel strongly about this.
In the documentation I've framed these functions as instructions to dplyr
on how to evaluate arguments. So the instruction implied by eval_input_as_col
is "Evaluate what was input for this argument as a column name" - It's a close mapping. I appreciate there may be conventions being bucked here, but the whole point is this package is for people without a deep background in computer science.
Then maybe you should not use eval_
which is purely a programming concept. I don't have a background in computer science btw.
How about prep_
? Prepare input for dplyr.
I don't have a background in computer science btw.
To make this point clearer, I don't like the idea that programming is for people with computer science backgrounds. Loops, functions, higher-order functions like lapply, meta-programming, ... These are all things that scientists and analysts of any backgrounds can learn and be productive with. I also think they should be learned in this order which is why I appreciate your effort to make meta-programming accessible to people who didn't have time to get there (or never will because they don't have the interest).
I agree with you on that very much! I also have great appreciation for the fact you have laid it out as a progression. I feel this is the way most people learn, but often when we teach, we don't respect that, and fail to layout a progression people can follow to higher proficiency.
My sense of 'evaluate' is that most R programmers would be comfortable with this term. It comes up pretty early in programming life particularly around conditional statements. 'If this evaluates to true' etc.
Still you know my position: eval()
is a very important function in R (and other languages) and eval_
functions should be variants of eval()
.
prep_
would be better on all counts (better reflects semantics and is a non-intimidating non-technical term).
I agree that eval is misleading, and really like the prep suggestion! Even if folks are not familiar with the concept of eval, if this package is a gateway into rlang, they may soon be learning it... In that case, the names here could lead to some confusion and make it harder to get to that understanding.
One reason I prefer eval
to prep
is that it fits within that frame of 'you're issuing an instruction to dplyr
to evaluate the input/value as a col/expression'.
So in my mind, the command isn't supposed to read 'evaluate this now', rather: 'here are instructions for evaluation'. I concede this distinction could easily be lost.
For me, prep
doesn't fit within that frame, we're saying: 'use this function to prepare the input/value as a col/expression' - it heads in the direction of mapping to the underlying implementation, rather than just expressing what the user wants to do directly.
This is something I do feel strongly about: I'd like to keep the friendlyeval
functions and documentation focused on letting the user express what they want to happen in the context of common cases, rather than how it should happen.
With that in mind, there are probably terms that would work within that constraint other than eval
:
use
is an obvious one but that prefix is already used a lot by usethis
. using
maybe?take
as in 'take the input supplied for an argument and use it as a column name' - I wonder if non-native English speakers would get that one?with
- It's interesting since it isn't a verb, but does still sound like an instruction.Any other ideas?
How about treat_
, as in, "hey dplyr
, treat this inputs as a column name"?
Bonus side effect: catchy tag lines like "Using friendlyeval
is a treat!"
👌👌👌 I like this a lot!
Also, imagine the hexsticker potential...
How about 'pass'
Thanks Nick, as we discussed on the phone, I am quite taken with treat!
I've created treat
branch with an updated README that also swaps value
for string
as was suggested by Alicia and Lionel. I have to say I think we are pretty close here: https://github.com/MilesMcBain/friendlyeval/blob/treat/README.md
I really like how it looks and how it makes the explanations read. :+1:
Agreed! I think the latest version of the README is very clear, both in motivation and execution.
Seconded! :tada:
Good news! I've just merged the treat
branch over to master.
Biggest thanks go to @pkq for the rewrite of the README!
I've added names of contributors from this thread to the DESCRIPTION. I haven't added emails, so make a PR if you would like me to add. Also let me know if you'd like to be removed.
This package is a great idea! One thought -- I found the function names a bit confusing and unwieldy. Here are some ideas:
bare_input
bare_input_lhs
bare_input_list
string_input
string_input_list
It seems like
typed
is intending to mean a user typed it in... I think there is a clash with the "type" concept of having objects with a specified type. I'm not sure if "bare" is super clear either, but at least I don't think it has as much of a clash with another programming concept.For
value
, my sense is that this meansstring
in which case I thinkstring
is more clear but maybe I am not thinking about other possibilities...The
as_name
bit I think is a bit confusing because I think it leads to the question of "name of what?" even though it seems to be technically appropriate. It makes the function names super long too...These suggestions surely have flaws too, but wanted to get a convo going with some ideas :smile: