function names - Githubissues

AliciaSchep commented 6 years ago

This package is a great idea! One thought -- I found the function names a bit confusing and unwieldy. Here are some ideas:

bare_input bare_input_lhs bare_input_list string_input string_input_list

It seems like typed is intending to mean a user typed it in... I think there is a clash with the "type" concept of having objects with a specified type. I'm not sure if "bare" is super clear either, but at least I don't think it has as much of a clash with another programming concept.

For value, my sense is that this means string in which case I think string is more clear but maybe I am not thinking about other possibilities...

The as_name bit I think is a bit confusing because I think it leads to the question of "name of what?" even though it seems to be technically appropriate. It makes the function names super long too...

These suggestions surely have flaws too, but wanted to get a convo going with some ideas :smile:

malcolmbarrett commented 6 years ago

Having just looked at this package, I had the same thoughts, and I like the names @AliciaSchep suggests a lot.

MilesMcBain commented 6 years ago

Hey Alicia, thanks for starting this issue!

I like your suggestions. Particularly the use of 'input'. :+1:

I've thought about 'bare' but I had trouble writing documentation I didn't find convoluted that explained the term, but I think it would be good to revisit that now. I'd be interested in seeing a version of the points under:

There are two ways to make double_col work:

that reads clearly with 'bare'.

I do feel bad about the long function names, but my thoughts on those were:

the long name helps you find the right one using auto-complete
you can immediately convert to standard rlang if it bothers you

But it sounds like the feedback is: the bad feelings of a long function name aren't mitigated by these factors. In which case we do need to shorten them.

AliciaSchep commented 6 years ago

Hi Miles! BTW came across this repo b/c the Runconference tweets made me wonder what cool stuff you were up to :smile:

Re bare, I can see that it is a bit awkward of a term to document -- maybe expression_input then? Would that be easier to document -- basically bold expression instead of typed in the section you called out? That also matches more closely the language used in rlang documentation

On the function length, I don't think it is the length per se that is the issue (apologies for my lack of clarity above), but more the complexity and difficulty of parsing it in your head. typed_list_as_name_list is hard to even say in your head when you are thinking about applying the function. I personally find that function names that are somewhat natural to say (even if you're never saying aloud, but just thinking to yourself) are easier to work with (so very short names that don't map to real words can also be tough, which I think is something that makes working with rlang difficult...)

pkq commented 6 years ago

I agree with @AliciaSchep, both in that this is a really interesting idea, and in that the primary challenge with the current function names is their cognitive demand rather than their length.

I also appreciate your interest in leveraging autocomplete, but I wonder if perhaps you haven't gone far enough? One of the things I really like about stringr is that all of the functions have a common prefix, which means that when trying to remember a string transformation, all I really have to remember is the prefix. What about a common prefix here as well, something like eval_, paired with input for the bare versions, and string for the value versions? That would give you:

eval_input_as_lhs() eval_input_as_list() eval_input_as_name() eval_string_as_list() eval_string_as_name()

Then, the workflow could be something like, "I want to incorporate this dplyr syntax into a function, I'll just load up friendlyeval, type eval_ and select the function that does what I want."

MilesMcBain commented 6 years ago

Wow. This is great! I love this prefix idea. eval is a great choice as it meshes nicely with the documentation for the operators. Thanks so much for the suggestion.

pkq commented 6 years ago

Glad you like it! Happy to look over documentation etc. too, if you think it'd be helpful.

MilesMcBain commented 6 years ago

Yes help is welcome there. First of all thought I think the key section we need to nail using the new terminology is in README from

There are two ways to make double_col work:

to

Usage examples

So I'm going to create a branch called eval to try out some ideas there.

I have two modifications I am considering to the eval list:

I think the list() versions might benefit from a sense of plurality in the thing being evaluated. So it could be as simple as adding an s: eval_inputs_as_list()
At the moment I prefer value to string since I feel the dichotomy you could form between the input and the value of that input might be stickier in people's brains. That's just my intuition though, we'll see how it works written up.

MilesMcBain commented 6 years ago

Okay so that was a lot easier than I expected! I'm think the results are a marked improvement. Still a couple of tiny things I feel weird about. Check out the eval branch README.

AliciaSchep commented 6 years ago

One other thought about as_name... it doesn't capture everything that the function can be used for in the case of eval_input_as_name. With the double_col function for example, you can pass in an expression like cyl + 5 or cyl + mpg -- an expression that is more than just a column name. Being able to passing those kinds of expressions is one of the key benefits of all this tidyeval framework. What about just having eval_input instead of eval_input_as_name?

pkq commented 6 years ago

Agreed, these changes definitely make it easier to parse. Pluralizing the list versions makes sense. I think my stumbling block with eval_value is the double val, which make it harder to say. That said, the rationale for value in contrast to input makes sense, so it is probably the better choice.

That's a good point, @AliciaSchep: as_name implies more limited functionality than is available, which probably isn't ideal for a package that's trying to be user friendly. One other benefit is that dropping as_name emphasizes that the other functions are specific cases of the two basic use cases: evaluating input and evaluative values. I really like:

eval_input() eval_input_as_lhs() eval_inputs_as_list() eval_value() eval_values_as_list()

pkq commented 6 years ago

Looking at the README again, you describe all three of the eval_input functions via the phrase "Use the expression". Thus, another option to consider could be eval_expr (short for expression) and friends. I don't think I like that as much as eval_input but thought I'd mention it, in the interest of being thorough.

MilesMcBain commented 6 years ago

Re: name and passing in expressions, I did consider this, and I ended up convincing myself that if you're writing dplyr code that passes around unevaluated expressions, you're at a level where you can deal with the tidyeval suite and you're better off with the full set of tools.

I'd be glad for counter examples, but I tried to come up with dplyr cases where a user would be naively handing down expressions and I couldn't come up with anything that seemed unconvoluted or 'common' enough.

ggplot2 usecases may change that, but if so, would it serve users better to have a couple of ggplot2 themed functions and explanations?

MilesMcBain commented 6 years ago

I was just thinking it might be nice to have a 'Beyond Friendlyeval' vignette that has some rlang examples for the expressions and other slightly more advanced usage.

AliciaSchep commented 6 years ago

I guess I don't quite understand how this is "Beyond friendlyeval" as the current first three functions enable passing expressions; it is just a question of the name of the function being reflective of what is being done. I would guess that for many functions that use mutate or filter internally the intent would be to allow expressions as well as just column names.... to have the custom function argument be something that is passed directly to the dplyr function (which the user knows accepts more than just column names).

Here is the first example that popped to my head, a modified filter that tells you how many rows you filtered out:

filter_loudly <- function(x, ...){
  in_rows <- nrow(x)
  out <- filter(x, !!!typed_list_as_name_list(...))
  out_rows <- nrow(out)
  message("Filtered out ",in_rows-out_rows," rows.")
  return(out)
}

You would almost always be using an expression, e.g. mpg > 30 as input rather than just a name.

Some of the patterns where I can image where you want to pass expressions:

You want to do add some side-effect to dplyr code (e.g. example above) and you would generally be passing an expression to the dplyr code
You want to do additional transformation of input automatically (e.g. the double_col example, which could just as easily be thought of as double_input in which case it could make sense to pass some expression if what you wanted to transform was some combination)
Custom summarization as in the programming with dplyr vignette. While you can always use mutate to assign whatever transformation to a column first, it is nice to be able to avoid that and pass the expression directly. One of the examples in the vignette does that.

AliciaSchep commented 6 years ago

One note -- with the help of autocomplete it was pretty easy to get to typed_list_as_name_list :smile: Even if I think it is unwieldy, was still easier to reach than whatever the actual rlang command is! Any of the proposed names will make for a helpful package...

MilesMcBain commented 6 years ago

You're right, the functions allow it, but I was thinking that explaining things in terms of expressions might get too complicated. Especially since you can't pass expressions as values/strings for the last two.

However, maybe I'm being too pessimistic about that.

Sorry, I don't understand what you mean by the third bullet. For bullets 1 & 2, you're describing quite abstract/general functions - to even come up with those I think you need to have some sense of metaprogramming. I guess I had imagined most people who needed friendlyeval wouldn't be operating at that level.

But if the explanation can be kept clear and direct and the examples concise there's no problem.

MilesMcBain commented 6 years ago

Forgive me if this is a slightly wild idea, but it's Friday afternoon here. What if the list became something like:

eval_input_as_name
eval_input_as_lhs
eval_input_as_expr
eval_inputs_as_names
eval_inputs_as_exprs
eval_value_as_name
eval_value_as_expr
eval_values_as_names
eval_values_as_exprs

So under the hood, some of these functions map to the exact same rlang function, but this split allows for clarity of explanation? (edit: and ease of use when auto-completing)

edit2: in which case maybe name could become something more specific like col.

edit3: and maybe makes your intent clearer to yourself while you're muddling through?

pkq commented 6 years ago

My initial reaction was adding more functions with the same functionality would make things unwieldy, but after thinking about it for a bit (and considering the advantages of auto-completion), this latest proposal is growing on me. It seems that the increased specificity would benefit the target audience. If you do go that route, I agree that eval_input_as_col is clearer that eval_input_as_name (and, as the likely first use case for tidyeval for many folks, has the added advantage of being alphabetically first).

AliciaSchep commented 6 years ago

I like that idea for the input functions. For the value functions I don't think the same function works for name/col as expr... but also don't think the 'expr' versions are needed for 'value' (a much rarer scenario I would think).

MilesMcBain commented 6 years ago

Yeah so for input name/expr would call the same function, but for value they'll call different functions. I agree value is rarer, but I do encounter code that builds expressions using paste0 regularly if infrequently. As it happens I was presented with some this week, so I may be affected by recency bias.

MilesMcBain commented 6 years ago

I'm feeling better about this choice since I have realised I overlooked the ensym/ensyms functions. This means we do not need eval_input_as_col_lhs and now every function is a 1 to 1 mapping to rlang again, since these ensym functions are for the 'name only' case. :sweat_smile: - If anything this just goes to show how needed this package is!

lionel- commented 6 years ago

I think the eval_ prefix gives a wrong idea of what's going because these functions don't evaluate anything (it is !! that evaluates early on user's command and then dplyr / tidyr evaluates later on in the data mask).

I also suggest not using name as a particle. We are trying to give name a single meaning, which is of a string. This is consistent with names() and colnames() which always return character vectors. In contrast bare names or symbols are not quoted in strings. Though I admit it is kind of hard to keep this distinction when discussing R code...

Furthermore it seems you're aliasing enquo() / enquos() to this "name" particle. This is wrong for another reason because the purpose of enquo is purely to capture complex expressions that involve user functions. This is a point that I think is not well made in the current documentation. If you're only unquoting bare column names, it is perfectly safe to use ensym() / ensyms() (which capture the user code and always return symbols or "bare names" - if the user typed an expression instead this gives an error). Quosures are about keeping track of foreign contexts where user functions are defined, and data frame columns are never foreign by definition.

Finally I don't think "value" is a proper term when you're expecting a string. A value is the result of some evaluation and can be any kind of R object.

MilesMcBain commented 6 years ago

@lionel- thankyou for your comments. It looks like I realised my ensym goof just before you posted but I appreciate you helping out!

Since I seem to be in the minority now on value vs string I might have to change my position on that. As I said above I liked the dichotomy between input and value for the explantation but in terms being specific and helpful with autocomplete string might be the winner.

MilesMcBain commented 6 years ago

I do think the eval_ prefix is useful though. It helps with auto-complete and I think even if the mental model is not correct, it is useful. I guess the only reason I would want to change it is if you know of some case non-power users are likely to encounter where this misconception would be catastrophic.

lionel- commented 6 years ago

If it is just for auto-complete maybe you can use useful_ as prefix then ;)

eval_ functions should evaluate their arguments. I feel strongly about this.

MilesMcBain commented 6 years ago

In the documentation I've framed these functions as instructions to dplyr on how to evaluate arguments. So the instruction implied by eval_input_as_col is "Evaluate what was input for this argument as a column name" - It's a close mapping. I appreciate there may be conventions being bucked here, but the whole point is this package is for people without a deep background in computer science.

lionel- commented 6 years ago

Then maybe you should not use eval_ which is purely a programming concept. I don't have a background in computer science btw.

How about prep_? Prepare input for dplyr.

lionel- commented 6 years ago

I don't have a background in computer science btw.

To make this point clearer, I don't like the idea that programming is for people with computer science backgrounds. Loops, functions, higher-order functions like lapply, meta-programming, ... These are all things that scientists and analysts of any backgrounds can learn and be productive with. I also think they should be learned in this order which is why I appreciate your effort to make meta-programming accessible to people who didn't have time to get there (or never will because they don't have the interest).

MilesMcBain commented 6 years ago

I agree with you on that very much! I also have great appreciation for the fact you have laid it out as a progression. I feel this is the way most people learn, but often when we teach, we don't respect that, and fail to layout a progression people can follow to higher proficiency.

My sense of 'evaluate' is that most R programmers would be comfortable with this term. It comes up pretty early in programming life particularly around conditional statements. 'If this evaluates to true' etc.

lionel- commented 6 years ago

Still you know my position: eval() is a very important function in R (and other languages) and eval_ functions should be variants of eval().

prep_ would be better on all counts (better reflects semantics and is a non-intimidating non-technical term).

AliciaSchep commented 6 years ago

I agree that eval is misleading, and really like the prep suggestion! Even if folks are not familiar with the concept of eval, if this package is a gateway into rlang, they may soon be learning it... In that case, the names here could lead to some confusion and make it harder to get to that understanding.

MilesMcBain commented 6 years ago

One reason I prefer eval to prep is that it fits within that frame of 'you're issuing an instruction to dplyr to evaluate the input/value as a col/expression'.

So in my mind, the command isn't supposed to read 'evaluate this now', rather: 'here are instructions for evaluation'. I concede this distinction could easily be lost.

For me, prep doesn't fit within that frame, we're saying: 'use this function to prepare the input/value as a col/expression' - it heads in the direction of mapping to the underlying implementation, rather than just expressing what the user wants to do directly.

This is something I do feel strongly about: I'd like to keep the friendlyeval functions and documentation focused on letting the user express what they want to happen in the context of common cases, rather than how it should happen.

With that in mind, there are probably terms that would work within that constraint other than eval:

use is an obvious one but that prefix is already used a lot by usethis.
using maybe?
take as in 'take the input supplied for an argument and use it as a column name' - I wonder if non-native English speakers would get that one?
with - It's interesting since it isn't a verb, but does still sound like an instruction.

Any other ideas?

pkq commented 6 years ago

How about treat_, as in, "hey dplyr, treat this inputs as a column name"?

pkq commented 6 years ago

Bonus side effect: catchy tag lines like "Using friendlyeval is a treat!"

MilesMcBain commented 6 years ago

👌👌👌 I like this a lot!

pkq commented 6 years ago

Also, imagine the hexsticker potential...

njtierney commented 6 years ago

How about 'pass'

MilesMcBain commented 6 years ago

Thanks Nick, as we discussed on the phone, I am quite taken with treat!

I've created treat branch with an updated README that also swaps value for string as was suggested by Alicia and Lionel. I have to say I think we are pretty close here: https://github.com/MilesMcBain/friendlyeval/blob/treat/README.md

I really like how it looks and how it makes the explanations read. :+1:

pkq commented 6 years ago

Agreed! I think the latest version of the README is very clear, both in motivation and execution.

AliciaSchep commented 6 years ago

Seconded! :tada:

MilesMcBain commented 6 years ago

Good news! I've just merged the treat branch over to master.

Biggest thanks go to @pkq for the rewrite of the README!

I've added names of contributors from this thread to the DESCRIPTION. I haven't added emails, so make a PR if you would like me to add. Also let me know if you'd like to be removed.

MilesMcBain / friendlyeval

function names #1

Usage examples