ProjectMOSAIC / ggformula

Provides a formula interface to 'ggplot2' graphics.

naming in mosaic and ggformula #144

Closed nicholasjhorton closed 3 years ago

nicholasjhorton commented 3 years ago

I made the switch from mosaic::favstats() to ggformula::df_stats() this January term and I'm regretting it.

I know that df_stats() is more flexible, but the distinction between df_ and gf_ is confusing as heck for newbies. The gf_ prefix is mysterious (what's ggformula? how does it relate to the mosaic package?) but they figure it out. It's been interesting to me how often they confuse df_ and gf_.

This may be only my students, or there's some way that you've approached teaching these commands that works better than what I do. I just wanted to offer my reflection.

rpruim commented 3 years ago

I have had very little trouble with this. Not sure exactly why. I do explain that the g is for graphics and the d for data frame, and I introduce ggformula on day one before I've even mentioned mosaic. Perhaps those things help.

As for naming: I've grown weary of choosing names that eventually conflict with other things. So prepending all the graphics functions in ggformula with something seemed useful, and using something generic like plot_ seemed problematic. gf_ is short and related to the name and design of the package. It also seems safer than gg_ in terms of avoiding collisions.

We could come up with another name for df_stats() and introduce it as an alias. But I don't want it to be gf_stats(). Do you have a name to suggest?

rpruim commented 3 years ago

Another plus for gf_ is that `apropos('gf_')` will create a useful list of available functions.
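
A minimal sketch of that lookup, assuming ggformula is installed and attached. The variable name `gf_functions` is only for illustration, and the exact list depends on the installed version:

```r
library(ggformula)

# apropos() returns the names of all objects on the search path
# whose names match the given pattern.
gf_functions <- apropos("gf_")

# Show the first few matches.
head(gf_functions)
```

Since `apropos()` treats its argument as a regular expression, `apropos("^gf_")` would restrict the list to names that start with the prefix.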

nicholasjhorton commented 3 years ago

My sense from this semester is that I want to talk about introducing the "formula" approach to R that is implemented in the "mosaic" and "ggformula" packages. Conveniently, these are both loaded when you run library(mosaic).

I suspect that I will go back to using favstats() instead of df_stats().

Closing for now.

rpruim commented 3 years ago

Two things you will lose if you revert:

  1. df_stats() produces a data frame, which makes it easy to add summary values to a plot (but perhaps you don't do that much in intro).
  2. df_stats() can process multiple response variables at once.
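
A small sketch of both points, assuming `library(mosaic)` makes `df_stats()` available (as described earlier in this thread) and using the built-in `mtcars` data. The object names `one_response` and `many_responses` are illustrative only:

```r
library(mosaic)

# One response variable, summarized within groups.
# The result is an ordinary data frame.
one_response <- df_stats(hp ~ cyl, data = mtcars, mean, median)
class(one_response)
names(one_response)

# Several response variables at once, joined with + on the left-hand side.
many_responses <- df_stats(hp + wt + mpg ~ cyl, data = mtcars, mean, median)
many_responses
```

Because the result is a data frame, it can be supplied as the `data` argument of another plotting layer. The column names depend on the statistics requested, so it is worth checking `names()` before referring to them in a formula.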

If the name is the only thing that is causing you issues, we should perhaps introduce an alias.
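
In the meantime, an alias can be defined locally in a course script. This is only a sketch: `stats_table` is a hypothetical name chosen for illustration, not a function in either package, and it assumes `library(mosaic)` makes `df_stats()` available.

```r
library(mosaic)

# Hypothetical alias: bind a second name to the same function object.
stats_table <- df_stats

# The alias takes the same arguments as df_stats().
stats_table(hp ~ cyl, data = mtcars, mean)
```

The alias is a plain assignment, so it tracks whatever version of `df_stats()` is attached when the script runs.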